luau/CodeGen/include
Arseny Kapoulkine f666594fb6 CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants
When the input is a constant, we use a fairly inefficient sequence of
fmov+fcvt+dup or, when the double isn't encodable in fmov, adr+ldr+fcvt+dup.

Instead, we can use the same lowering as X64 when the input is a constant, and
load the vector from memory. However, if the constant is encodable via fmov, we
can use a vector fmov instead (which is just one instruction and doesn't need
constant space).
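A minimal Python sketch of that decision follows. It makes three assumptions. First, encodability is tested on the double value, as the next paragraph suggests. Second, it uses the standard characterization of fmov immediates as ±(16 + m)/16 × 2^e, with m in 0..15 and e in -3..4. Third, `fmov_imm8` and `choose_lowering` are hypothetical names, not Luau's API. The sketch returns a tag plus either the 8-bit immediate or the 16-byte splat constant; it does not emit real instructions.

```python
import math
import struct


def fmov_imm8(x: float):
    """Return the 8-bit fmov immediate (abcdefgh) for x, or None if x isn't encodable.

    Encodable magnitudes are (16 + m) / 16 * 2**e with m in 0..15 and e in -3..4.
    """
    if x == 0.0 or not math.isfinite(x):
        return None  # zero, inf and NaN have no fmov immediate
    sign = 1 if x < 0 else 0
    mag = abs(x)
    for e in range(-3, 5):
        # Power-of-two scaling is exact (barring overflow/underflow),
        # so is_integer() is a precise encodability test.
        scaled = mag / 2.0**e * 16
        if scaled.is_integer() and 16 <= scaled <= 31:
            m = int(scaled) - 16            # efgh: fraction bits
            b6 = 1 if e <= 0 else 0         # b: selects the lower exponent range
            b54 = (e + 3) & 3               # cd: low two exponent bits
            return (sign << 7) | (b6 << 6) | (b54 << 4) | m
    return None


def choose_lowering(x: float):
    """Hypothetical NUM_TO_VEC constant lowering: vector fmov if possible, else a load."""
    imm8 = fmov_imm8(x)
    if imm8 is not None:
        return ("fmov", imm8)  # one instruction, no constant storage
    # Otherwise splat float32(x) into a 16-byte constant to load from memory.
    # Assumes x is within float32 range (struct raises OverflowError otherwise).
    return ("ldr", struct.pack("<4f", x, x, x, x))


print(choose_lowering(1.0))   # ('fmov', 112), i.e. imm8 0x70
print(choose_lowering(0.1)[0])  # ldr
```

Zero goes to the load path here only because it has no fmov immediate; a real backend may treat it separately.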

Fortunately, the fmov immediate encoding for 32-bit floating-point numbers matches the
64-bit one: the decoding algorithm differs slightly because it expands into a wider
exponent, but the decoded values are the same. So if a double can be encoded into a
scalar fmov with a given abcdefgh pattern, the same pattern encodes the same float.
Because the immediate has so few mantissa and exponent bits, every encodable value is
also exact in both 32-bit and 64-bit floats.
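The claim can be checked by decoding all 256 abcdefgh patterns at both widths. The sketch below is an illustration written for this note, not code from the commit. It models the expansion on the Arm VFPExpandImm pseudocode: the exponent is NOT(b), then b replicated, then cd; the fraction is efgh followed by zeros. `vfp_expand_imm` and `decode` are hypothetical helper names.

```python
import struct


def vfp_expand_imm(imm8: int, n: int) -> int:
    """Expand an 8-bit fmov immediate (abcdefgh) into the raw bits of an n-bit float.

    Modeled on the VFPExpandImm pseudocode, for n = 32 or 64.
    """
    e = {32: 8, 64: 11}[n]       # exponent width
    f = n - e - 1                # fraction width
    sign = (imm8 >> 7) & 1       # a
    b6 = (imm8 >> 6) & 1         # b
    # exponent = NOT(b) : Replicate(b, e - 3) : cd
    rep = ((1 << (e - 3)) - 1) if b6 else 0
    exp = ((b6 ^ 1) << (e - 1)) | (rep << 2) | ((imm8 >> 4) & 3)
    frac = (imm8 & 0xF) << (f - 4)  # efgh followed by zeros
    return (sign << (n - 1)) | (exp << f) | frac


def decode(imm8: int, n: int) -> float:
    """Interpret the expanded bits as a float32 or float64 value."""
    bits = vfp_expand_imm(imm8, n)
    if n == 32:
        return struct.unpack("<f", struct.pack("<I", bits))[0]
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Every one of the 256 patterns decodes to the same value at both widths.
mismatches = [i for i in range(256) if decode(i, 32) != decode(i, 64)]
print(mismatches)  # []
```

The wider exponent only adds more copies of b in the middle. Both widths therefore end up with the same unbiased exponent (-3..4) and the same four leading fraction bits.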

This strategy is roughly the same as what gcc uses. For complex vectors (constants that
fmov can't encode), we previously used 4 instructions and 8 bytes of constant storage;
now we use 2 instructions and 16 bytes of constant storage, so the total memory footprint
is the same (24 bytes). For simple vectors we need just 1 instruction (4 bytes).
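The byte counts above reduce to simple arithmetic. This sketch assumes 4-byte A64 instructions and adds constant storage separately. The "old simple" case is the fmov+fcvt+dup sequence from the first paragraph. The new 2-instruction load is left unnamed because the message doesn't spell it out. The variable names are hypothetical.

```python
INSN_BYTES = 4  # every A64 instruction is 4 bytes


def footprint(instructions: int, constant_bytes: int = 0) -> int:
    """Total bytes: instruction encodings plus constant storage."""
    return instructions * INSN_BYTES + constant_bytes


old_simple = footprint(3)        # fmov + fcvt + dup
old_complex = footprint(4, 8)    # adr + ldr + fcvt + dup, plus an 8-byte double
new_simple = footprint(1)        # a single vector fmov
new_complex = footprint(2, 16)   # 2-instruction load plus a 16-byte vector constant

print(old_simple, old_complex, new_simple, new_complex)  # 12 24 4 24
```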

clang lowers vector constants a little differently: it synthesizes a 64-bit integer using
4 instructions (mov/movk) and then moves it to the vector register, which requires 5
instructions and 20 bytes, versus 2 instructions and 8+16=24 bytes for ours/gcc. I tried
a simpler and more compact version of this (synthesize a 32-bit integer constant with
mov+movk and move it to the vector register via dup.4s), but it was a little slower on
M2. For now we prefer the slightly larger version, as it's not a regression vs the
current implementation.
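The mov+movk+dup.4s variant can be sketched as splitting the float32 bit pattern into two 16-bit halves. This is an illustration only, not the code that was tried. The helper `dup_splat_sequence` and the register names (w16, v0) are hypothetical placeholders. It always emits both halves, although a real emitter could use one instruction when either half is zero.

```python
import struct


def dup_splat_sequence(x: float) -> list[str]:
    """Hypothetical mov/movk + dup.4s sequence that splats float32(x) into v0."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lo, hi = bits & 0xFFFF, bits >> 16
    return [
        f"mov w16, #{lo:#06x}",            # low 16 bits
        f"movk w16, #{hi:#06x}, lsl #16",  # high 16 bits, keeping the low half
        "dup v0.4s, w16",                  # broadcast into all four lanes
    ]


seq = dup_splat_sequence(0.1)
for line in seq:
    print(line)
print(len(seq) * 4, "bytes")  # prints "12 bytes"
```

At 3 instructions (12 bytes), this is smaller than clang's 20 bytes and the 24 bytes of the chosen approach. The message keeps the larger version only because this variant measured slightly slower on M2.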
2024-03-12 11:10:40 -07:00
Luau CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants 2024-03-12 11:10:40 -07:00
luacodegen.h Sync to upstream/release/588 (#992) 2023-07-28 08:13:53 -07:00