luau/CodeGen/include at f666594fb6784a7a4986d04e8fc6e834954c0b23 - luau-lang/luau

mirror of https://github.com/luau-lang/luau.git synced 2025-05-04 10:33:46 +01:00

Luau	CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants	2024-03-12 11:10:40 -07:00
luacodegen.h	Sync to upstream/release/588 (#992)	2023-07-28 08:13:53 -07:00

Arseny Kapoulkine f666594fb6 CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants

When the input is a constant, we currently use a fairly inefficient sequence of
fmov+fcvt+dup or, when the double isn't encodable in fmov, adr+ldr+fcvt+dup.

Instead, we can use the same lowering as X64 when the input is a constant, and
load the vector from memory. However, if the constant is encodable via fmov, we
can use a vector fmov instead (which is just one instruction and doesn't need
constant space).

Fortunately the bit encoding of fmov for 32-bit floating point numbers matches
that of 64-bit: the decoding algorithm is a little different because it expands
into a larger exponent, but the values are compatible, so if a double can be encoded
into a scalar fmov with a given abcdefgh pattern, the same pattern should encode the
same float; due to the very limited number of mantissa and exponent bits, all values
that are encodable are also exact in both 32-bit and 64-bit floats.
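
The encodability argument above can be checked with a small sketch. It assumes
the standard A64 8-bit floating-point immediate layout (sign a, exponent bits
NOT(b) followed by copies of b and then cd, fraction efgh); the function names
are hypothetical and are not taken from the Luau sources.

```python
import struct


def fmov_imm8_from_double(value):
    """Return the abcdefgh pattern for a double, or None if not encodable."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    # Only the top 4 fraction bits (efgh) may be set.
    if bits & ((1 << 48) - 1):
        return None
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = (bits >> 48) & 0xF
    # The 11-bit exponent must look like NOT(b) : b repeated 8 times : cd.
    top = exp >> 2
    if top == 0b100000000:
        b = 0
    elif top == 0b011111111:
        b = 1
    else:
        return None
    cd = exp & 0b11
    return (sign << 7) | (b << 6) | (cd << 4) | frac


def expand_imm8_to_double(imm8):
    """Decode abcdefgh into a 64-bit float."""
    a, b = imm8 >> 7, (imm8 >> 6) & 1
    cd, efgh = (imm8 >> 4) & 0b11, imm8 & 0xF
    exp = ((b ^ 1) << 10) | ((0xFF if b else 0) << 2) | cd
    bits = (a << 63) | (exp << 52) | (efgh << 48)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def expand_imm8_to_float(imm8):
    """Decode abcdefgh into a 32-bit float (narrower exponent, same pattern)."""
    a, b = imm8 >> 7, (imm8 >> 6) & 1
    cd, efgh = (imm8 >> 4) & 0b11, imm8 & 0xF
    exp = ((b ^ 1) << 7) | ((0b11111 if b else 0) << 2) | cd
    bits = (a << 31) | (exp << 23) | (efgh << 19)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Every one of the 256 patterns decodes to the same value at both widths.
all_match = all(
    expand_imm8_to_double(i) == expand_imm8_to_float(i) for i in range(256)
)
```

Under these assumptions, a lowering could call `fmov_imm8_from_double` on the
constant and fall back to a memory load whenever it returns `None`.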

This strategy is roughly the same as what gcc uses. For complex vectors, we previously
used 4 instructions and 8 bytes of constant storage (16+8=24 bytes), and now we use 2
instructions and 16 bytes of constant storage (8+16=24 bytes), so the memory footprint
is the same; for simple vectors we just need 1 instruction (4 bytes).

clang lowers vector constants a little differently, opting to synthesize a 64-bit integer
using 4 instructions (mov/movk) and then move it to the vector register - this requires
5 instructions and 20 bytes, vs. 2 instructions and 8+16=24 bytes for ours/gcc. I tried a
simpler version of this that would be more compact - synthesize a 32-bit integer constant
with mov+movk, and move it to the vector register via dup.4s - but this was a little slower
on M2, so for now we prefer the slightly larger version as it's not a regression vs. the
current implementation.
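
The size comparisons above follow from every A64 instruction being 4 bytes. A
minimal sketch of the arithmetic is below; the helper and labels are
hypothetical, and the 3-instruction count for the mov+movk+dup.4s variant is
inferred from the instruction names rather than stated in the message.

```python
A64_INSN_BYTES = 4  # every A64 instruction is 4 bytes wide


def footprint(instructions, constant_bytes=0):
    """Total bytes of code plus constant storage for one lowering."""
    return instructions * A64_INSN_BYTES + constant_bytes


lowerings = {
    "old: adr+ldr+fcvt+dup": footprint(4, 8),
    "new: load vector from memory": footprint(2, 16),
    "new: vector fmov": footprint(1),
    "clang: mov/movk + move to vector": footprint(5),
    "tried: mov+movk+dup.4s (inferred count)": footprint(3),
}

for name, size in lowerings.items():
    print(f"{name}: {size} bytes")
```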
