luau/tests/conformance
Arseny Kapoulkine 9aa82c6fb9
CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants (#1194)
When the input is a constant, we currently use a fairly inefficient
sequence of fmov+fcvt+dup or, when the double isn't encodable in fmov,
adr+ldr+fcvt+dup.

Instead, we can use the same lowering as X64 when the input is a
constant, and load the vector from memory. However, if the constant is
encodable via fmov, we can use a vector fmov instead (which is just one
instruction and doesn't need constant space).

Fortunately the bit encoding of fmov for 32-bit floating point numbers
matches that of 64-bit: the decoding algorithm is a little different
because it expands into a larger exponent, but the values are
compatible, so if a double can be encoded into a scalar fmov with a
given abcdefgh pattern, the same pattern should encode the same float;
due to the very limited number of mantissa and exponent bits, all values
that are encodable are also exact in both 32-bit and 64-bit floats.
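
To make the encoding argument concrete, here is a minimal sketch of how the 8-bit abcdefgh immediate expands into a double and into a float, following the architectural bit layout. The function names (expandImm8ToDouble, expandImm8ToFloat, encodeFmovImm8, floatAndDoubleAgree, selfCheck) are hypothetical and are not taken from the Luau sources; the brute-force search is chosen for clarity, not speed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Expand imm8 = abcdefgh into a double:
// sign = a, exponent = NOT(b) : b repeated 8 times : cd, fraction = efgh : zeros.
double expandImm8ToDouble(uint8_t imm8)
{
    uint64_t a = (imm8 >> 7) & 1;
    uint64_t b = (imm8 >> 6) & 1;
    uint64_t cd = (imm8 >> 4) & 3;
    uint64_t efgh = imm8 & 15;

    uint64_t exponent = ((b ^ 1) << 10) | ((b ? 0xffull : 0ull) << 2) | cd; // 11 bits
    uint64_t bits = (a << 63) | (exponent << 52) | (efgh << 48);

    double result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Same pattern expanded into a float: only the run of repeated b bits is shorter.
float expandImm8ToFloat(uint8_t imm8)
{
    uint32_t a = (imm8 >> 7) & 1;
    uint32_t b = (imm8 >> 6) & 1;
    uint32_t cd = (imm8 >> 4) & 3;
    uint32_t efgh = imm8 & 15;

    uint32_t exponent = ((b ^ 1) << 7) | ((b ? 0x1fu : 0u) << 2) | cd; // 8 bits
    uint32_t bits = (a << 31) | (exponent << 23) | (efgh << 19);

    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Returns the imm8 pattern for value, or -1 if value is not encodable.
// Hypothetical helper: tries all 256 patterns.
int encodeFmovImm8(double value)
{
    for (int imm = 0; imm < 256; ++imm)
        if (expandImm8ToDouble(uint8_t(imm)) == value)
            return imm;
    return -1;
}

// Checks that every pattern decodes to the same value as float and as double.
bool floatAndDoubleAgree()
{
    for (int imm = 0; imm < 256; ++imm)
        if (double(expandImm8ToFloat(uint8_t(imm))) != expandImm8ToDouble(uint8_t(imm)))
            return false;
    return true;
}

// Self-check using the two constants from the benchmark notes.
void selfCheck()
{
    assert(encodeFmovImm8(0.125) == 0x40); // encodable, so a vector fmov can be used
    assert(encodeFmovImm8(0.123) == -1);   // not encodable, needs a memory load
    assert(floatAndDoubleAgree());
}
```

The encodable set is small: magnitudes of the form (16..31)/16 * 2^n for n in -3..4, so 0.125 through 31.0, with either sign. Zero is not in the set.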

This strategy is roughly the same as what gcc uses. For complex vectors,
we previously used 4 instructions and 8 bytes of constant storage, and
now we use 2 instructions and 16 bytes of constant storage, so the
memory footprint is the same (24 bytes either way); for simple vectors
we just need 1 instruction (4 bytes).

clang lowers vector constants a little differently, opting to synthesize
a 64-bit integer using 4 instructions (mov/movk) and then move it to the
vector register - this requires 5 instructions and 20 bytes, vs ours/gcc
2 instructions and 8+16=24 bytes. I tried a simpler version of this that
would be more compact - synthesize a 32-bit integer constant with
mov+movk, and move it to vector register via dup.4s - but this was a
little slower on M2, so for now we prefer the slightly larger version as
it's not a regression vs current implementation.
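
For reference, the mov+movk+dup.4s variant mentioned above (not part of this PR) could be sketched as follows. The helper names (splitFloatBits, formatMovDup) and the registers w17/v0 are illustrative assumptions only, and the output is plain assembly text rather than anything emitted through the real assembly builder.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// The two 16-bit halves of a float's bit pattern, as consumed by mov and movk.
struct FloatHalves
{
    uint16_t low;
    uint16_t high;
};

// Reinterpret the float as a 32-bit integer and split it into halves.
FloatHalves splitFloatBits(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return {uint16_t(bits & 0xffff), uint16_t(bits >> 16)};
}

// Format the 3-instruction sequence: build the bits in a general-purpose
// register, then broadcast them to all four 32-bit vector lanes.
std::string formatMovDup(float value)
{
    FloatHalves parts = splitFloatBits(value);

    char buffer[128];
    std::snprintf(buffer, sizeof(buffer),
        "mov w17, #0x%04x\n"
        "movk w17, #0x%04x, lsl #16\n"
        "dup v0.4s, w17\n",
        unsigned(parts.low), unsigned(parts.high));
    return buffer;
}
```

That is 3 instructions (12 bytes) with no constant storage, which is where the compactness comes from; the numbers below show why it was not chosen.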

On the vector approximation benchmark we get:

- Before this PR (flag=false): ~7.85 ns/op
- After this PR (flag=true): ~7.74 ns/op
- After this PR, with 0.125 instead of 0.123 in the benchmark code (to
use fmov): ~7.52 ns/op
- Not part of this PR, but the mov/dup strategy described above: ~8.00
ns/op
2024-03-13 12:56:11 -07:00
apicalls.lua Sync to upstream/release/571 (#895) 2023-04-07 14:01:29 -07:00
assert.lua Sync to upstream/release/501 (#20) 2021-11-01 14:52:34 -07:00
attrib.lua Sync to upstream/release/501 (#20) 2021-11-01 14:52:34 -07:00
basic.lua Sync to upstream/release/594 (#1036) 2023-09-07 17:13:49 -07:00
bitwise.lua Sync to upstream/release/602 (#1089) 2023-11-03 16:45:04 -07:00
buffers.lua Sync to upstream/release/604 2023-11-17 10:15:31 -08:00
calls.lua Sync to upstream/release/550 (#723) 2022-10-21 10:54:01 -07:00
clear.lua Sync to upstream/release/501 (#20) 2021-11-01 14:52:34 -07:00
closure.lua Sync to upstream/release/600 (#1076) 2023-10-20 18:10:30 -07:00
constructs.lua Sync to upstream/release/598 (#1063) 2023-10-06 12:02:32 -07:00
coroutine.lua Sync to upstream/release/514 (#372) 2022-02-17 17:18:01 -08:00
coverage.lua Sync to upstream/release/514 (#372) 2022-02-17 17:18:01 -08:00
datetime.lua Sync to upstream/release/598 (#1063) 2023-10-06 12:02:32 -07:00
debug.lua Sync to upstream/release/514 (#372) 2022-02-17 17:18:01 -08:00
debugger.lua Sync to upstream/release/576 (#928) 2023-05-12 10:50:47 -07:00
errors.lua Sync to upstream/release/550 (#723) 2022-10-21 10:54:01 -07:00
events.lua Sync to upstream/release/593 (#1024) 2023-09-01 10:58:27 -07:00
exceptions.lua Sync to upstream/release/501 (#20) 2021-11-01 14:52:34 -07:00
gc.lua Sync to upstream/release/542 (#649) 2022-08-25 14:53:50 -07:00
ifelseexpr.lua Sync to upstream/release/501 (#20) 2021-11-01 14:52:34 -07:00
interrupt.lua Sync to upstream/release/610 (#1154) 2024-01-26 19:20:56 -08:00
iter.lua Sync to upstream/release/550 (#723) 2022-10-21 10:54:01 -07:00
literals.lua Sync to upstream/release/501 (#20) 2021-11-01 14:52:34 -07:00
locals.lua Spelling (#119) 2021-11-04 09:50:46 -05:00
math.lua Sync to upstream/release/602 2023-11-03 12:47:28 -07:00
move.lua Sync to upstream/release/550 (#723) 2022-10-21 10:54:01 -07:00
native.lua Sync to upstream/release/616 (#1184) 2024-03-08 16:47:53 -08:00
native_types.lua Sync to upstream/release/601 (#1084) 2023-10-27 14:18:41 -07:00
ndebug_upvalues.lua Fix lua_*upvalue() when upvalue names aren't in debug info (#787) 2023-01-18 06:00:13 -08:00
pcall.lua Sync to upstream/release/600 (#1076) 2023-10-20 18:10:30 -07:00
pm.lua Spelling (#119) 2021-11-04 09:50:46 -05:00
safeenv.lua Sync to upstream/release/549 (#707) 2022-10-14 12:48:41 -07:00
sort.lua Sync to upstream/release/571 (#895) 2023-04-07 14:01:29 -07:00
strconv.lua Sync to upstream/release/608 (#1145) 2024-01-12 14:25:27 -08:00
stringinterp.lua String interpolation (#614) 2022-08-24 12:01:00 -07:00
strings.lua Sync to upstream/release/588 (#992) 2023-07-28 08:13:53 -07:00
tables.lua Sync to upstream/release/572 (#899) 2023-04-14 11:06:22 -07:00
tmerror.lua Sync to upstream/release/591 (#1012) 2023-08-18 11:15:41 -07:00
tpack.lua Sync to upstream/release/550 (#723) 2022-10-21 10:54:01 -07:00
types.lua Sync to upstream/release/544 (#669) 2022-09-08 15:14:25 -07:00
userdata.lua Sync to upstream/release/593 (#1024) 2023-09-01 10:58:27 -07:00
utf8.lua Sync to upstream/release/603 (#1097) 2023-11-10 13:10:07 -08:00
vararg.lua Sync to upstream/release/514 (#357) 2022-02-11 11:02:09 -08:00
vector.lua CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants (#1194) 2024-03-13 12:56:11 -07:00