Mirror of https://github.com/luau-lang/luau.git (synced 2025-01-22 02:38:06 +00:00)
80928acb92
Instead of patching the tag component with TVECTOR in every instruction that produces a vector value, we now use a separate IR instruction to do this. This reduces implementation redundancy, but more importantly allows for a class of optimizations:

- NUM_TO_VECTOR previously patched the component unconditionally but the result was used only in MUL/DIV_VEC instructions that ignore it anyway; we can now remove this.
- ADD_VEC et al can now forward the source of TAG_VECTOR instruction of either input; this shortens the latency chain and in the future could allow us to generate optimal vector instruction sequence once the temporary stores are marked as dead.
- In the future on X64, ADD_VEC et al will be able to analyze the input instruction and remove tag masking conditionally. This is not part of this PR as it requires a decision around expected FP environment and/or the necessity of the existing masking to begin with.

I've also renamed NUM_TO_VECTOR to NUM_TO_VEC so that "VEC" always refers to "3 float values" and for consistency with ADD/etc.

Note: ADD_VEC input forwarding is currently performed unconditionally; it may or may not increase the spills that can't be reloaded from the stack.

On A64 this makes the Taylor series computation a tiny bit faster (11.3ns => 11.0ns) as it removes the redundant ins instructions along the NUM_TO_VEC path.

Curiously, the optimization of forwarding TAG_VECTOR input to arithmetic instructions actually has a small penalty as without it this PR runs at 10.9 ns. I don't know if this is a property of the benchmark though, as I just noticed that in this benchmark type inference actually fails to infer parts of the computation as a vector op. If desired I will happily omit this part of the change and we can explore that separately.

| | | |
---|---|---|
.. | ||
AddressA64.h | ||
AssemblyBuilderA64.h | ||
AssemblyBuilderX64.h | ||
BytecodeAnalysis.h | ||
BytecodeSummary.h | ||
CodeAllocator.h | ||
CodeBlockUnwind.h | ||
CodeGen.h | ||
CodeGenCommon.h | ||
ConditionA64.h | ||
ConditionX64.h | ||
IrAnalysis.h | ||
IrBuilder.h | ||
IrCallWrapperX64.h | ||
IrData.h | ||
IrDump.h | ||
IrRegAllocX64.h | ||
IrUtils.h | ||
IrVisitUseDef.h | ||
Label.h | ||
OperandX64.h | ||
OptimizeConstProp.h | ||
OptimizeFinalX64.h | ||
RegisterA64.h | ||
RegisterX64.h | ||
UnwindBuilder.h | ||
UnwindBuilderDwarf2.h | ||
UnwindBuilderWin.h |
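
The input forwarding described in the commit message (ADD_VEC et al reading the source of a TAG_VECTOR instruction instead of the tagged result) can be illustrated with a toy model. The sketch below is not Luau's code: the `Op`, `Inst`, `forwardTagSource`, `forwardTagVectorInputs` and `buildExample` names are hypothetical, the IR is reduced to an opcode plus two operand indices, and the real definitions in `IrData.h` and the real pass in `OptimizeConstProp.h` may look quite different.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified IR. Opcode names follow the commit message,
// but this is NOT the layout used by Luau's IrData.h.
enum class Op { LoadVec, NumToVec, TagVector, AddVec, MulVec };

struct Inst
{
    Op op;
    int a; // index of the first operand instruction, or -1 if unused
    int b; // index of the second operand instruction, or -1 if unused

    Inst(Op op, int a = -1, int b = -1) : op(op), a(a), b(b) {}
};

// If `operand` refers to a TAG_VECTOR instruction, return the index of the
// untagged value it wraps; otherwise return `operand` unchanged.
int forwardTagSource(const std::vector<Inst>& code, int operand)
{
    if (operand >= 0 && code[operand].op == Op::TagVector)
        return code[operand].a;
    return operand;
}

// Rewrite vector arithmetic so it reads untagged sources directly.
// Returns how many operands were rewritten.
int forwardTagVectorInputs(std::vector<Inst>& code)
{
    int rewritten = 0;
    for (Inst& inst : code)
    {
        if (inst.op != Op::AddVec && inst.op != Op::MulVec)
            continue;

        int newA = forwardTagSource(code, inst.a);
        int newB = forwardTagSource(code, inst.b);
        rewritten += (newA != inst.a) + (newB != inst.b);
        inst.a = newA;
        inst.b = newB;
    }
    return rewritten;
}

// A small example sequence; the numeric input of NUM_TO_VEC is omitted for brevity.
std::vector<Inst> buildExample()
{
    return {
        {Op::LoadVec},       // %0
        {Op::LoadVec},       // %1
        {Op::AddVec, 0, 1},  // %2 = ADD_VEC %0, %1
        {Op::TagVector, 2},  // %3 = TAG_VECTOR %2
        {Op::NumToVec},      // %4 = NUM_TO_VEC (no tag patching needed)
        {Op::MulVec, 3, 4},  // %5 = MUL_VEC %3, %4
        {Op::TagVector, 5},  // %6 = TAG_VECTOR %5
    };
}
```

After running `forwardTagVectorInputs` on the example, `%5` reads `%2` rather than `%3`, so the multiply in this toy model no longer waits on the tagging step. The `TAG_VECTOR` at `%3` is left in place here; whether it can be dropped depends on other uses of the tagged value, which this sketch does not track.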