mirror of
https://github.com/luau-lang/luau.git
synced 2025-04-05 03:10:54 +01:00
To implement math.lerp without branches, we add SELECT_NUM, which selects one of the two inputs based on the comparison condition. For simplicity, we only support C == D for now; this can be extended to a more generic version with an IrCondition operand E, but that requires more work on the SSE side (to flip the comparison for some conditions like Greater, and to expose a more generic vcmpsd).

Note: on AArch64 this will effectively result in a change in floating point behavior between native code and non-native code: clang synthesizes fmadd (because floating point contraction is allowed by default, and the architecture always has the instruction), whereas this change will use fmul+fadd. I am not sure if this is good or bad, or whether this is a problem in C or not. Specifically, clang's behavior results in different results between X64 and AArch64 when *not* using codegen, and with this change the behavior when using codegen is... the same? :)

Fixing this will require either using LERP_NUM instead and hand-coding the lowering, or exposing some sort of "quasi" MADD_NUM (which would lower to fma on AArch64 and mul+add on X64). A small benefit of the current approach is that `lerp(1, 5, t)` constant-folds the subtraction. With LERP_NUM this optimization would need to be implemented manually as partial constant folding for LERP_NUM. A similar problem exists today for vector.cross & vector.dot, so maybe this is not something we need to fix; unsure.