math.fma
Summary
Add `math.fma` to the `math` library. It computes the fused multiply–add operation as defined by IEEE 754-2008, the revision of the standard that added fused multiply–add.
Motivation
Fused multiply–add, also known as multiply–accumulate and often abbreviated as `fma` or `mad`, computes `(a × b) + c` in a single processor instruction. The same computation would normally require separate `MULSD` and `ADDSD` instructions.
This operation is commonly used in calculations such as dot products, cross products, quaternion rotations, and matrix multiplications.
This operation has two advantages:
- Rounding occurs only once, at the end of the operation. The result can therefore be more precise than computing the product and the sum as two separately rounded instructions.
- Math-intensive code needs fewer instructions, which reduces code size and can improve performance.
The example below is a dot product between two 4-dimensional vectors:
```lua
function Vector4.Dot(a: Vector4, b: Vector4): number
    return a.X * b.X + a.Y * b.Y + a.Z * b.Z + a.W * b.W
end
```
This computation results in a total of 7 math instructions:

```asm
MULSD xmm0, xmm4 ; ax * bx
MULSD xmm1, xmm5 ; ay * by
MULSD xmm2, xmm6 ; az * bz
MULSD xmm3, xmm7 ; aw * bw
ADDSD xmm0, xmm1 ; (ax * bx) + (ay * by)
ADDSD xmm0, xmm2 ; + (az * bz)
ADDSD xmm0, xmm3 ; + (aw * bw)
```
The algorithm simplified with `fma`:

```lua
function Vector4.Dot(a: Vector4, b: Vector4): number
    return math.fma(a.X, b.X, math.fma(a.Y, b.Y, math.fma(a.Z, b.Z, a.W * b.W)))
end
```
The optimization reduces this to 4 math instructions:

```asm
MULSD       xmm3, xmm7        ; aw * bw
VFMADD213SD xmm2, xmm6, xmm3  ; az * bz + (aw * bw)
VFMADD213SD xmm1, xmm5, xmm2  ; ay * by + (az * bz + aw * bw)
VFMADD213SD xmm0, xmm4, xmm1  ; ax * bx + (ay * by + az * bz + aw * bw)
```
Libraries such as `vector` could also be updated to use `math.fma` internally. Other types, such as Roblox's `CFrame`, could benefit from this change as well.
Design
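This RFC introduces a new function, `math.fma`. It computes the same value as the function below, except that `(a * b) + c` is rounded only once, at the end, rather than after both the multiplication and the addition:

```lua
function math.fma(a: number, b: number, c: number): number
    return (a * b) + c
end
```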
Internally, the function should be implemented with the `fma` function from the C `<math.h>` header (`std::fma` from `<cmath>` in C++).
When generating native code with `--!native` or `@native`, the operation should use the FMA instruction set on supported devices. On unsupported devices, it should fall back to `MULSD`, `ADDSD`, and `SUBSD` instructions.
Drawbacks
- `fma` instructions are not supported on all devices. The benefits of using `math.fma` may therefore be limited to certain hardware, which results in inconsistent performance across devices.
- Although it is possible to automatically convert instances of `(a × b) + c` and `c + (a × b)` into `fma` instructions, doing so is ill-advised. Automatic conversions can break user code such as `sqrt(x * x - y * y)`.

  The compiler would treat the interior of `sqrt(x * x - y * y)` as `(x * x) - (y * y)` and eventually contract it to `fma(x, x, -(y * y))`. That would compile to the following arithmetic instructions:

  ```asm
  MULSD       xmm1, xmm1        ; y * y
  VFMSUB213SD xmm0, xmm0, xmm1  ; x * x - (y * y)
  ```

  This is problematic because the two products are no longer rounded the same way. `y * y` is rounded to a double before the subtraction, while `x * x` inside the fused operation is not rounded at all. The expression can therefore evaluate to a small negative value even when `x == y`. Passing that value to the square root produces NaN, a result the original code could never have produced.

  The original interpretation would be the following:

  ```asm
  MULSD xmm0, xmm0  ; x * x
  MULSD xmm1, xmm1  ; y * y
  SUBSD xmm0, xmm1  ; (x * x) - (y * y)
  ```

  This version has no such complication. Both products are rounded in the same way, so when `x == y` they are identical and the difference is exactly zero.

  Overall, it is better to let developers decide manually which operations to optimize. The compiler cannot understand the context, importance, or order of the operations, or the rationale behind their arrangement. As a consequence, `math.fma` must be invoked manually, and existing code will not benefit from any performance improvements. However, the integrity and continuity of existing code are preserved.
- The use of `math.fma` may result in confusing code if the developer does not apply it cleanly.
- An additional function is added to the `math` library, and it may mostly be used by experienced developers who understand the micro-optimization of mathematical operations.
- `math.fma` needs documentation that covers its benefits, pitfalls, and behavior, so that developers understand when, why, and how to use it.
- Native code generation adds complexity to the compiler.
Alternatives
- Not implementing `math.fma`, and continuing with the less optimized approach of computing `a * b` and then adding `c` in a separate operation.
- Implementing sister functions, such as `math.fms` and/or `math.fma231`, to make full use of the FMA instruction set. This would allow more precise optimization and finer control over the order and structure of operations.