rfcs/docs/function-math-fma.md
Wunder Wulfe 4fd0d8508e
Update function-math-fma.md
Use Luau instead of C
2025-02-23 04:15:28 -03:00


math.fma

Summary

Add fma as a function to the math library, computing the fused multiply-add operation, following IEEE 754-2008, the revision of the floating-point standard that introduced the operation.

Motivation

Fused multiply-add, also known as multiply-accumulate and often abbreviated as fma or mad, is an operation that condenses the computation (a × b) + c, which would normally require a MULSD and an ADDSD instruction, into a single processor instruction.

This operation is commonly used in calculations such as dot products, cross products, quaternion rotations, and matrix multiplications.

The two advantages of this operation are as follows:

  1. Floating-point rounding occurs only once, at the end of the instruction, so the result is more precise than one computed with two separately rounded instructions.

  2. Instruction count is reduced for math-intensive code, resulting in smaller code size and better performance.

    The example below is a dot product between two 4-dimensional vectors:

    function Vector4.Dot(a: Vector4, b: Vector4): number
        return a.X * b.X + a.Y * b.Y + a.Z * b.Z + a.W * b.W;
    end
    

    This computation results in a total of 7 math instructions:

    MULSD xmm0, xmm4             ; ax * bx
    MULSD xmm1, xmm5             ; ay * by
    MULSD xmm2, xmm6             ; az * bz
    MULSD xmm3, xmm7             ; aw * bw
    
    ADDSD xmm0, xmm1             ; (ax * bx) + (ay * by)
    ADDSD xmm0, xmm2             ; + (az * bz)
    ADDSD xmm0, xmm3             ; + (aw * bw)
    

    Algorithm simplified with fma:

    function Vector4.Dot(a: Vector4, b: Vector4): number
        return math.fma( a.X, b.X, math.fma( a.Y, b.Y, math.fma( a.Z, b.Z, a.W * b.W ) ) );
    end
    

    With fma, the computation reduces to 4 math instructions:

    MULSD       xmm3, xmm7       ; aw * bw
    VFMADD213SD xmm2, xmm6, xmm3 ; az * bz + (aw * bw)
    VFMADD213SD xmm1, xmm5, xmm2 ; ay * by + (az * bz + aw * bw)
    VFMADD213SD xmm0, xmm4, xmm1 ; ax * bx + (ay * by + az * bz + aw * bw)
    

There is also the potential of updating libraries such as vector to make use of math.fma internally, and other cases such as Roblox's CFrame could also benefit from this change.

Design

Introduce a new function, math.fma, which computes the equivalent of the following operation, rounding only once at the end:

function math.fma(a: number, b: number, c: number): number
  return (a * b) + c;
end

The implementation of the function (internally) should make use of the fma function from the <math.h> header (<cmath> in C++).

When generating native code with --!native or @native, the operation should compile to instructions from the FMA instruction set on supported devices, but fall back to MULSD, ADDSD, and SUBSD instructions on unsupported devices.
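As a rough compile-time analogue of the fallback described above, C exposes the optional FP_FAST_FMA macro in <math.h>, which is defined when fma is generally about as fast as a separate multiply and add. The helper below is a hypothetical sketch, not Luau's code generator: native code generation would more likely decide based on the CPU features detected at run time, and fma_or_fallback is an invented name.

```c
#include <math.h>

/* Hypothetical helper: picks a strategy when the program is compiled. */
double fma_or_fallback(double a, double b, double c) {
#ifdef FP_FAST_FMA
    /* The C library reports that fma is about as fast as a multiply and an
       add, which usually indicates a hardware instruction is available. */
    return fma(a, b, c);
#else
    /* Fallback: separate multiply and add. This rounds twice, so the result
       can differ from the fused result in the last bit. */
    return a * b + c;
#endif
}
```

Both branches agree whenever the product is exactly representable, for example fma_or_fallback(2.0, 3.0, 4.0), but they may disagree in the last bit otherwise.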

Drawbacks

  • fma instructions are not supported on all devices, so the benefits of using math.fma may be limited to certain hardware and result in inconsistent performance across devices.

  • Although it is possible to automatically convert instances of (a × b) + c and c + (a × b) into fma instructions, it is ill-advised.

    Automatic conversions can harm user code such as sqrt(x * x - y * y).

    The compiler would attempt to optimize the interior of sqrt(x * x - y * y) into (x * x) - (y * y) and eventually fma(x, x, -(y * y)), which should compile to the following arithmetic instructions:

    MULSD       xmm1, xmm1       ; y * y
    VFMSUB213SD xmm0, xmm0, xmm1 ; x * x - (y * y)
    

    This is problematic because the two products are no longer rounded the same way: y * y is rounded before the subtraction, while x * x stays exact inside the fused instruction. The rounding error of y * y is therefore not cancelled, and the difference can come out negative even if x == y, feeding a negative value to the square root and producing an invalid result (NaN) that the original code would never have produced.

    The original interpretation would be the following:

    MULSD xmm0, xmm0             ; x * x
    MULSD xmm1, xmm1             ; y * y
    
    SUBSD xmm0, xmm1             ; (x * x) - (y * y)
    

    There is no complication in this scenario: both products go through the same operation and carry the same rounding error, so the errors cancel when x == y.

    Overall, it is better to allow the developer to manually decide which operations they choose to optimize, as the compiler will not understand the context, importance, or order of the operations and the rationale behind their arrangements.

    As a consequence, math.fma must manually be invoked, and existing code would not benefit from any performance improvements.

  • The use of math.fma may result in confusing code if it is not cleanly implemented by the developer.

  • Additional function added to the math library, which may only see use from more experienced developers that understand the micro-optimization of mathematical operations.

  • Documentation for math.fma, particularly its benefits, pitfalls, and behavior, is required to explain when, why, and how to use math.fma.

  • Complexity added to the compiler for native code generation.

Alternatives

  • Not implementing math.fma, remaining with the less optimized method of manually computing (a * b) followed by + c.
  • Implementing sister functions, such as math.fms and/or math.fma231, to make full use of the FMA instruction set, allowing for more delicate optimization of operations, and finer control over the order and structure of operations.