bit32.clz/ctz
Summary
Add bit32.clz (count leading zeroes) and bit32.ctz (count trailing zeroes) to accelerate bit scanning.
Motivation
All CPUs have instructions to determine the position of the first/last set bit in an integer. These instructions have a variety of uses, the popular ones being:
- Fast implementation of integer logarithm (essentially allowing floor(log2(value)) to be computed quickly)
- Scanning set bits in an integer, which allows efficient traversal of compact bitmap representations
- Allocating bits out of a bitmap quickly
Today it's possible to approximate clz using floor and log, but this approximation is relatively slow; approximating ctz is difficult without iterating through each bit.
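As a rough illustration of the floor/log approach described above, a clz polyfill might look like the following sketch. The name clz_polyfill is hypothetical, the input is assumed to already be an integer in [0, 2^32 - 1], and the rounding guard is a precaution added here rather than something the proposal specifies.

```lua
-- Hypothetical polyfill for the proposed clz; not the builtin implementation.
-- Assumes n is already an integer in [0, 2^32 - 1].
local function clz_polyfill(n)
    if n == 0 then
        return 32               -- log(0) is -inf, so zero is special-cased
    end
    -- floor(log2(n)) is the index of the highest set bit.
    local k = math.floor(math.log(n) / math.log(2))
    -- Correct for possible floating-point rounding near powers of two.
    if 2 ^ k > n then
        k = k - 1
    elseif 2 ^ (k + 1) <= n then
        k = k + 1
    end
    return 31 - k
end
```

Every call pays for a logarithm, a division and a floor, which is the kind of overhead a builtin would avoid.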
Design
The bit32 library will gain two new functions, clz and ctz:
function bit32.clz(n: number): number
function bit32.ctz(n: number): number
clz takes an integer number (converting the input number to a 32-bit unsigned integer as all other bit32 functions do), and returns the number of leading zero bits; that is, the number of most significant zero bits in a 32-bit number until the first 1. The result is in the [0, 32] range.
For example, when the input number is 0, the result is 32. When the input number is 2^k (for k in [0, 31]), the result is 31-k.
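The clz examples above can be checked against a straightforward reference. The sketch below is not the proposed builtin: clz_reference is a hypothetical name, and it assumes the argument is already an integer in [0, 2^32 - 1], so the 32-bit conversion step is skipped.

```lua
-- Hypothetical reference for clz: scan from bit 31 down to bit 0.
-- Assumes n is already an integer in [0, 2^32 - 1].
local function clz_reference(n)
    local count = 0
    local bit = 2 ^ 31          -- value of the most significant bit
    while bit >= 1 and n < bit do
        count = count + 1       -- this bit position and all above it are zero
        bit = bit / 2
    end
    return count
end

-- Zero has no set bits, so all 32 positions count as leading zeros.
assert(clz_reference(0) == 32)

-- For n = 2^k the result is 31 - k.
for k = 0, 31 do
    assert(clz_reference(2 ^ k) == 31 - k)
end
```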
ctz takes an integer number (converting the input number to a 32-bit unsigned integer as all other bit32 functions do), and returns the number of trailing zero bits; that is, the number of least significant zero bits in a 32-bit number until the first 1. The result is in the [0, 32] range.
For example, when the input number is 0, the result is 32. When the input number is 2^k (for k in [0, 31]), the result is k.
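To connect ctz to the bitmap traversal use case from the Motivation section, here is a minimal sketch that visits every set bit, lowest first. ctz_reference is a hypothetical stand-in that iterates through each bit (a builtin would not need to), set_bit_indices is likewise an illustrative name, and both assume an integer input in [0, 2^32 - 1].

```lua
-- Hypothetical stand-in for the proposed ctz; assumes 0 <= n < 2^32.
local function ctz_reference(n)
    if n == 0 then
        return 32               -- no set bit: all 32 positions are zero
    end
    local count = 0
    while n % 2 == 0 do         -- lowest bit is zero
        n = n / 2
        count = count + 1
    end
    return count
end

-- Collect the indices of all set bits, lowest first.
local function set_bit_indices(n)
    local indices = {}
    while n ~= 0 do
        local k = ctz_reference(n)
        indices[#indices + 1] = k
        n = n - 2 ^ k           -- clear the lowest set bit
    end
    return indices
end

local bits = set_bit_indices(0x29)  -- 0x29 is binary 101001; bits is {0, 3, 5}
```

The loop body runs once per set bit rather than once per bit position, which is where a fast ctz would pay off for sparse bitmaps.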
Non-normative: a proof-of-concept implementation shows that a polyfill for clz takes ~34 ns per loop iteration when computing clz for an increasing number sequence, whereas a builtin implementation takes ~4 ns.
Drawbacks
None known.
Alternatives
These functions can alternatively be specified as "find the position of the most/least significant bit set" (e.g. "ffs"/"fls" for "find first set"/"find last set"). This formulation can be more immediately useful since the bit position is usually more important than the number of bits. However, the bit position is undefined when the input number is zero; returning a sentinel such as -1 seems non-idiomatic, and returning nil seems awkward for calling code. Counting functions don't have this problem.
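Callers that do want a bit position can derive it from the counts. The helpers below are a sketch with hypothetical names; they take the count that clz or ctz would return, and the nil branch shows the zero case that the position-based formulation would have to handle in every caller.

```lua
-- Convert a leading-zero count (as clz would return) into the index
-- of the most significant set bit. Hypothetical helper name.
local function highest_bit_from_clz(count)
    if count == 32 then
        return nil              -- input was zero: no bit is set
    end
    return 31 - count
end

-- Convert a trailing-zero count (as ctz would return) into the index
-- of the least significant set bit. Hypothetical helper name.
local function lowest_bit_from_ctz(count)
    if count == 32 then
        return nil              -- input was zero: no bit is set
    end
    return count
end
```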
Of the two functions, clz is vastly more useful than ctz; we could implement just clz, but having both is nice for symmetry.