Introduce a new opcode, CLZ(x), which pops x from the stack and pushes the number of leading zero bits in x to the stack. If x is zero, pushes 256.
Motivation
Count leading zeros (CLZ) is a native opcode in many processor architectures (even in RISC architectures like ARM).
It is a basic building block used in math operations, byte operations, compression algorithms, data structures:
lnWad
powWad
lambertW0Wad
sqrt
cbrt
byte string comparisons
generalized calldata compression/decompression
bitmaps (for finding the next/previous set/unset bit)
post quantum signature schemes
Specification
A new opcode is introduced: CLZ (0x1e).
Pops 1 value from the stack.
Pushes a value to the stack, according to the following code:
/// @dev Count leading zeros.
/// Returns the number of zeros preceding the most significant one bit.
/// If `x` is zero, returns 256.
/// This is the fastest known `clz` implementation in Solidity and uses about 184 gas.
functionclz(uint256x)internalpurereturns(uint256r){/// @solidity memory-safe-assembly
assembly{r:=shl(7,lt(0xffffffffffffffffffffffffffffffff,x))r:=or(r,shl(6,lt(0xffffffffffffffff,shr(r,x))))r:=or(r,shl(5,lt(0xffffffff,shr(r,x))))r:=or(r,shl(4,lt(0xffff,shr(r,x))))r:=or(r,shl(3,lt(0xff,shr(r,x))))// forgefmt: disable-next-item
r:=add(xor(r,byte(and(0x1f,shr(shr(r,x),0x8421084210842108cc6318c6db6d54be)),0xf8f9f9faf9fdfafbf9fdfcfdfafbfcfef9fafdfafcfcfbfefafafcfbffffffff)),iszero(x))}}
Or in Python,
defclz(x):"""Returns the number of zeros preceding the most significant one bit."""ifx<0:raiseValueError("clz is undefined for negative numbers")ifx>2**256-1:raiseValueError("clz is undefined for numbers larger than 2**256 - 1")ifx==0:return256# Convert to binary string and remove any '0b' prefix.
bin_str=bin(x).replace('0b','')return256-len(bin_str)
The cost of the opcode is 3, the same as add.
Rationale
The special 0 case
256 is the smallest number after 255. Returning a small number allows the result to be compared with minimal additional bytecode.
For byte scanning operations, one can get the number of bytes to be skipped for a zero word by simply computing 256 >> 3, which gives 32.
Opcode namespace conservation
Computing the least significant bit can be easily implemented with CLZ by isolating the smallest bit via x & -x.
Gas cost
We have benchmarked the clz and add implementation in the intx library. clz uses approximately the same amount of compute cycles as add.