Alert Source Discuss
⚠️ Draft Standards Track: Core

EIP-7939: Count leading zeros (CLZ) opcode

Opcode to count the number of leading zero bits in a 256-bit word

Authors Vectorized (@Vectorized), Georgios Konstantopoulos (@gakonst), Jochem Brouwer (@jochem-brouwer), Ben Adams (@benaadams)
Created 2025-04-28
Discussion Link https://ethereum-magicians.org/t/create-a-new-opcode-for-counting-leading-zeros-clz/10805

Abstract

Introduce a new opcode, CLZ(x), which pops x from the stack and pushes the number of leading zero bits in x to the stack. If x is zero, pushes 256.

Motivation

Count leading zeros (CLZ) is a native opcode in many processor architectures (even in RISC architectures like ARM).

It is a basic building block used in math operations, byte operations, compression algorithms, data structures:

  • lnWad
  • powWad
  • lambertW0Wad
  • sqrt
  • cbrt
  • byte string comparisons
  • generalized calldata compression/decompression
  • bitmaps (for finding the next/previous set/unset bit)
  • post quantum signature schemes

Adding a clz opcode will also lead to cheaper ZK proving costs. The fastest known Solidity implementation uses several dynamic bitwise right shifts shr, which are very expensive to prove. In SP1 rv32im, a 32-bit right shift costs 1.6x more than a 32-bit mul.

Specification

A new opcode is introduced: CLZ (0x1e).

  • Pops 1 value from the stack.
  • Pushes a value to the stack, according to the following code:
/// @dev Count leading zeros.
/// Returns the number of zeros preceding the most significant one bit.
/// If `x` is zero, returns 256.
/// This is the fastest known `clz` implementation in Solidity and uses about 184 gas.
function clz(uint256 x) internal pure returns (uint256 r) {
    /// @solidity memory-safe-assembly
    assembly {
        r := shl(7, lt(0xffffffffffffffffffffffffffffffff, x))
        r := or(r, shl(6, lt(0xffffffffffffffff, shr(r, x))))
        r := or(r, shl(5, lt(0xffffffff, shr(r, x))))
        r := or(r, shl(4, lt(0xffff, shr(r, x))))
        r := or(r, shl(3, lt(0xff, shr(r, x))))
        // forgefmt: disable-next-item
        r := add(xor(r, byte(and(0x1f, shr(shr(r, x), 0x8421084210842108cc6318c6db6d54be)),
            0xf8f9f9faf9fdfafbf9fdfcfdfafbfcfef9fafdfafcfcfbfefafafcfbffffffff)), iszero(x))
    }
}

Or in Python,

def clz(x):
    """Returns the number of zeros preceding the most significant one bit."""
    if x < 0:
        raise ValueError("clz is undefined for negative numbers")
    if x > 2**256 - 1:
        raise ValueError("clz is undefined for numbers larger than 2**256 - 1")
    if x == 0:
        return 256
    # Convert to binary string and remove any '0b' prefix.
    bin_str = bin(x).replace('0b', '')
    return 256 - len(bin_str)

Or in C++,

inline uint32_t clz(uint32_t x) {
    uint32_t r = 0;
    if (!(x & 0xFFFF0000)) { r += 16; x <<= 16; }
    if (!(x & 0xFF000000)) { r += 8;  x <<= 8;  }
    if (!(x & 0xF0000000)) { r += 4;  x <<= 4;  }
    if (!(x & 0xC0000000)) { r += 2;  x <<= 2;  }
    if (!(x & 0x80000000)) { r += 1; }
    return r;
}

// `x` is a uint256 bit number represented with 8 uint32 limbs.
// This implementation is optimized for SP1 proving via rv32im.
// For regular compute, it performs similarly to `add`.
inline uint32_t clz(uint32_t x[8]) {
    if (x[7] != 0) return clz(x[7]);
    if (x[6] != 0) return 32 + clz(x[6]);
    if (x[5] != 0) return 64 + clz(x[5]);
    if (x[4] != 0) return 96 + clz(x[4]);
    if (x[3] != 0) return 128 + clz(x[3]);
    if (x[2] != 0) return 160 + clz(x[2]);
    if (x[1] != 0) return 192 + clz(x[1]);
    if (x[0] != 0) return 224 + clz(x[0]);
    return 256;
}

The cost of the opcode is 3, the same as add.

Rationale

The special 0 case

256 is the smallest number after 255. Returning a small number allows the result to be compared with minimal additional bytecode.

For byte scanning operations, one can get the number of bytes to be skipped for a zero word by simply computing 256 >> 3, which gives 32.

Opcode namespace conservation

Computing the least significant bit can be easily implemented with CLZ by isolating the smallest bit via x & -x.

Gas cost

We have benchmarked the clz implementation against the add implementation in the intx library. clz uses approximately the same amount of compute cycles as add.

The SP1 rv32im optimized variant uses less compute cycles than add, in the average and worst cases.

In SP1 rv32im, a 256-bit clz is cheaper to prove than add.

Backwards Compatibility

This is a new opcode not present prior.

Test Cases

PUSH32 0x000000000000000000000000000000000000000000000000000000000000000
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000100
PUSH32 0x8000000000000000000000000000000000000000000000000000000000000000
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000000
PUSH32 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000000
PUSH32 0x4000000000000000000000000000000000000000000000000000000000000000
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000001
PUSH32 0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000001
PUSH32 0x0000000000000000000000000000000000000000000000000000000000000001
CLZ
---
0x00000000000000000000000000000000000000000000000000000000000000ff

Security Considerations

clz is a stateless opcode that has a low worst-case constant cost in memory usage, compute and proving costs. It is therefore safe from being exploited for denial of service attacks.

Copyright and related rights waived via CC0.

Citation

Please cite this document as:

Vectorized (@Vectorized), Georgios Konstantopoulos (@gakonst), Jochem Brouwer (@jochem-brouwer), Ben Adams (@benaadams), "EIP-7939: Count leading zeros (CLZ) opcode [DRAFT]," Ethereum Improvement Proposals, no. 7939, April 2025. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-7939.