Alert Source Discuss
⚠️ Draft Standards Track: Core

EIP-7885: Precompile for NTT operations

Proposal to add a precompiled contract that performs number theoretical transformation (NTT) and inverse (InvNTT).

Authors Renaud Dubois (@rdubois-crypto), Simon Masson (@simonmasson), Yoon Hyoung Lee (@yhl125)
Created 2025-02-12
Discussion Link https://ethereum-magicians.org/t/eip-7885-precompile-for-ntt-operations/22895

Abstract

This proposal creates a precompiled contract that performs NTT and Inverse NTT transformations. This provides a way to efficiently perform fast polynomial multiplication for post-quantum and STARK cryptography.

Motivation

With the recent advances in quantum computing, there are increased concerns for the quantum threat against Ethereum. Today ECDSA is the EOA signature algorithms, which is vulnerable to attacks by quantum computers. Efficient replacement algorithms use polynomial multiplication as the core operation. Once NTT and Inverse NTT are available, the remaining of the verification algorithm is trivial. Choosing to integrate NTT and InvNTT instead of a specific algorithm provides agility, as DILITHIUM or FALCON or any equivalent can be implemented with a modest cost from those operators. NTT is also of interest to speed-up STARK verifiers. This single operator would thus benefit to both the Ethereum scaling and post-quantum threat mitigation.

Specification

Constants

Name Value Comment
NTT_FW TBD precompile address
NTT_INV TBD precompile address
NTT_VECMULMOD TBD precompile address
NTT_VECADDMOD TBD precompile address

We introduce four separate precompiles to perform the following operations:

  • NTT_FW - to perform the forward NTT transformation (Negative wrap convolution) with a gas cost of 600 gas,
  • NTT_INV - to perform the inverse NTT transformation (Negative wrap convolution) with a gas cost of 600 gas,
  • NTT_VECMULMOD - to perform vectorized modular multiplication with a gas cost formula defined in the corresponding section,
  • NTT_VECADDMOD - to perform vectorized modular addition with a gas cost formula defined in the corresponding section.

Field parameters

The NTT_FW and NTT_INV are fully defined by the following set of parameters: Let $R$ be a cyclotomic ring of the form $R=\mathbb F_q[X]/(X^n+1)$. In these notations,

  • $n$ is the degree and is a power of 2,
  • $\mathbb F_q$ is the prime field where $q \equiv 1 \mod 2n$,
  • $\omega$ is a $n$-th root of unity in $\mathbb F_q$,
  • $\psi$ is a $2n$-th root of unity in $\mathbb F_q$.

Any element $a \in R$ is a polynomial of degree at most $n-1$ with integer coefficients, written as $a=\sum_{i=0}^{n-1} a_iX^i$

NTT_FW

The NTT transformation is described by the following algorithm.

Input: A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in standard order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi_\text{rev} \in \mathbb{Z}_q^n$ storing powers of $\psi$ in bit-reversed order.

Output: $a \leftarrow \text{NTT_FW}(a)$ in bit-reversed order.

t ← n
for m = 1 to n-1 by 2m do
    t ← t / 2
    for i = 0 to m-1 do
        j1 ← 2 ⋅ i ⋅ t
        j2 ← j1 + t - 1
        S ← Ψrev[m + i]
        for j = j1 to j2 do
            U ← a[j]
            V ← a[j + t] ⋅ S
            a[j] ← (U + V) mod q
            a[j + t] ← (U - V) mod q
        end for
    end for
end for
return a

NTT_INV

The Inverse NTT is described by the following algorithm.

Input: A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in bit-reversed order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi^{-1}_\text{rev} \in \mathbb F_q^n$ storing powers of $\psi^{-1}$ in bit-reversed order.

Output: $a \leftarrow \text{NTT_INV}(a)$ in standard order.

t ← 1
for m = n to 1 by m/2 do
    j1 ← 0
    h ← m / 2
    for i = 0 to h-1 do
        j2 ← j1 + t - 1
        S ← Ψ⁻¹rev[h + i]
        for j = j1 to j2 do
            U ← a[j]
            V ← a[j + t]
            a[j] ← (U + V) mod q
            a[j + t] ← (U - V) ⋅ S mod q
        end for
        j1 ← j1 + 2t
    end for
    t ← 2t
end for
for j = 0 to n-1 do
    a[j] ← a[j] ⋅ n⁻¹ mod q
end for
return a

NTT_VECMULMOD

The NTT_VECMULMOD is functions similarly to SIMD, but operates with larger input and output sizes.

Input: Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.

Output: The element-wise product $(a[0]\cdot b[0] \mod q, a[1]\cdot b[1]\mod q, \dots, a[n-1]\cdot b[n-1] \mod q)$.

Gas cost: Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 8$.

NTT_VECADDMOD

The NTT_VECMULMOD is similar to SIMD in the functioning, but operates with larger sizes in input and output.

Input: Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.

Output: The element-wise addition $(a[0]+ b[0] \mod q, a[1]+ b[1]\mod q, \dots, a[n-1]+ b[n-1] \mod q)$.

Gas cost: Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) /32$.

Rationale

If $f$ and $g$ are two polynomials of $R$, then $f\times g= \text{NTT_INV}(\text{NTT_VECMULMOD}( \text{NTT_FW}(a), \text{NTT_FW}(b)))$ is equal to the product of $f$ and $g$ in $R$. The algorithm has a complexity of $n \log_2n$ rather than $n^2$ with the classical schoolbook multiplication algorithm.

Gas Cost Analysis

The gas cost model for EIP-7885 precompiles is designed to target approximately 50-55 mgas/s performance, consistent with existing precompiled contracts like ECRECOVER. Two implementations with different gas costs are available:

NTT Precompiles (0x12, 0x13) - Scheme-Specific Cost Model

Gas Costs by Implementation:

Scheme Ring Degree nocgo NTT_FW nocgo NTT_INV cgo NTT_FW cgo NTT_INV
Falcon-512 512 790 gas 790 gas 500 gas 500 gas
Falcon-1024 1024 1,750 gas 1,750 gas 1,080 gas 1,080 gas
ML-DSA (Dilithium) 256 220 gas 270 gas 256 gas 340 gas

Pure Go (nocgo): Zero-dependency implementation.

liboqs (cgo): Offers 36-38% lower gas costs for NTT operations.

Rationale: Gas costs are calibrated per cryptographic scheme based on actual execution performance. NTT computation complexity varies by ring degree and modulus size, with costs optimized to maintain consistent throughput across different post-quantum schemes.

Vector Operations (0x14, 0x15) - Dynamic Cost Model

VECMULMOD Gas Cost: ceil(0.32 × N)

VECADDMOD Gas Cost: ceil(0.3 × N)

Cost Components:

  • Linear scaling with vector length N (ring degree)
  • VECMULMOD: 0.32 gas per element
  • VECADDMOD: 0.3 gas per element

Benchmark Validation: The dynamic cost model achieves target performance:

  • Falcon-512 VECMULMOD (164 gas): 56.17 mgas/s (~2.9μs)
  • Falcon-512 VECADDMOD (154 gas): 54.45 mgas/s (~2.8μs)
  • Falcon-1024 VECMULMOD (328 gas): 54.67 mgas/s (~6.0μs)
  • Falcon-1024 VECADDMOD (308 gas): 52.84 mgas/s (~5.8μs)
  • ML-DSA VECMULMOD (82 gas): 41.89 mgas/s (~2.0μs)
  • ML-DSA VECADDMOD (77 gas): 46.81 mgas/s (~1.6μs)

The gas cost formulas accurately reflect actual execution costs while maintaining ~50-55 mgas/s performance target across all tested cryptographic standards.

Fields of interest

The implementation applies for many fields of interest for cryptography. In particular, the design applies for:

  • FALCON: $q=3 \cdot 2^{12}+1$ (one of the NIST winners for post-quantum signature scheme),
  • DILITHIUM: $q=2^{23}-2^{13}+1$ (one of the NIST winners for post-quantum signature scheme),
  • KYBER: $q=13 \cdot 2^8+1$ (one of the NIST winners for post-quantum key encapsulation mechanism),
  • Babybear: $q=15 \cdot 2^{27}+1$ (Risc0),
  • Goldilocks: $q=2^{64}-2^{32}+1$ (Polygon’s Plonky2),
  • M31: $q=2^{31}-1$ (Circle STARKS, STwo, Plonky3),
  • StarkCurve: $q=2^{251}+17 \cdot 2^{192}+1$

Benchmarks

Pure solidity

To illustrate the interest of the precompile, the assets provide the measured gas const for a single NTT and extrapolates the minimal gas cost taking into account the required number of NTT_FW and NTT_INV. The provided assets use pure Yul optimizations, with memory access hacks. It is unlikely that more than one order of magnitude could be spared on such a minimal code.

Use case Parameters single NTT gas cost Required NTT(FW/INV) Estimated NTT/Full cost
Falcon $q=12289, n=512$ 1.8 M 1 NTTFW+1 NTTINV 3.6 M
Dilithium $q=2^{23}-2^{13}+1, n=256$ 460K 4 NTTFW +1 NTTINV 2.3M

Falcon cost has been measured over a full implementation and is compliant to the estimation. Dilithium cost is evaluated assuming

This demonstrates that using pure solidity enables L2s with low gas fees to experiment with FALCON in the short term, whereas it is too expensive to do so on L1. Adopting this EIP, the signature verification of Falcon can be reduced to 1500 gas, and a similar result is expected for Dilithium. Adopting the hash function as a separate EIP would enable a gas verification cost of 2000 gas. This is in line with the ratio looking at SUPERCOP implementations.

Native Client Implementation

Two implementations have been developed for op-geth with four precompiled contracts at addresses 0x12, 0x13, 0x14, and 0x15:

Pure Go Implementation (nocgo) - Default:

  • Zero external dependencies
  • Gas costs: Falcon-512 (790 gas), Falcon-1024 (1,750 gas), ML-DSA NTT_FW (220 gas), NTT_INV (270 gas)
  • VECMULMOD: ceil(0.32 × N) gas, VECADDMOD: ceil(0.3 × N) gas

liboqs Implementation (cgo) - High Performance:

  • Uses liboqs library via CGO
  • 36-38% lower gas costs for NTT operations
  • Gas costs: Falcon-512 (500 gas), Falcon-1024 (1,080 gas), ML-DSA NTT_FW (256 gas), NTT_INV (340 gas)
  • Same VECMULMOD/VECADDMOD costs as nocgo variant

Both implementations support:

  • NTT_FW (0x12): Forward NTT transformation
  • NTT_INV (0x13): Inverse NTT transformation
  • VECMULMOD (0x14): Element-wise modular multiplication
  • VECADDMOD (0x15): Element-wise modular addition
  • Input format: ring_degree (4 bytes) + modulus (8 bytes) + coefficients (Falcon: uint16×N, ML-DSA: int32×N)

Benchmark results on Intel(R) Xeon(R) CPU @ 2.20GHz (liboqs implementation):

BenchmarkPrecompiledNTT_FW/NTT_FW-Falcon-512-Gas=500-4
    377011      9411 ns/op      500 gas/op        53.12 mgas/s
BenchmarkPrecompiledNTT_FW/NTT_FW-Falcon-1024-Gas=1080-4
    176936     20419 ns/op     1080 gas/op        52.89 mgas/s
BenchmarkPrecompiledNTT_FW/NTT_FW-ML-DSA-256-Gas=256-4
    740838      4785 ns/op      256 gas/op        53.49 mgas/s

BenchmarkPrecompiledNTT_INV/NTT_INV-Falcon-512-Gas=500-4
    372236      9374 ns/op      500 gas/op        53.33 mgas/s
BenchmarkPrecompiledNTT_INV/NTT_INV-Falcon-1024-Gas=1080-4
    176300     20316 ns/op     1080 gas/op        53.15 mgas/s
BenchmarkPrecompiledNTT_INV/NTT_INV-ML-DSA-256-Gas=340-4
    558224      6396 ns/op      340 gas/op        53.15 mgas/s

BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-Falcon-512-Gas=164-4
   1237585      2919 ns/op      164 gas/op        56.17 mgas/s
BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-Falcon-1024-Gas=328-4
    592540      5999 ns/op      328 gas/op        54.67 mgas/s
BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-ML-DSA-256-Gas=82-4
   1726333      1957 ns/op       82 gas/op        41.89 mgas/s

BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-Falcon-512-Gas=154-4
   1273167      2828 ns/op      154 gas/op        54.45 mgas/s
BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-Falcon-1024-Gas=308-4
    606345      5829 ns/op      308 gas/op        52.84 mgas/s
BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-ML-DSA-256-Gas=77-4
   2178388      1645 ns/op       77 gas/op        46.81 mgas/s

BenchmarkPrecompiledEcrecover/-Gas=3000-4
   65388      57182 ns/op     3000 gas/op        52.46 mgas/s

The benchmarks demonstrate that all NTT precompiles maintain competitive or better throughput compared to existing precompiled contracts like ECRECOVER, processing approximately 52-56 million gas per second. Both implementations (nocgo and cgo) provide efficient support for multiple post-quantum cryptographic schemes including Falcon-512/1024 and ML-DSA/Dilithium, with the liboqs variant offering ~36-38% lower gas costs for NTT operations.

End-to-End Signature Verification with Precompiles

Integration testing demonstrates the practical impact of NTT precompiles on full signature verification. Tests were conducted using modified ZKNOX implementations (ETHFALCON and ETHDILITHIUM) against an op-geth client with NTT precompile support.

Test Environment: OP-Stack testnet with NTT precompiles enabled.

Algorithm Verification Method Pure Solidity With Precompile Gas Saved
ETHFALCON (Falcon-512) Direct 1,800,000 479,341 1,320,659 (73.4%)
ETHDILITHIUM (ML-DSA) PKContract-based 9,746,410 7,618,412 2,127,998
ETHDILITHIUM (ML-DSA) Direct (struct) 7,874,354 5,732,354 2,142,000

Key Findings:

  • ETHFALCON: Achieves 73.4% gas reduction with NTT precompiles, reducing signature verification cost from 1.8M gas to under 480K gas.
  • ETHDILITHIUM: NTT precompiles save over 2.1M gas per signature verification.

These results validate that NTT precompiles significantly reduce gas costs for post-quantum signature verification, making lattice-based cryptography more practical for Ethereum applications.

Backwards Compatibility

There are no backward compatibility questions.

Test Cases

The reference implementation includes comprehensive test coverage across multiple layers:

NTT Precompile Tests (0x12)

Malformed Input Tests - 8 test cases covering:

  • Invalid operation codes (values other than 0 or 1)
  • Invalid ring degrees (non-power-of-2, < 16)
  • Zero or non-NTT-friendly moduli (not congruent to 1 mod 2N)
  • Coefficients exceeding modulus bounds
  • Input length mismatches

Functional Tests:

  • Forward NTT transformation with ring degree 16, modulus 97
  • Inverse NTT transformation ensuring INTT(NTT(x)) = x
  • Round-trip validation for data integrity

Crypto Standards Benchmarks:

  • Falcon-512: n=512, q=12289
  • Kyber-128: n=256, q=3329
  • Dilithium-256: n=256, q=8380417

Vector Operations Tests (0x13, 0x14)

Unified Malformed Input Tests - 7 test cases covering:

  • Invalid ring degrees for both VECMULMOD and VECADDMOD
  • Zero or non-NTT-friendly moduli
  • Input length mismatches (expecting 2*N vectors)
  • Coefficient bounds validation

Functional Tests:

  • Element-wise multiplication: result[i] = (a[i] * b[i]) mod q
  • Element-wise addition: result[i] = (a[i] + b[i]) mod q
  • Validation with small test vectors (ring degree 16, modulus 97)

Crypto Standards Benchmarks - Performance validation with:

  • Falcon-512 parameters (VECMULMOD: 164 gas, VECADDMOD: 154 gas)
  • Falcon-1024 parameters (VECMULMOD: 328 gas, VECADDMOD: 308 gas)
  • ML-DSA-256 parameters (VECMULMOD: 82 gas, VECADDMOD: 77 gas)

Benchmark data are available in the EIP assets:

Pure Go Implementation (nocgo):

liboqs Implementation (cgo):

Reference Implementation

All implementations have been validated over a large base of reference vectors, and implementing both FALCON and DILITHIUM algorithms as demonstration of the usefulness of the precompile.

Security Considerations

Needs discussion.

Copyright and related rights waived via CC0.

Citation

Please cite this document as:

Renaud Dubois (@rdubois-crypto), Simon Masson (@simonmasson), Yoon Hyoung Lee (@yhl125), "EIP-7885: Precompile for NTT operations [DRAFT]," Ethereum Improvement Proposals, no. 7885, February 2025. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-7885.