This EIP proposes the addition of new optimized modular addition, subtraction and multiplication opcodes to the EVM. These support odd moduli up to 4096 bits in size.
Motivation
Benefits of the changes proposed in this EIP:
enables elliptic curve arithmetic operations on various curves including BLS12-381 to be implemented as EVM contracts
For operations on values up to 256bits in size, reduces gas cost per operation by 90-95% compared to the current MULMOD and ADDMOD opcodes.
for all cases where modexp precompile is useful, it could now be implemented as an EVM contract.
enables substantial cost reductions for algebraic hash functions (e.g. MiMC/Poseidon), zkp verification in the EVM.
Specification
Overview
During contract execution, a contract calls a setup instruction SETUPX, sourcing a modulus from a specified memory offset/size and computing several parameters used to speed up modular multiplication (referred to as “Montgomery” parameters). A zeroed memory space (whose size is a stack parameter passed to SETUPX) is allocated separate from EVM memory.
The modulus, computed parameters and memory space are associated with the current call frame state and referred to as the active modulus state. If SETUPX is called again to switch to a different modulus, the memory space and Montgomery parameters of the previous active modulus state remain allocated (the memory spaces of active/previously-active modulus state are separate).
New store and load opcodes STOREX/LOADX are used to copy multiples values to/from EVM memory and the memory space of the active modulus state.
Arithmetic is performed with ADDMODX/SUBMODX/MULMODX opcodes which take and return no stack items, require a 3-byte immediate value appended to the opcode.
The immediate is interpreted as 3 1-byte values z, x, y which are indexes to the array of EVMMAX values that comprise the memory space of the active modulus state.
An arithmetic operation is performed on inputs at index x/y placing the result in index z.
Conventions
x === y % m: x % m == y % m
pow(x, -1, m): The modular multiplicative inverse of x with respect to modulus m.
Opcode definition syntax is formatted as mneumonic {immediate - type} {immediate2 - type} ...: stack_arg_1, stack_arg_2, ... where immediates are listed in the order that they proceed the opcode and stack arguments are ordered starting at the top of the stack.
In the provided pseudocode, it is assumed that opcode gas charging logic is executed prior to execution logic.
Any exception thrown should immediately end the current execution frame and return to the caller.
Constants
Name
Value
Description
STOREX_BASE_GAS
3
base gas cost for STOREX opcode
LOADX_BASE_GAS
3
base gas cost for LOADX opcode
SETUPX_BASE_GAS
3
base gas cost for SETUPX opcode
EVMMAX_MAX_MEM
65,536 bytes
maximum amount of EVMMAX memory that can be used in a call frame
MAX_MOD_SIZE
4096 bits
tentative modulus size limit (can probably be removed because EVMMAX_MAX_MEM_SIZE effectively caps the modulus size)
MULMODX_SUBQUADRATIC_START
50
modulus size in multiples of 8 bytes where we switch to subquadratic mulmont cost model
SYSTEM_WORD_SIZE_BITS
varies depending on the system
word size in bits of a client’s CPU
Context Variables
Name
Type
Meaning
evmmax_state
EVMMAXState
a variable representing ephemeral state which exists for the duration of the current call and in the scope of the current call frame
evm_memory
bytes
EVM memory for the current call context
expand_evm_memory
func(size_words: int)
expands EVM memory by size_words * 32 bytes
cost_evm_memory_expansion
func(new_size_evm_words: int) -> int
EVM memory expansion cost function, modified according to this EIP
evm_stack
object
Allows access to the stack via pop() and peek(n) which return int stack elements
contract_code
bytes
code of the currently-executing contract
pc
int
EVM program counter
class EVMMAXState():
def __init__(self):
# ModState currently being used
self.active_mod_state = None
# a lookup of mod_id (int) -> ModState
self.mods = {}
class ModState():
def __init__(self, mod: int, num_vals_used: int, mod: int, r: int, r_squared: int, mod_inv_full=None, mod_inv=None):
self.mod = mod
# size (expressed in multiples of 8 bytes) needed to represent mod
self.val_size_multiplier = math.ceil(len(hex(mod)[2:]) / (2 * 8))
self.num_vals_used = num_vals_used
self.mod_inv = mod_inv
self.mod_inv_full = mod_inv_full
self.r = r
self.r_squared = r_squared
# a memory space of size num_vals_used * val_size_multiplier
self.values = [0] * self.num_vals_used
Helpers
# -----------------------------------------------------------------------------
# gas-charging helpers
def cost_precompute_mont(val_size_multiplier: int) -> int:
PRECOMPUTE_MONT_LO_GAS_A = ?
PRECOMPUTE_MONT_LO_GAS_B = ?
PRECOMPUTE_MONT_HI_GAS_A = ?
PRECOMPUTE_MONT_HI_GAS_B = ?
cost = 0
if val_size_multiplier < MULMODX_SUBQUADRATIC_START:
cost = math.ceil(PRECOMPUTE_MONT_LO_GAS_A * val_size_multiplier + \
PRECOMPUTE_MONT_LO_GAS_B)
else:
cost = math.ceil(PRECOMPUTE_MONT_HI_GAS_A * val_size_multiplier + \
PRECOMPUTE_MONT_HI_GAS_B)
return cost
def cost_addmodx(val_size_multiplier: int) -> int:
ADDMODX_GAS_A = 0.20
ADDMODX_GAS_B = 0.15
cost = 0
if val_size_multiplier == 6:
cost = 1
else:
cost = round(ADDMODX_GAS_A * limb_count + ADDMODX_GAS_B)
if cost == 0:
cost = 1
return cost
def cost_mulmodx(val_size_multiplier: int) -> int:
MULMODX_LO_GAS_A = 0.090
MULMODX_LO_GAS_B = 0
MULMODX_LO_GAS_C = 0.24
MULMODX_HI_GAS_A = 0
MULMODX_HI_GAS_B = 10.0
MULMODX_HI_GAS_C = -270.0
cost = 0
if val_size_multiplier == 6:
cost = 2
elif val_size_multiplier < MULMODX_SUBQUADRATIC_START:
cost = math.ceil(MULMODX_LO_GAS_A * (val_size_multiplier ** 2) + \
MULMODX_LO_GAS_B * val_size_multiplier + \
MULMODX_LO_GAS_C)
else:
cost = math.ceil(MULMODX_HI_GAS_A * val_size_multiplier ** 2 + \
MULMODX_HI_GAS_B * val_size_multiplier + \
MULMODX_HI_GAS_C)
if cost == 0:
cost = 1
return cost
# -----------------------------------------------------------------------------
# bigint helpers
# a bigint is a unsigned number represented as a list of unsigned system words in descending order of significance
# split a double-width value into hi/low words
def hi_lo(double_width_val: int) -> (int, int):
base = 2**SYSTEM_WORD_SIZE_BITS
assert double_width_val < base**SYSTEM_WORD_SIZE_BITS, "val must fit in two words"
return (double_width_val >> SYSTEM_WORD_SIZE_BITS) % base, double_width_val % base
def bigint_to_int(x: [int]) -> int:
res = 0
for i in reversed(range(len(x))):
res += x[i] * 2**(SYSTEM_WORD_BITS * (len(x) - i - 1))
return res
def int_to_bigint(x: int, word_count: int):
res = [0] * word_count
for i in range(word_count):
res[word_count - i - 1] = x & (2**SYSTEM_WORD_BITS - 1)
x >>= SYSTEM_WORD_BITS
return res
# return x - y (omitting borrow-out)
def bigint_sub(x: [int], y: [int]) -> [int]:
num_words = len(x)
res = [0] * num_words
c = 0
for i in reversed(range(num_words)):
c, res[i] = sub_with_borrow(x[i], y[i], c)
return res
# return x >= y
def bigint_gte(x: [int], y: [int]) -> bool:
for (x_word, y_word) in list(zip(x,y)):
if x_word > y_word:
return True
elif x_word < y_word:
return False
# x == y
return True
# CIOS Montgomery multiplication algorithm
#
# input:
# * x, y, mod - bigint inputs of `val_size_multiplier` length. the most significant limb of the modulus cannot be zero.
# * mod_inv - pow(-mod, -1, 2**SYSTEM_WORD_SIZE_BITS)
# requires:
# * x < mod and y < mod
# * mod_int % 2 != 0
# * mod[0] != 0
# returns:
# (x * y * pow(2**(SYSTEM_WORD_SIZE_BITS * val_size_multiplier), -1, mod)) % mod represented as a bigint
# note: references to x_int/y_int/mod_int/t_int refer to the python int representation of the corresponding bigint variable
def mulmont_quadratic(x: [int], y: [int], mod: [int], modinv: int) -> [int]:
assert len(x) == len(y) and len(y) == len(mod), "{}, {}, {}".format(x, y, mod)
assert mod[0] != 0, "modulus must occupy all words"
word_count = len(mod)
t = [0] * (word_count + 2)
for i in reversed(range(word_count)):
# first inner-loop: t <- t + x_int * y[i]
c = 0
for j in reversed(range(word_count)):
c, t[j + 2] = hi_lo(t[j + 2] + x[j] * y[i] + c)
t[0], t[1] = hi_lo(t[1] + c)
m = (modinv * t[-1]) % BASE
c, _ = hi_lo(m * mod[-1] + t[-1])
# second inner-loop:
# 1. t_int <- t_int + modinv * mod_int * t[-1]
# 2. t_int <- t_int // (2**SYSTEM_WORD_SIZE)
# note:
# after step 1:
# * modinv * mod_int * t[-1] === -1 % (2**SYSTEM_WORD_SIZE_BITS)
# * t_int === (t_int + (-1) t_int) % (2**SYSTEM_WORD_SIZE_BITS) === 0 % (2**SYSTEM_WORD_SIZE_BITS)
# so the shift in step 2 is a word-sized right shift.
# Steps 1 and 2 are combined and the shift is implicit.
for j in reversed(range(1, word_count)):
c, t[j + 2] = hi_lo(t[j + 1] + mod[j - 1] * m + c)
hi, t[2] = hi_lo(t[1] + c)
t[1] = t[0] + hi
# t_int = (t_int + t_int * mod_int * pow(-(2**(SYSTEM_WORD_SIZE_BITS*len(mod))), -1, mod_int)) // (2 ** (len(mod) * SYSTEM_WORD_SIZE_BITS))
# 0 < t_int < 2 * mod_int
t = t[1:]
if t[0] != 0:
# result occupies len(mod) + 1 words so it must be greater than modulus
return bigint_sub(t, [0] + mod)[1:]
elif bigint_gte(t[1:], mod):
return bigint_sub(t[1:], mod)
else:
return t[1:]
# subquadratic mulmont: same general algorithm as mulmont_quadratic with the assumption
# that any multiplications will be performed using Karatsuba subquadratic multiplication algorithm
# input:
# x, y, mod (int) - x < mod and y < mod
# mod (int) - an odd modulus
# R (int) - a power of two, and greater than mod
# mod_inv (int) - pow(-mod, -1, R)
# output:
# (x * y * pow(R, -1, mod)) % mod
#
def mulmont_subquadratic(x: int, y: int, mod: int, mod_inv_full: int, R: int) -> int:
T = x * y
m = ((T % R) * mod_inv_full) % R
T = T + m * mod
T /= R
if T >= mod:
T -= mod
return T
def mulmont(mod_state: ModState, x: int, y: int) -> int:
if mod_state.val_size_multiplier >= MULMODX_SUBQUADRATIC_START:
return mulmont_subquadratic(x, y, mod_state.mod, mod_state.mod_inv)
else:
x_bigint = int_to_bigint(x, (mod_state.val_size_multiplier * 64) // SYSTEM_WORD_SIZE_BITS)
y_bigint = int_to_bigint(y, (mod_state.val_size_multiplier * 64) // SYSTEM_WORD_SIZE_BITS)
mod_bigint = int_to_bigint(mod_state.mod)
return bigint_to_int(mulmont_quadratic(x_bigint, y_bigint, mod_bigint, mod_state.mod_inv_full, mod_state.r))
New Opcodes
Mneumonic
Opcode
Immediate size (bytes)
Stack in
Stack out
SETUPX
0x21
0
4
0
ADDMODX
0x22
3
0
0
SUBMODX
0x23
3
0
0
MULMODX
0x24
3
0
0
LOADX
0x25
0
3
0
STOREX
0x26
0
3
0
SETUPX
SETUPX : mod_id, mod_offset, mod_size, vals_used
Gas Charging
mod_id = evm.stack.peek(0)
mod_offset = evm_stack.peek(1)
mod_size = evm_stack.peek(2)
vals_used = evm_stack.peek(3)
cost = SETUPX_BASE_GAS
if mod_id in evmmax_state.mods:
# the modulus state keyed by mod_id was already active in this call-frame.
# no additional charge beyond SETUPX_BASE_GAS
return
if vals_used > 256:
raise Exception("cannot use more than 256 values for a given mod_id")
if mod_offset + mod_size > len(evm_memory):
raise Exception("cannot load a modulus that would extend beyond the bounds of EVM memory")
val_size_multiplier = math.ceil(mod_size / 8)
cost += cost_precompute_mont(val_size_multiplier)
cost += cost_evm_memory_expansion(math.ceil((num_vals_used * val_size_multiplier * 8) / 32))
Execution
mod_id = stack.pop()
mod_offset = stack.pop()
mod_size = stack.pop()
vals_used = stack.pop()
mod_inv = None
if mod_id in evmmax_state.mods[mod_id]:
# this mod state was previously used in this call frame.
# the associated montgomery parameters and memory space are already allocated.
# mark mod_id as the current active modulus state
evmmax_state.active_mod_state = evmmax_state.mods[mod_id]
return
val_size_multiplier = math.ceil(mod_size / 8)
mod = int.from_bytes(evm_memory[mod_offset:mod_offset+val_size], byteorder='big')
if mod == 0 or mod % 2 == 0:
raise Exception("modulus must be nonzero and odd")
if val_size_multiplier >= MULMODX_SUBQUADRATIC_START:
mod_inv_full = pow(-r, -1, mod)
else:
mod_inv = pow(-mod, -1, 2**SYSTEM_WORD_SIZE_BITS)
r = 2**(SYSTEM_WORD_SIZE_BITS * val_size_multiplier)
r_squared = r**2 % mod
mod_state = ModState(mod, val_size, r, r_squared, mod_inv_full=mod_inv_full, mod_inv=mod_inv)
evmmax_state.mods[mod_id] = mod_state
evmmax_state.active_mod_state = mod_state
LOADX
LOADX: dst_offset, val_idx, num_vals
Description
Load EVMMAX values in the current active modulus state to EVM memory.
mod_state = evmmax_state.active_modulus
if mod_state == None:
raise Exception("no mod state set")
z_offset = int(contract_code[pc+1:pc+2])
x_offset = int(contract_code[pc+2:pc+3])
y_offset = int(contract_code[pc+3:pc+4])
if x_offset >= mod_state.num_vals_used or y_offset >= mod_state.num_vals_used or z_offset >= mod_state.num_vals_used:
raise Exception("out of bounds value reference")
mod_state.values[z_offset] = mulmont(mod_state, mod_state.values[x_offset], mod_state.values[y_offset])
Changes to Contract Execution
EVM Memory Expansion Cost Function
Any EVM operation which expands memory x bytes will charge to expand memory to cur_evm_mem_size + x + evmmax_mem_size bytes where evmmax_mem_size is the size of all allocated EVMMAX values in the current call context (the sum of the values used by each mod_id that has been previously/currently set with SETUPX).
Jumpdest Analysis
Jumpdest analysis is modified to disallow jumps into immediate data for ADDMDOX/SUBMODX/MULMODX.
Rationale
Montgomery Modular Multiplication
EVMMAX values are stored internally in Montgomery form. Expressing values in Montgomery form enables the use of Montgomery reduction in modular multiplication which gives a substantial performance gain versus naive modular multiplication.
Modular addition and subtraction on Montgomery form values is computed the same as normal.
Memory Alignment for EVMMAX Values
LOADX/STOREX move 64bit-aligned big-endian values to/from the memory space of the active modulus state. SETUPX memory expansion pricing is tuned to assume that values will be stored in a as 64bit-aligned values in their EVMMAX memory space.
This choice is made to keep EVMMAX memory aligned to ensure performance.
Gas Costs
Gas models assume a rate of 1 gas per 25ns of execution time.
ADDMODX/SUBMODX/MULMODX
ADDMODX and SUBMODX can each be implemented using a single extended-precision addition, and single extended precision subtraction. This justifies a linear cost model.
MULMODX runtime scales quadratically with input size. After a certain threshold, the quadratic complexity of mulmont_quadratic dominates and it becomes more performant to use mulmont_subquadratic. Thus, there is a segmented cost model to reflect different asymptotic behavior between quadratic/subquadratic mulmont.
ADDMODX/SUBMODX/MULMODX pricing includes the cost of arithmetic and latency of accessing input values from CPU cache.
The price model assumes that the implementation will be generic for most bitwidths with the exception of 321-384bits which is priced aggressively.
LOADX/STOREX
These perform conversion to/from Montgomery and canonical forms for each value copied (a single mulmont per value converted). The overhead of memory loading/copying is covered by cost_mulmontx.
SETUPX
Backwards Compatibility
Jumpdest analysis changes in this EIP could potentially break existing contracts where a jump destination occurs in the 3 bytes proceeding a 0x22/0x23/0x24. This is unlikely to affect many existing contracts. Further analysis of deployed contract bytecode can determine with certainty, which (if any) contracts could be broken.