This is the smallest possible change to the EVM to support calls and returns.
This proposal introduces three new control-flow instructions to the EVM:
CALLSUB transfers control to the destination on the stack.
ENTERSUB marks a CALLSUB destination.
RETURNSUB returns to the PC after the most recent CALLSUB.
Code can also be prefixed with MAGIC bytes. The complete control flow of MAGIC code can be traversed in time and space linear in the size of the code, enabling tools for validation, static analysis, and AOT and JIT compilers. On-chain, MAGIC code is validated at CREATE time to ensure that it cannot execute invalid instructions, jump to invalid locations, underflow stack, or, in the absence of recursion, overflow stack.
These changes are backwards-compatible: the new instructions behave as specified whether or not they appear in MAGIC code.
Motivation
The original control-flow facilities
In 1833 Charles Babbage began the design of a steam-powered, mechanical, Turing-complete computer. Its first published description was by L. F. Menabrea in 1842. The translator, Ada Augusta, Countess of Lovelace, made extensive notes. The notes include her famous program for recursively computing the Bernoulli numbers, which required that the Engine support nested loops. Here also we find her prescient recognition of the Engine’s power:
The Analytical Engine … holds a position wholly its own … In enabling mechanism to combine together general symbols in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science.1
And her insistence on its limits:
The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths. Its province is to assist us in making available what we are already acquainted with.
In 1950 Alan Turing answered Lady Lovelace’s objection with a question of his own:
The majority of [minds] seem to be “subcritical” … an idea presented to such a mind will on average give rise to less than one idea in reply. A smallish proportion are supercritical. An idea presented to such a mind may give rise to a whole “theory” consisting of secondary, tertiary and more remote ideas… we ask, “Can a machine be made to be supercritical?”2
Jumps, conditional jumps, calls, and returns
Jumps and conditional jumps were supported by the hardware of the Analytical Engine. Calls and returns required manual shuffling of punch card decks. In 1945 Turing proposed his Automatic Computing Engine, which supported calls directly with a ring of memory crystals holding return addresses:
A simple form of logical control would be a list of operations to be carried out in the order in which these are given. Such a scheme can be made to cover quite a number of jobs… and has been used in more than one machine… However, it lacks flexibility. We wish to be able to arrange that sequences of orders can divide at various points, continuing in different ways according to the outcome of the calculations to date… We also wish to be able to arrange for the splitting up of operations into subsidiary operations… To start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation.3
Turing’s ACE was a 32-bit RISC machine with integer and floating point operations, 32 registers, a 1024-slot return stack, and 25K of RAM on a 1-MHz bus. The Pilot ACE was for a while the world’s fastest computer.
Similar call and return facilities of various levels of complexity – from Burroughs’ baroque ALGOL support to RISC-V’s subtly dynamic JAL and JALR – have proven their worth across a long line of important machines over the last 80 years. This includes nearly all of the machines we have programmed or implemented: physical machines including the Burroughs 5000, CDC 7600, IBM 360, PDP-11, VAX, Motorola 68000, Sun SPARC, Intel x86s, and others, as well as virtual machines for Scheme, Forth, Pascal, Java, Wasm, and others.
The EVM control-flow facility
Unlike these machines, the Ethereum Virtual Machine does not provide facilities for calls and returns. Instead, they must be synthesized using the JUMP instruction, which takes its argument on the stack. Further, the EVM provides only this jump. The EVM’s dynamic jump causes problems. First, the need to synthesize static jumps and calls with dynamic jumps wastes some space and gas, as we will show below. The much bigger problem is this: jumps that can dynamically branch to any destination in the program are a denial-of-service vulnerability, causing quadratic “path explosions” when traversing the program’s flow of control.
Going quadratic
Traversing flow of control is a fundamental first step for many static analyses, including validating the flow of control, proving that programs meet their formal specifications, and, for these purposes and others, transforming code into other representations, such as control flow graphs, code for faster interpreters, and code for physical machines. One of the simplest ways to traverse the control flow of a program is via the standard “depth-first search” (DFS) algorithm for searching a directed graph:
Traverse the code instruction by instruction:
When a branch or call is encountered:
Recurse to traverse its destinations:
When a destination is re-encountered:
Return from that recursion.
The algorithm is “depth-first” in that it recurses into the destinations of branches and calls before proceeding with the main loop. In the simple form above the algorithm is searching for cycles, but it can be extended to do much more. For example, as a validation algorithm it can track stack height and search for underflows and overflows; as a code generator it can serialize virtual stack code directly into register code; and for static analysis it can extract the control flow graph.
When all jumps are static the number of steps is linear in the number of instructions: only one or two paths must be explored for each jump. But with dynamic jumps every possible destination must be explored at every jump: at worst, the number of steps is quadratic in the number of instructions.
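The traversal above can be sketched in a few lines of Python. This is an illustrative model, not consensus code: `successors` stands in for a hypothetical decoder that returns the possible next program counters for the instruction at `pc`.

```python
def traverse(code, successors):
    """Depth-first search over a program's control flow.

    `code` is a sequence of instructions; `successors(code, pc)` is a
    hypothetical helper returning the program counters reachable from
    `pc`.  Each pc is visited once, so when every jump is static (one
    or two successors per instruction) the work is linear in the code
    size.  A dynamic jump, whose successors must include every possible
    destination, pushes the edge count, and so the work, toward
    quadratic.
    """
    visited = set()

    def dfs(pc):
        if pc in visited or pc >= len(code):
            return              # destination re-encountered: unwind the recursion
        visited.add(pc)
        for dest in successors(code, pc):
            dfs(dest)           # recurse into destinations before continuing

    dfs(0)
    return visited
```

With static jumps the successor function returns at most two destinations per instruction, so the search touches each instruction exactly once.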
Dynamic jumps must die
For Ethereum, quadratic traversal times are an online denial-of-service vulnerability.
Even offline, dynamic jumps can cause static analyses of many contracts to become impractically slow, intractable or even impossible:
“Ethereum smart contracts are distributed programs running on top of the Ethereum blockchain. Since program flaws can cause significant monetary losses and can hardly be fixed due to the immutable nature of the blockchain, there is a strong need of automated analysis tools which provide formal security guarantees. Designing such analyzers, however, proved to be challenging and error-prone.”4
“The EVM language is a simple stack-based language … with one significant difference between the EVM and other virtual machine languages (like Java Bytecode or CLI for .Net programs): the use of the stack for saving the jump addresses instead of having it explicit in the code of the jumping instructions. Static analyzers need the complete control flow graph (CFG) of the EVM program in order to be able to represent all its execution paths.”5
“Static analysis approaches mostly face the challenge of analysing compiled Ethereum bytecode… However, due to the intrinsic complexity of Ethereum bytecode (especially in jump resolution), static analysis encounters significant obstacles.”6
“Analyzing contract binaries is vital since their sources are unavailable, involving identification comprising function entry identification and detecting its boundaries… Unfortunately, it is challenging to identify functions … due to the lack of internal function call statements.”7
There is an entire academic literature of complex, incomplete solutions to problems that this proposal renders trivial.
With so much at stake on the blockchain deploying correct contracts is imperative – there is no reason for the EVM to make that job any more difficult than necessary.
Taming the EVM
To prevent control-flow traversal from “going quadratic” we must prevent the dynamic use of jumps. On-chain, nearly all uses of JUMP and JUMPI are preceded by a PUSH – that is, they are effectively static. This proposal ensures that in valid code JUMP and JUMPI are always used statically. Currently, the only places that jumps have to be used dynamically are to support calls and returns. For that purpose we propose the CALLSUB and RETURNSUB opcodes as replacements.
Specification
The key words MUST and MUST NOT in this Specification are to be interpreted as described in RFC 2119 and RFC 8174.
CALLSUB (0x..)
Transfers control to a subsidiary operation.
Pop the destination off the top of the stack.
Push the current PC + 1 to the return stack.
Set PC to destination.
The gas cost is mid (8).
ENTERSUB (0x..)
The destination of every CALLSUB MUST be an ENTERSUB.
RETURNSUB (0x..)
Returns control to the caller of a subsidiary operation.
Pop the return stack to PC.
The gas cost is low (5).
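A minimal Python sketch of these semantics may help. This is an illustrative model only, not client code: the opcode encodings are placeholders (the actual values are still to be determined), and it models control flow alone, ignoring gas and validation.

```python
# Placeholder encodings; the real opcode values are still to be determined.
CALLSUB, ENTERSUB, RETURNSUB, STOP, PUSH = range(5)

def run(code):
    """Execute a list of (opcode, operand) pairs, modeling only the
    control flow of CALLSUB / ENTERSUB / RETURNSUB."""
    stack, return_stack = [], []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == PUSH:
            stack.append(arg)             # PUSH carries its value as an operand
            pc += 1
        elif op == CALLSUB:
            dest = stack.pop()            # pop the destination off the stack
            return_stack.append(pc + 1)   # push the return address
            pc = dest                     # the destination must be an ENTERSUB
        elif op == ENTERSUB:
            pc += 1                       # a mere label, like JUMPDEST
        elif op == RETURNSUB:
            pc = return_stack.pop()       # return to just after the CALLSUB
        elif op == STOP:
            break
    return stack

# Call a subroutine at offset 3 that leaves 42 on the stack, then stop:
# 0: PUSH 3; 1: CALLSUB; 2: STOP; 3: ENTERSUB; 4: PUSH 42; 5: RETURNSUB
program = [(PUSH, 3), (CALLSUB, None), (STOP, None),
           (ENTERSUB, None), (PUSH, 42), (RETURNSUB, None)]
```

Running `run(program)` leaves 42 on the data stack, with the return stack empty again.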
MAGIC (0xEF....)
After this EIP has been activated code beginning with the MAGIC bytes MUST be a valid program. Execution begins immediately after the MAGIC bytes.
Notes:
Values popped off the return stack do not need to be validated, since they are alterable only by CALLSUB and RETURNSUB.
The description above lays out the semantics of these instructions in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code CALLSUB to unobservably push PC on the return stack rather than PC + 1, which is allowed so long as RETURNSUB observably returns control to the PC + 1 location.)
Opcode and MAGIC values are still to be determined.
Costs
A mid cost for CALLSUB is justified by its taking very little more work than the mid-cost JUMP – just pushing an integer to the return stack.
A jumpdest cost for ENTERSUB is justified by it being, like JUMPDEST, a mere label.
A low cost for RETURNSUB is justified by needing only to pop the return stack into the PC – less work than a jump.
Benchmarking will be needed to tell if the costs are well-balanced.
Validity
Execution is defined in the Yellow Paper as a sequence of changes to the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition, the execution is in an exceptional halting state and cannot continue. The Yellow Paper defines six such states:
State modification during a static call
Insufficient gas
More than 1024 stack items
Insufficient stack items
Invalid jump destination
Invalid instruction
We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state. In practice, we must test at runtime for the first three conditions. We don’t know whether we will be called statically. We don’t know how much gas there will be, and we don’t know how deep a recursion may go. However, we can validate that non-recursive programs do not overflow stack. All of the remaining conditions MUST be validated statically. To allow for efficient algorithms our validation does not consider the code’s data and computations, only its control flow and stack use. This means we will reject programs with invalid code paths, even if those paths are not reachable.
Constraints on valid code
Code beginning with MAGIC MUST be valid. Constraints on valid code MUST be validated at CREATE time, in time and space linear in the size of the code. The constraints on valid code are as follows.
All opcodes must be valid:
They MUST have been defined in the Yellow Paper or a deployed EIP and
MUST NOT have been deprecated in a subsequent deployed EIP.
The INVALID opcode is valid.
The JUMP and JUMPI instructions MUST be preceded by a PUSH instruction.
The JUMP and JUMPI instructions MUST NOT address immediate data, and MUST address a JUMPDEST.
The CALLSUB instruction MUST NOT address immediate data, and MUST address an ENTERSUB.
The data stack MUST NOT underflow, and MUST NOT hold more than 2048 items.
The return stack MUST NOT underflow, and MUST NOT hold more than 2048 items.
The stack height is the absolute difference between the current stack pointer and the stack pointer at the most recent ENTERSUB:
The stack height MUST be the same for every PC.
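As an illustration (not normative), the jump constraints above can be checked in one linear scan over decoded instructions. Here `instrs` is assumed to be a list of (pc, opcode, operand) triples with immediate data already decoded, and `jumpdests` / `entersubs` are the sets of valid label offsets; all names are hypothetical.

```python
def check_static_jumps(instrs, jumpdests, entersubs):
    """Check that every JUMP/JUMPI is preceded by a PUSH whose operand
    is a JUMPDEST, and every CALLSUB's pushed destination is an ENTERSUB.
    `instrs`: list of (pc, opcode, operand) with immediates decoded."""
    for i, (pc, op, _) in enumerate(instrs):
        if op in ("JUMP", "JUMPI", "CALLSUB"):
            if i == 0:
                return False              # nothing precedes the jump
            _, prev_op, prev_arg = instrs[i - 1]
            if not prev_op.startswith("PUSH"):
                return False              # jumps must be effectively static
            targets = entersubs if op == "CALLSUB" else jumpdests
            if prev_arg not in targets:
                return False              # must address a JUMPDEST / ENTERSUB
    return True
```

The full validation algorithm given later subsumes this check while also tracking stack heights.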
The guarantee of constant stack height prevents stack underflow, breaks cycles in the control flow, ensures finite stack use for non-recursive programs, and allows virtual stack code to be directly serialized into virtual register code for faster interpretation and compilation to machine code.
Note: .NET, the JVM, and Wasm enforce similar constraints for similar reasons.
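To see one payoff of constant stack heights, here is a toy sketch (not part of this proposal) of serializing stack code into register code. Because the height at every instruction is a compile-time constant, each stack slot can be renamed to a virtual register in a single pass:

```python
def to_register_code(instrs):
    """Rename stack slots to virtual registers, relying on the stack
    height at each instruction being a compile-time constant.
    `instrs` is a list of ("PUSH", value), ("ADD",) or ("MUL",) tuples."""
    out, height = [], 0
    for ins in instrs:
        if ins[0] == "PUSH":
            out.append(f"r{height} = {ins[1]}")   # slot `height` becomes a register
            height += 1
        else:                                     # ADD or MUL: binary, in place
            op = "+" if ins[0] == "ADD" else "*"
            out.append(f"r{height - 2} = r{height - 2} {op} r{height - 1}")
            height -= 1
    return out
```

For instance, `to_register_code([("PUSH", 2), ("PUSH", 3), ("ADD",)])` yields `["r0 = 2", "r1 = 3", "r0 = r0 + r1"]`. With dynamic jumps the height at a given pc need not be unique, and this renaming breaks down.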
Validation
The above is a purely semantic specification, placing no constraints on the syntax of bytecode beyond being an array of opcodes and immediate data. “Subsidiary operations” here are not contiguous sequences of bytecode. They are subgraphs of the bytecode’s full control-flow graph. The EVM is a simple state machine, where every instruction advances the state one more notch – it has no syntactic structure. We only promise that valid code will not, as it were, jam up the gears of the machine.
Rather than enforce semantic constraints via syntax – as done by higher-level languages – this proposal enforces them via validation: MAGIC code is proven valid at CREATE time. We provide an algorithm below for validating code in time and space linear in the size of the code. An equivalent proof MUST be provided at CREATE time.
With no syntactic constraints and minimal semantic constraints we maximize opportunities for optimizations, including tail call elimination, multiple-entry calls, common exit handlers, arranging code blocks for optimal caching and variables for efficient register allocation, and others. Since we want to support online compilation of EVM code to native code it is crucial that the EVM code be as well optimized as possible by our HLL compilers – upfront and offline.
Rationale
Why no static relative jumps?
This would of course break the promise of “the smallest possible change.” EIP-4200: EOF - Static relative jumps remains available if we want its size and performance advantages. It proposes three new instructions taking immediate data:
RJUMP destination transfers control to PC + destination.
RJUMPI destination conditionally transfers control to PC + destination.
RJUMPV jump-table transfers control relative to an embedded jump table.
If adopted, we would want RCALLSUB as well as, or perhaps rather than, CALLSUB. And if adopted we would still need to deal with the existing JUMP opcodes. This proposal allows for more existing code to remain valid than the alternative of deprecating jumps in valid code.
Why no code sections?
Again, this would break the promise of “the smallest possible change,” and EIP-3540: EOF - EVM Object Format remains available. In the presence of EOF code sections CALLSUB could be restricted to section boundaries. But note that this restriction impedes important optimizations, which EOF worked around by adding special opcodes. So if we want restricted code sections it becomes a much bigger change, and we should consider EOF - Functions and related EIPs separately.
Why the return-stack mechanism?
In our experience most stack machines have separate stacks for data and for returns from calls, whereas most register machines have one stack for return addresses, registers for computation, and instructions that support calls and returns in coordination with the registers and the stack. The EVM is of course a stack machine, not a register machine. As an industry and a team we have substantial experience with the return-stack mechanism proposed here. It has been used effectively in many machines over the last eight decades, and has been implemented, tested, and even made ready to ship in many of our clients over the last nine years.
Do we save on code size and gas?
The difference these instructions make can be seen in this very simple code for calling a routine that squares a number. The distinct opcodes make it easier for both people and tools to understand the code, and there are modest savings in code size and gas costs as well.
SQUARE:                      |  SQUARE:
   jumpdest       ; 1 gas    |     entersub     ; 1 gas
   dup            ; 3 gas    |     dup          ; 3 gas
   mul            ; 5 gas    |     mul          ; 5 gas
   swap1          ; 3 gas    |     returnsub    ; 5 gas
   jump           ; 8 gas    |
                             |
CALL_SQUARE:                 |  CALL_SQUARE:
   jumpdest       ; 1 gas    |     entersub     ; 1 gas
   push RTN_CALL  ; 3 gas    |     push 2       ; 3 gas
   push 2         ; 3 gas    |     push SQUARE  ; 3 gas
   push SQUARE    ; 3 gas    |     callsub      ; 8 gas
   jump           ; 8 gas    |     returnsub    ; 5 gas
RTN_CALL:                    |
   swap1          ; 3 gas    |
   jump           ; 8 gas    |
                             |
Size in bytes: 18            |  Size in bytes: 13
Consumed gas: 49             |  Consumed gas: 34
That’s 28% fewer bytes and 30% less gas using CALLSUB versus using JUMP. So we can see that these instructions provide a simpler, more efficient mechanism. As code becomes larger and better optimized the gains become proportionally smaller, but code using CALLSUB always takes less space and gas than equivalent code without it.
Do we improve real-time performance?
Real-time performance gains (as opposed to gas savings) will come from AOT and JIT compilers. Crucially, the constraint that stack depths be constant means that in MAGIC code a very fast JIT can traverse the control flow of the EVM code in one pass, generating machine code as it goes. (Wasm, the JVM, and .NET share that property.) The EVM is a stack machine, but real machines are register machines. So generating virtual register code for a faster interpreter is a win (I have seen 4X speedups on JVM code), and generating good machine code gives gains of orders of magnitude. But for most transactions storage dominates execution time, and gas counting and other overhead take their toll. So these gains would be most visible in contexts where this overhead is absent, such as for L1 precompiles and on some L2s.
Why 2048?
Yes, we are proposing a doubling of the stack size from 32KB to 64KB – even a 1981 IBM PC had an Intel CPU with a 64KB stack segment. A larger increase might be justified – a 2025 Raspberry Pi has a quad-core Arm CPU with 512KB per-core L2 caches and a 2MB shared L3 cache.
Backwards Compatibility
These changes are backwards compatible.
There are no changes to the semantics of existing EVM code.
(With the caveat that code with unspecified behaviour might behave in different, unspecified ways. Such code was always broken.)
Opcode semantics are not affected by whether the code begins with MAGIC.
This proposal does not require maintaining two interpreters.
These changes do not foreclose EOF, RISC-V, or other changes; new MAGIC numbers would mark future EVMs. Neither do these changes preclude running the EVM in zero knowledge; they would more likely help.
Test Cases
** Note: these tests are known to be incorrect. **
Simple routine
This should jump into a subroutine, back out and stop.
This should fail, since the given location is outside of the code-range. The code is the same as previous example, except that the pushed location is 0xffff instead of 0x0c.
Error: at pc=10, op=CALLSUB: invalid jump destination
Failure 2: shallow return stack
This should fail at first opcode, due to shallow return_stack
Bytecode: 0x5d5858 (RETURNSUB, PC, PC)
Pc | Op        | Cost | Stack | RStack
0  | RETURNSUB | 5    | []    | []
Error: at pc=0, op=RETURNSUB: invalid retsub
Subroutine at end of code
In this example, the CALLSUB is on the last byte of code. When the subroutine returns, it should hit the ‘virtual stop’ after the bytecode and not exit with an error.
** Note: this implementation is known to be incomplete and incorrect. **
The following is a pseudo-Python implementation of an algorithm for predicating code validity. An equivalent algorithm MUST be run at CREATE time in space and time linear in the size of the code. This algorithm recursively traces the code, emulating its control flow and stack use and checking for violations of the rules above. It runs in time proportional to the size of the code, and the depth of recursion is proportional to the size of the code.
Validation Function
We assume that instruction validation and destination analysis has been done, and that we have some constant-time helper functions:
is_terminator(opcode) returns true iff opcode is a terminator.
previous_data(pc) returns the immediate data for the instruction before pc (usually a PUSH.)
immediate_size(opcode) returns the size of the immediate data for an opcode.
removed_items(opcode) returns the number of items removed from the data_stack by the opcode.
added_items(opcode) returns the number of items added to the data_stack by the opcode.
# returns the maximum stack depth observed and true iff code is valid

# per-pc recorded stack depths and per-subroutine maximum depths,
# each initialized to None for every pc before validation begins
stack_depths = []
max_depths = []
return_stack = []

def validate_code(code: bytes, pc: int, sp: int, bp: int, max: int) -> (int, bool):
    while pc < len(code):
        opcode = code[pc]
        # check stack height and return if we have been here before
        stack_depth = sp - bp
        max_depth = max + stack_depth
        if max_depth > 2048:
            return max_depth, False
        if stack_depths[pc] is not None:
            if stack_depth != stack_depths[pc]:
                return 0, False
            if opcode == ENTERSUB:
                return max_depths[pc], True
            else:
                return max_depth, True
        else:
            stack_depths[pc] = stack_depth
        if is_terminator(opcode):
            return max_depth, True
        elif opcode == CALLSUB:
            # push return address and set pc to destination
            jumpdest = previous_data(pc)
            push(return_stack, pc)
            # validate the subroutine and track its maximum height
            max_depth, valid = validate_code(code, jumpdest, sp, sp, max_depth)
            if not valid:
                return max_depth, False
            max_depths[jumpdest] = max_depth
        elif opcode == RETURNSUB:
            # pop return address and check for preceding call
            pc = pop(return_stack)
            max_depth = max + stack_depth
            return max_depth, True
        if opcode == JUMP:
            # pop the destination and set pc to it
            sp -= removed_items(opcode)
            pc = previous_data(pc)
            continue
        elif opcode == JUMPI:
            jumpdest = previous_data(pc)
            # recurse to validate true side of conditional
            max_depth, valid = validate_code(code, jumpdest, sp, bp, max)
            if not valid:
                return max_depth, False
        # apply instruction to stack
        sp -= removed_items(opcode)
        if sp < 0:
            return sp, False
        sp += added_items(opcode)
        # skip opcode and any immediate data
        pc += 1 + immediate_size(opcode)
    max_depth = max + stack_depth
    if max_depth > 2048:
        return max_depth, False
    return max_depth, True
Security Considerations
These changes introduce new flow-control instructions. They do not introduce any new security considerations. This EIP is intended to improve security by validating a higher level of safety for EVM code deployed on the blockchain. The validation algorithm must use time and space linear in the size of the code so as not to be a denial-of-service vulnerability. The algorithm here makes one linear-time, recursive pass of the bytecode, whose depth cannot exceed the number of CALLSUB and JUMPI instructions in the code.
Menabrea, L.F. Sketch of The Analytical Engine Invented by Charles Babbage. Bibliothèque Universelle de Genève, No. 82, October 1842 ↩
Turing, A.M. Computing Machinery and Intelligence. Mind, Volume LIX, Issue 236, October 1950 ↩
Carpenter, B.E. et al. The other Turing machine. The Computer Journal, Volume 20, Issue 3, January 1977 ↩
Schneidewind, Clara et al. The Good, the Bad and the Ugly: Pitfalls and Best Practices in Automated Sound Static Analysis of Ethereum Smart Contracts. DOI: 10.48550/arXiv.2101.05735 ↩
Albert, Elvira et al. Analyzing Smart Contracts: From EVM to a sound Control-Flow Graph. DOI: 10.48550/arXiv.2004.14437 ↩
Contro, Filippo et al. EtherSolve: Computing an Accurate Control-Flow Graph from Ethereum Bytecode. DOI: 10.48550/arXiv.2103.09113 ↩
He, Jiahao et al. Neural-FEBI: Accurate Function Identification in Ethereum Virtual Machine Bytecode. DOI: 10.48550/arXiv.2301.12695 ↩
Citation
Please cite this document as:
Greg Colvin (@gcolvin), Martin Holst Swende (@holiman), Brooklyn Zelenka (@expede), John Max Skaller <skaller@internode.on.net>, "EIP-7979: Call and Return Opcodes for the EVM [DRAFT]," Ethereum Improvement Proposals, no. 7979, December 2025. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-7979.