EIP-2315: Simple Subroutines for the EVM

Author: Greg Colvin, Martin Holst Swende
Discussions-To: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
Status: Draft
Type: Standards Track
Category: Core
Created: 2019-10-17

Simple Summary

(Almost) the smallest possible change that provides native subroutines without breaking backwards compatibility.

Abstract

This proposal introduces three opcodes to support subroutines: BEGINSUB, JUMPSUB and RETURNSUB. (The smallest possible change would do without BEGINSUB).

Motivation

The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. In the EVM the return

Facilities to directly support subroutines are provided in some form by most physical and virtual machines going back at least fifty years. In whatever form, these operations provide for capturing the current context of execution, transferring control to a new context, and returning to the original context.

We propose a simple return-stack mechanism, known to work well for stack machines, which we specify here. Note that this specification is entirely semantic. It constrains only stack usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed.

In the future, amenability to static analysis equivalent to EIP-615 could be ensured by enforcing a few simple rules, and validated with the provided algorithm, still without imposing syntactic constraints.

Specification

In addition to the existing data stack, we introduce one more stack into the EVM, which we call the return stack. The return stack is limited to 1024 items.

BEGINSUB

Marks the entry point to a subroutine. Execution of a BEGINSUB is a no-op.

JUMPSUB

Transfers control to a subroutine.

  1. Pop the location off the data stack.
  2. If the opcode at location is not a BEGINSUB, abort.
  3. If the return stack already has 1024 items, abort.
  4. Push the current pc + 1 to the return stack.
  5. Set pc to location + 1.
  • pops one item off the data stack
  • pushes one item on the return stack

RETURNSUB

Returns control to the caller of a subroutine.

  1. If the return stack is empty, abort.
  2. Pop pc off the return stack.
  • pops one item off the return stack

Note 1: If a resulting pc to be executed is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.

Note 2: Values popped off the return stack do not need to be validated, since they are alterable only by JUMPSUB and RETURNSUB.

Note 3: The description above lays out the semantics of this feature in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementor may code JUMPSUB to unobservably push pc on the return stack rather than pc + 1, which is allowed so long as RETURNSUB observably returns control to the pc + 1 location.)
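
The JUMPSUB and RETURNSUB steps above can be sketched as a small interpreter. This is an illustration under stated assumptions, not a client implementation: it models only PUSH1 to PUSH32, JUMP, JUMPDEST, STOP and the three new opcodes; the names run, pop and Abort are invented for this example; and it checks a destination by looking at the byte at the target, without excluding PUSH data. It pushes pc + 1 on the return stack as in step 4, so its return stack would hold 3 where the first trace under Test Cases shows 2, a difference that Note 3 permits.

```python
"""Sketch of the return-stack semantics for a small subset of EVM opcodes."""

STOP, JUMP, JUMPDEST = 0x00, 0x56, 0x5B
BEGINSUB, RETURNSUB, JUMPSUB = 0x5C, 0x5D, 0x5E
PUSH1, PUSH32 = 0x60, 0x7F

# Costs of the existing opcodes follow the traces in this document;
# costs of the three new opcodes are the ones suggested in this proposal.
GAS = {STOP: 0, JUMP: 8, JUMPDEST: 1, BEGINSUB: 1, RETURNSUB: 5, JUMPSUB: 10}
PUSH_GAS = 3
RETURN_STACK_LIMIT = 1024


class Abort(Exception):
    """Execution reached an exceptional halting state (hypothetical name)."""


def pop(stack: list, pc: int) -> int:
    """Pop the data stack, aborting on underflow."""
    if not stack:
        raise Abort(f"pc={pc}: insufficient stack items")
    return stack.pop()


def run(code: bytes) -> int:
    """Execute code and return the gas consumed; raise Abort on failure."""
    pc, gas = 0, 0
    data_stack: list = []
    return_stack: list = []
    while pc < len(code):  # running past the end is an implicit STOP (Note 1)
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            # Immediates truncated by the end of code are zero-padded.
            immediate = code[pc + 1 : pc + 1 + n].ljust(n, b"\x00")
            data_stack.append(int.from_bytes(immediate, "big"))
            gas += PUSH_GAS
            pc += 1 + n
            continue
        if op not in GAS:
            raise Abort(f"pc={pc}: opcode {op:#04x} is not modelled in this sketch")
        gas += GAS[op]
        if op == STOP:
            break
        if op == JUMP:
            dest = pop(data_stack, pc)
            if dest >= len(code) or code[dest] != JUMPDEST:
                raise Abort(f"pc={pc}: invalid jump destination")
            pc = dest
        elif op == JUMPSUB:
            location = pop(data_stack, pc)                      # step 1
            if location >= len(code) or code[location] != BEGINSUB:
                raise Abort(f"pc={pc}: invalid jump destination")  # step 2
            if len(return_stack) >= RETURN_STACK_LIMIT:
                raise Abort(f"pc={pc}: return stack overflow")     # step 3
            return_stack.append(pc + 1)                         # step 4
            pc = location + 1                                   # step 5
        elif op == RETURNSUB:
            if not return_stack:
                raise Abort(f"pc={pc}: invalid retsub")
            pc = return_stack.pop()
        else:  # JUMPDEST and BEGINSUB are no-ops when executed
            pc += 1
    return gas


if __name__ == "__main__":
    # Bytecode of the "Simple routine" test case.
    print(run(bytes.fromhex("60045e005c5d")))  # prints 18
```

Run on the bytecode of the "Simple routine" test case, the sketch consumes 18 gas, which agrees with the trace given there.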

Indirect Jumps

If EIP-2327: BEGINDATA or similar is implemented then the indirect jumps from EIP-615 (JUMPV and JUMPSUBV) can be implemented. These could take two arguments on the stack: a constant offset relative to BEGINDATA to a jump table, and a variable index into that table.

Rationale

We modeled this design on the simple, proven, archetypal Forth virtual machine of 1970. It is a two-stack design – the data stack is supplemented with a return stack to support jumping into and returning from subroutines, as specified above. The separate return stack ensures that the return address cannot be overwritten or mislaid, and obviates any need to swap the return address past the arguments on the stack. Importantly, a dynamic jump is not needed to implement subroutine returns, allowing for deprecation of dynamic uses of JUMP and JUMPI. Eventually deprecating dynamic jumps is key to practical static analysis of code.

(JUMPSUB and RETURNSUB were also defined in terms of a return stack in EIP-615.)

Backwards and Forwards Compatibility

These changes do not affect the semantics of existing EVM code.

These changes are compatible with using EIP-3337 to provide stack frames, by associating a frame with each subroutine.

Implementations

Three clients have implemented this (or an earlier version of this) proposal:

Costs and Codes

We suggest that the cost of

  • BEGINSUB be jumpdest (1)
  • JUMPSUB be high (10)
    • This is the same as JUMPI, and 2 more than JUMP.
  • RETURNSUB be low (5).

Benchmarking might be needed to tell if the costs are well-balanced.

We suggest the following opcodes:

0x5c BEGINSUB
0x5d RETURNSUB
0x5e JUMPSUB
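
To make the opcode assignment concrete, the following is a minimal disassembler sketch in Python. It is only an illustration: the function name disassemble and the NAMES table are assumptions of this example, and the table covers just the opcodes that appear in the test cases of this document.

```python
# Mnemonics for the opcodes used in this document's test cases.
# 0x5c, 0x5d and 0x5e are the values suggested above for the new opcodes.
NAMES = {
    0x00: "STOP",
    0x56: "JUMP",
    0x57: "JUMPI",
    0x58: "PC",
    0x5B: "JUMPDEST",
    0x5C: "BEGINSUB",
    0x5D: "RETURNSUB",
    0x5E: "JUMPSUB",
}


def disassemble(code: bytes) -> list:
    """Return one mnemonic string per instruction in code."""
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1 .. PUSH32 carry 1 .. 32 immediate bytes
            n = op - 0x5F
            immediate = code[pc + 1 : pc + 1 + n]
            listing.append(f"PUSH{n} 0x{immediate.hex()}")
            pc += 1 + n
        else:
            listing.append(NAMES.get(op, f"UNKNOWN_0x{op:02x}"))
            pc += 1
    return listing


if __name__ == "__main__":
    for line in disassemble(bytes.fromhex("60045e005c5d")):
        print(line)
```

Applied to 0x60045e005c5d, this yields PUSH1 0x04, JUMPSUB, STOP, BEGINSUB, RETURNSUB, the same listing given for the first test case.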

Security Considerations

These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of evm-code needs to be modified accordingly. The JUMPSUB semantics are similar to JUMP (but jumping to a BEGINSUB), whereas the RETURNSUB instruction is different, since it can ‘land’ on any opcode (but the possible destinations can be statically inferred).

The safety and amenability to static analysis of valid programs can be made comparable to EIP-615, but without imposing syntactic constraints, and thus with minimal impact on low-level optimizations. Validity can be ensured by following the rules given in the next section, and programs can be validated with the provided algorithm. The validation algorithm is simple and bounded by the size of the code, allowing for validation at deploy time or at load time.

While it is crucial going forward that it be possible to validate programs, this EIP does not propose that validity be enforced. Note that much value for people doing static analysis (e.g. for proofs that bytecode meets formal specifications of a contract) can be had without enforcement. Code can be scanned in linear time to determine whether the rules are followed before analysis begins. And compilers can easily follow the rules up front.

Validity

Exceptional Halting States

Execution is as defined in the Yellow Paper—a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines five such states.

  1. Insufficient gas
  2. More than 1024 stack items
  3. Insufficient stack items
  4. Invalid jump destination
  5. Invalid instruction

We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state, but we must be able to validate code in linear time to avoid denial of service attacks. So in practice, we can only partially meet these requirements. Our validation algorithm does not consider the code’s data and computations, only its control flow and stack use. This means we will reject programs with any invalid code paths, even if those paths are not reachable at runtime. Further, conditions 1 and 2 (insufficient gas and stack overflow) must in general be checked at runtime. Conditions 3, 4, and 5 cannot occur if the code conforms to the following rules.

The Rules

  1. JUMP and JUMPI address only valid JUMPDEST instructions.
  2. JUMPSUB addresses only valid BEGINSUB instructions.
  3. JUMP, JUMPI and JUMPSUB are always preceded by one of the PUSH instructions.
  4. For each instruction in the code the stack depth is always the same.
  5. The stack depth is always positive and at most 1024.

Rules 1 and 2 are currently enforced at runtime. Note: Valid instructions are not part of PUSH data.
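
The note that valid instructions are not part of PUSH data can be made concrete with a single linear scan. The sketch below is an assumption about how such a scan might look, not code from any client; the name valid_destinations is invented for this example. It collects the positions of JUMPDEST and BEGINSUB bytes while skipping over PUSH immediates.

```python
PUSH1, PUSH32 = 0x60, 0x7F
JUMPDEST, BEGINSUB = 0x5B, 0x5C


def valid_destinations(code: bytes) -> tuple:
    """Return (jumpdest positions, beginsub positions), ignoring PUSH data."""
    jumpdests = set()
    beginsubs = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            jumpdests.add(pc)
        elif op == BEGINSUB:
            beginsubs.add(pc)
        elif PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the immediate bytes of this PUSH
        pc += 1
    return jumpdests, beginsubs


if __name__ == "__main__":
    # PUSH1 0x5c, JUMPDEST, BEGINSUB: the 0x5c inside the PUSH is data only.
    print(valid_destinations(bytes.fromhex("605c5b5c")))  # prints ({2}, {3})
```

A JUMP or JUMPI would then be checked against the first set and a JUMPSUB against the second, whether at runtime or during validation.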

Rule 3, requiring a PUSH before each JUMP*, would forbid dynamic jumps. Absent dynamic jumps, another mechanism is needed for subroutine returns, as provided here.
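
Rule 3 can also be checked in one pass over the code. The following Python sketch assumes that "preceded by" means the instruction immediately before the jump is one of PUSH1 to PUSH32; the function name jumps_are_static is hypothetical.

```python
PUSH1, PUSH32 = 0x60, 0x7F
JUMP, JUMPI, JUMPSUB = 0x56, 0x57, 0x5E
JUMPS = (JUMP, JUMPI, JUMPSUB)


def jumps_are_static(code: bytes) -> bool:
    """True if every JUMP, JUMPI and JUMPSUB directly follows a PUSH."""
    pc = 0
    prev_was_push = False
    while pc < len(code):
        op = code[pc]
        if op in JUMPS and not prev_was_push:
            return False
        prev_was_push = PUSH1 <= op <= PUSH32
        pc += 1
        if prev_was_push:
            pc += op - PUSH1 + 1  # skip the immediate so data is never decoded
    return True


if __name__ == "__main__":
    print(jumps_are_static(bytes.fromhex("6005565c5d5b60035e")))  # prints True
    print(jumps_are_static(bytes.fromhex("5b56")))  # JUMPDEST, JUMP: prints False
```

Because the scan skips PUSH immediates, a 0x56 byte inside PUSH data is not mistaken for a JUMP.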

For rules 4 and 5 we need to define stack depth. The Yellow Paper has the stack pointer or SP pointing just past the top item on the data stack. We define the stack base as where the SP pointed at the most recent JUMPSUB, or 0 on program entry. So we can define the stack depth as the number of stack elements between the current SP and the current stack base.

Given our definition of stack depth Rule 4 ensures that control flows which return to the same place with a different stack depth are invalid. These can be caused by irreducible paths like jumping into loops and subroutines, and calling subroutines with different numbers of arguments. Taken together, these rules allow for code to be validated by following the control-flow graph, traversing each edge only once.

Finally, Rule 5 precludes all stack underflows (and some stack overflows.)
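
As a worked illustration of stack depth, the sketch below tracks the depth through a straight-line run of instructions. It rests on several assumptions: the EFFECT table lists only a few opcodes with their (removed, added) item counts as given in the Yellow Paper, the name depth_trace is invented here, and "positive" in Rule 5 is read as "not negative" so that an empty stack at program entry is accepted.

```python
# (items removed, items added) for a handful of opcodes.
EFFECT = {
    "PUSH1": (0, 1),
    "POP": (1, 0),
    "ADD": (2, 1),
    "DUP1": (1, 2),
    "SWAP1": (2, 2),
    "JUMPDEST": (0, 0),
}
MAX_DEPTH = 1024


def depth_trace(ops: list, start_depth: int = 0) -> list:
    """Return the stack depth after each instruction, or raise ValueError."""
    depth = start_depth
    trace = []
    for index, op in enumerate(ops):
        removed, added = EFFECT[op]
        if depth < removed:  # underflow relative to the current stack base
            raise ValueError(f"instruction {index} ({op}): stack underflow")
        depth = depth - removed + added
        if depth > MAX_DEPTH:
            raise ValueError(f"instruction {index} ({op}): stack overflow")
        trace.append(depth)
    return trace


if __name__ == "__main__":
    print(depth_trace(["PUSH1", "PUSH1", "ADD", "POP"]))  # prints [1, 2, 1, 0]
```

Rule 4 then amounts to requiring that every path reaching a given instruction arrives with the same depth.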

Validation

The following is a pseudo-Go specification of an algorithm for enforcing program validity. It recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above. (For simplicity we ignore the issue of JUMPDEST or BEGINSUB bytes in PUSH data.) It runs in O(vertices + edges) time in the program’s control-flow graph, where vertices represent control-flow instructions and the edges represent basic blocks.

   var bytecode []byte
   var stack_depth []int
   var SP := 0

   func validate(PC :=0) boolean {
      // traverse code sequentially, recurse for subroutines and conditional jumps
      while true {
         instruction = bytecode[PC]
         if is_invalid(instruction) {
            return false;
         }

         // if stack depth non-zero we have been here before 
         // check for constant depth and return to break cycle
         if stack_depth[PC] != 0 {
             if SP != stack_depth[PC] {
                 return false
             } 
             return true
         }
         stack_depth[PC] = SP

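         // effect of instruction on stack
         // check for underflow before adding items, so that an
         // instruction cannot mask its own underflow
         SP -= removed_items(instruction)
         if SP < 0 {
             return false
         }
         SP += added_items(instruction)
         if 1024 < SP {
             return false
         }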

         // successful validation of path
         if instruction == STOP || instruction == RETURN || instruction == SUICIDE {
             return true
         }

         if instruction == JUMP {

             // check for constant and correct destination
             if bytecode[PC - 33] != PUSH32 {
                 return false
             }
             // push_data (assumed helper) reads the PUSH32 immediate at that offset
             dest = push_data(PC - 32)
             if bytecode[dest] != JUMPDEST {
                 return false
             }

             // reset PC to destination of jump
             PC = dest
             continue
         }
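         if instruction == JUMPI {

             // check for constant and correct destination
             if bytecode[PC - 33] != PUSH32 {
                 return false
             }
             // push_data (assumed helper) reads the PUSH32 immediate at that offset
             dest = push_data(PC - 32)
             if bytecode[dest] != JUMPDEST {
                 return false
             }

             // recurse to validate the taken branch, then restore SP
             // and continue with the fall-through branch
             branchSP = SP
             if !validate(dest) {
                 return false
             }
             SP = branchSP
             PC = advance_pc(PC, instruction)
             continue
         }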
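         if instruction == JUMPSUB {

             // check for constant and correct destination
             if bytecode[PC - 33] != PUSH32 {
                 return false
             }
             // push_data (assumed helper) reads the PUSH32 immediate at that offset
             dest = push_data(PC - 32)
             if bytecode[dest] != BEGINSUB {
                 return false
             }

             // recurse to validate the subroutine body; inside it the
             // stack depth is measured from a new stack base
             prevSP = SP
             SP = 0
             if !validate(dest + 1) {
                 return false
             }

             // back in the caller: make SP relative to the caller's base again
             SP = prevSP + SP
             PC = advance_pc(PC, instruction)
             continue
         }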
         if instruction == RETURNSUB {
             // end of subroutine: return to the validate call made at the JUMPSUB
             return true
         }

         // advance PC according to instruction
         PC = advance_pc(PC, instruction)
      }    
   }

Test Cases

Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: 0x60045e005c5d (PUSH1 0x04, JUMPSUB, STOP, BEGINSUB, RETURNSUB)

Pc  Op         Cost  Stack  RStack
0   PUSH1      3     []     []
2   JUMPSUB    10    [4]    []
5   RETURNSUB  5     []     [2]
3   STOP       0     []     []

Output: 0x
Consumed gas: 18

Two levels of subroutines

This should execute fine, going two levels deep into subroutines.

Bytecode: 0x6800000000000000000c5e005c60115e5d5c5d (PUSH9 0x00000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB)

Pc  Op         Cost  Stack  RStack
0   PUSH9      3     []     []
10  JUMPSUB    10    [12]   []
13  PUSH1      3     []     [10]
15  JUMPSUB    10    [17]   [10]
18  RETURNSUB  5     []     [10,15]
16  RETURNSUB  5     []     [10]
11  STOP       0     []     []

Consumed gas: 36

Failure 1: invalid jump

This should fail, since the given location is outside of the code range. The code is the same as in the previous example, except that the pushed location is 0x01000000000000000c instead of 0x0c.

Bytecode: 0x6801000000000000000c5e005c60115e5d5c5d (PUSH9 0x01000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB)

Pc  Op         Cost  Stack                   RStack
0   PUSH9      3     []                      []
10  JUMPSUB    10    [18446744073709551628]  []
Error: at pc=10, op=JUMPSUB: invalid jump destination

Failure 2: shallow return stack

This should fail at the first opcode, because the return stack is empty.

Bytecode: 0x5d5858 (RETURNSUB, PC, PC)

Pc  Op         Cost  Stack  RStack
0   RETURNSUB  5     []     []
Error: at pc=0, op=RETURNSUB: invalid retsub

Subroutine at end of code

In this example, the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the ‘virtual stop’ after the bytecode, and not exit with an error.

Bytecode: 0x6005565c5d5b60035e (PUSH1 0x05, JUMP, BEGINSUB, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB)

Pc  Op         Cost  Stack  RStack
0   PUSH1      3     []     []
2   JUMP       8     [5]    []
5   JUMPDEST   1     []     []
6   PUSH1      3     []     []
8   JUMPSUB    10    [3]    []
4   RETURNSUB  5     []     [8]
9   STOP       0     []     []

Consumed gas: 30

References

Gavin Wood, Ethereum: A Secure Decentralized Generalized Transaction Ledger, 2014-2021
Greg Colvin, Brooklyn Zelenka, Paweł Bylica, Christian Reitwiessner, EIP-615: Subroutines and Static Jumps for the EVM, 2016-2019
Martin Lundfall, EIP-2327: BEGINDATA Opcode, 2019
Nick Johnson, EIP-3337: Frame pointer support for memory load and store operations, 2021

Copyright and related rights waived via CC0.

Citation

Please cite this document as:

Greg Colvin, Martin Holst Swende, "EIP-2315: Simple Subroutines for the EVM [DRAFT]," Ethereum Improvement Proposals, no. 2315, October 2019. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-2315.