EIP 2315: Simple Subroutines for the EVM Source

AuthorGreg Colvin ([email protected]), Martin Holst Swende
Discussions-Tohttps://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
StatusDraft
TypeStandards Track
CategoryCore
Created2019-10-17

Abstract

This proposal introduces three opcodes to support subroutines: BEGINSUB, JUMPSUB and RETURNSUB.

Motivation

The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by contriving to get the return address back to the top of stack and jumping back to it. Complex calling conventions are then needed to use the same stack for computation and control flow. Memory allows for simpler conventions but still costs gas. Eschewing subroutines in user code is the least costly – but also the most failure-prone.

Over the course of 30 years the computer industry struggled with this complexity and cost and settled in on opcodes to directly support subroutines. These are provided in some form by most all physical and virtual machines going back at least 50 years.

Our design is modeled on the original Forth two-stack machine of 1970. The data stack is supplemented with a return stack to provide simple support for subroutines, as specified below.

In the Appendix we show example solc output for a simple program that uses over three times as much gas just calling and returning from subroutines as comparable code using these opcodes. Actual differences in run-time efficiency will of course vary widely.

Specification

We introduce one more stack into the EVM in addition to the existing data stack which we call the return stack. The return stack is limited to 1023 items.

BEGINSUB

Marks the entry point to a subroutine. Attempted execution of a BEGINSUB causes an abort: terminate execution with an OOG (Out Of Gas) exception.

JUMPSUB

Transfers control to a subroutine.

  1. Pop the location off the data stack.
  2. If the opcode at location is not a BEGINSUB abort.
  3. If the return stack already has 1023 items abort.
  4. Push the current pc + 1 to the return stack.
  5. Set pc to location + 1.
  • pops one item off the data stack
  • pushes one item on the return stack

RETURNSUB

Returns control to the caller of a subroutine.

  1. If the return stack is empty abort.
  2. Pop pc off the return stack.
  • pops one item off the return stack

Note 1: If a resulting pc to be executed is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.

Note 2: Values popped off the return stack do not need to be validated, since they are alterable only by JUMPSUB and RETURNSUB.

Note 3: The description above lays out the semantics of this feature in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementor may code JUMPSUB to unobservably push pc on the return stack rather than pc + 1, which is allowed so long as RETURNSUB observably returns control to the pc + 1 location.)

Rationale

This is the is a small change that provides native subroutines without breaking backwards compatibility.

Backwards Compatibility

These changes do not affect the semantics of existing EVM code.

Test Cases

Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: 0x60045e005c5d (PUSH1 0x04, JUMPSUB, STOP, BEGINSUB, RETURNSUB)

Pc Op Cost Stack RStack
0 PUSH1 3 [] []
2 JUMPSUB 10 [4] []
5 RETURNSUB 5 [] [ 2]
3 STOP 0 [] []

Output: 0x Consumed gas: 18

Two levels of subroutines

This should execute fine, going into one two depths of subroutines

Bytecode: 0x6800000000000000000c5e005c60115e5d5c5d (PUSH9 0x00000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB)

Pc Op Cost Stack RStack
0 PUSH9 3 [] []
10 JUMPSUB 10 [12] []
13 PUSH1 3 [] [10]
15 JUMPSUB 10 [17] [10]
18 RETURNSUB 5 [] [10,15]
16 RETURNSUB 5 [] [10]
11 STOP 0 [] []

Consumed gas: 36

Failure 1: invalid jump

This should fail, since the given location is outside of the code-range. The code is the same as previous example, except that the pushed location is 0x01000000000000000c instead of 0x0c.

Bytecode: 0x6801000000000000000c5e005c60115e5d5c5d (PUSH9 0x01000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB)

Pc Op Cost Stack RStack
0 PUSH9 3 [] []
10 JUMPSUB 10 [18446744073709551628] []
Error: at pc=10, op=JUMPSUB: invalid jump destination

Failure 2: shallow return stack

This should fail at first opcode, due to shallow return_stack

Bytecode: 0x5d5858 (RETURNSUB, PC, PC)

Pc Op Cost Stack RStack
0 RETURNSUB 5 [] []
Error: at pc=0, op=RETURNSUB: invalid retsub

Subroutine at end of code

In this example. the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the ‘virtual stop’ after the bytecode, and not exit with error

Bytecode: 0x6005565c5d5b60035e (PUSH1 0x05, JUMP, BEGINSUB, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB)

Pc Op Cost Stack RStack
0 PUSH1 3 [] []
2 JUMP 8 [5] []
5 JUMPDEST 1 [] []
6 PUSH1 3 [] []
8 JUMPSUB 10 [3] []
4 RETURNSUB 5 [] [ 8]
9 STOP 0 [] []

Consumed gas: 30

Error on “walk-into-subroutine”

In this example, the code ‘walks’ into a subroutine, which is not allowed, and causes an error

Bytecode: 0x5c5d00 (BEGINSUB, RETURNSUB, STOP)

Pc Op Cost Stack RStack
0 BEGINSUB 2 [] []
Error: at pc=0, op=BEGINSUB: invalid subroutine entry

Note 5: The content of the error message, (invalid subroutine entry) is implementation-specific.

Implementations

Three clients have implemented this (or an earlier version of) this proposal:

Costs and Codes

We suggest that the cost of

  • BEGINSUB be base (2)
    • Although formally specified, the cost of BEGINSUB does not matter in practice, since BEGINSUB never executes without error.
  • JUMPSUB be high (10)
    • This is the same as JUMPI, and 2 more than JUMP.
  • RETURNSUB be low (5).

Benchmarking might be needed to tell if the costs are well-balanced.

We suggest the following opcodes:

0x5c BEGINSUB
0x5d RETURNSUB
0x5e JUMPSUB

Security Considerations

These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of evm-code needs to be modified accordingly. The JUMPSUB semantics are similar to JUMP (but jumping to a BEGINSUB), whereas the RETURNSUB instruction is different, since it can ‘land’ on any opcode (but the possible destinations can be statically inferred).

Appendix: Comparative costs.

contract fun {
    function test(uint x, uint y) public returns (uint) {
        return test_mul(2,3);
    }
    function test_mul(uint x, uint y) public returns (uint) {
        return multiply(x,y);
    }
    function multiply(uint x, uint y) public returns (uint) {
        return x * y;
    }
}

Here is solc 0.6.3 assembly code with labeled destinations.

TEST:
     jumpdest
     0x00
     RTN
     0x02
     0x03
     TEST_MUL
     jump
TEST_MUL:
     jumpdest
     0x00
     RTN
     dup4
     dup4
     MULTIPLY
     jump
RTN:
     jumpdest
     swap4
     swap3
     pop
     pop
     pop
     jump
MULTIPLY:   
     jumpdest
     mul
     swap1
     jump

solc does a good job with the multiply() function, which is a leaf. Non-leaf functions are more awkward to get out of. Calling fun.test() will cost 118 gas, plus 5 for the mul.

This is the same code written using jumpsub and returnsub. Calling fun.test() will cost 32 gas plus 5 for the mul.

TEST:
     beginsub
     0x02
     0x03
     TEST_MUL
     jumpsub
     returnsub
TEST_MUL:
     beginsub
     MULTIPLY
     jumpsub
     returnsub
MULTIPLY:
     beginsub
     mul
     returnsub

Copyright and related rights waived via CC0.