EIP 2315: Simple Subroutines for the EVM Source

AuthorGreg Colvin ([email protected])
Discussions-Tohttps://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
StatusDraft
TypeStandards Track
CategoryCore
Created2019-10-17

Abstract

This proposal introduces three opcodes to support subroutines: BEGINSUB, JUMPSUB and RETURNSUB.

Motivation

The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by contriving to get the return address back to the top of stack and jumping back to it. Complex calling conventions are then needed to use the same stack for computation and control flow. Code becomes harder to read and write, and tools may need to pattern-match the conventions to identify the use of subroutines. Complex calling conventions like these can be avoided using memory, but regardless, it costs a lot of gas.

Having opcodes to directly support subroutines can eliminate this complexity and cost, just as for other machines and interpreters going back at least 60 years.

In the Appendix we show example solc output for a simple program that uses over three times as much gas just calling and returning from subroutines as comparable code using these opcodes.

Specification

BEGINSUB

Marks the entry point to a subroutine.

JUMPSUB

Jumps to the destination address on top of the stack, which must be the offset of a BEGINSUB.

RETURNSUB

Returns to the most recently executed JUMPSUB and advances to the following instruction.

A program may JUMPSUB at most 1023 times without an intervening RETURNSUB. A program which executes RETURNSUB without no prior BEGINSUB will STOP.

Rationale

This is the smallest possible change that provides native subroutines without breaking backwards compatibility.

Backwards Compatibility

These changes do not affect the semantics of existing EVM code.

Test Cases

                      data  return
offset step opcode    stack stack
0      0    PUSH1 3   []    []
1      1    JUMPSUB   [3]   [1]
2      4    STOP      []    [1]
3      2    JUMPDEST  []    [1]
4      3    RETURNSUB []    []

The above code should terminate after 4 steps with an empty stack.

                      data  return
offset step opcode    stack stack
0      0    PUSH1 2   []    []
1      1    JUMPSUB   [2]   [1]
2      2    JUMPDEST  []    [1]
3      3    RETURNSUB []    []

The above code should terminate after 4 steps with an empty stack.

                      data  return
offset step opcode    stack stack
0      0    PUSH1 3   []    []
1      1    JUMPSUB   [3]   [1]
2      8    STOP      []    []
3      2    JUMPDEST  []    [1]
4      3    PUSH1 7   []    [1]
5      4    JUMPSUB   [7]   [1,5]
6      7    RETURNSUB []    [1]
7      5    JUMPDEST  []    [1]
8      6    RETURNSUB []    [1]

The above code should terminate after 8 steps with an empty stack. The above code should terminate after 8 steps with an empty stack.

Implementations

No clients have implemented this proposal as of yet, but there are Draft PRs on the evmone and geth interpreters.

The new operators proposed here are demonstrated by the following pseudocode, which adds a return stack and cases for BEGINSUB, JUMPSUB and RETURNSUB to a simple loop-and-switch interpreter.

bytecode[code_size]
data_stack[1024] = { }
return_stack[1024] = { code_size }
PC = 0
while PC < code_size {
   opcode = bytecode[PC]
   switch opcode {
   ...
   case BEGINSUB:
   
   case JUMPSUB:
      push(return_stack, PC)
      PC = pop(data_stack)
      continue

   case RETURNSUB:
      PC = pop(return_stack)
   }
   ++PC
}

Execution of EVM bytecode begins with one value on the return stackā€”the size of the bytecode. The implicit 0 bytes at and after this offset are EVM STOP opcods. So executing a RETURNSUB with no prior JUMPSUB jumps to the code_size offset on the stack, then executes a STOP on the next cycle. A STOP or RETURN ends the execution of the program.

Costs and Codes

We suggest that the cost of BEGINSUB be base, JUMPSUB be low, and RETURNSUB be verylow. Measurement will tell. We suggest the following opcodes:

0xb2 BEGINSUB
0xb3 JUMPSUB
0xb7 RETURNSUB

Security Considerations

Program flow analysis frameworks will need to be updated to allow for a new type of branch - JUMPSUB - and new type of branching - RETURNSUB - which will cause a jump to a destination which is a JUMPSUB, not a JUMPDEST.

Appendix: Comparative costs.

contract fun {
    function test(uint x, uint y) public returns (uint) {
        return test_mul(2,3);
    }
    function test_mul(uint x, uint y) public returns (uint) {
        return multiply(x,y);
    }
    function multiply(uint x, uint y) public returns (uint) {
        return x * y;
    }
}

Here is solc 0.6.3 assembly code with labeled destinations.

   TEST:
     0x00
     RTN
     0x02
     0x03
     TEST_MUL
     jump
   TEST_MUL:
     0x00
     RTN
     dup4
     dup4
     MULTIPLY
     jump
   RTN:
     swap4
     swap3
     pop
     pop
     pop
     jump
   MULTIPLY:   
     mul
     swap1
     jump

solc does a good job with the multiply() function, which is a leaf. Non-leaf functions are more awkward to get out of. Calling fun.test() will cost 118 gas, plus 5 for the mul.

This is the same code written using jumpsub and returnsub. Calling fun.test() will cost 34 gas (plus 5).

   TEST:
     0x02
     0x03
     TEST_MUL
     jumpsub
     returnsub
   TEST_MUL:
     MULTIPLY
     jumpsub
     returnsub
   MULTIPLY:
     mul
     returnsub

Copyright and related rights waived via CC0.