Individual sections for functions with `CALLF` and `RETF` instructions
|Authors||Andrei Maiboroda (@gumb0), Alex Beregszaszi (@axic), Paweł Bylica (@chfast)|
|Requires||EIP-3540, EIP-3670, EIP-5450|
Introduce the ability to have several code sections in EOF-formatted (EIP-3540) bytecode, each one representing a separate subroutine/function. Two new opcodes, `CALLF` and `RETF`, are introduced to call and return from such a function. Dynamic jump instructions are disallowed.
Currently, in the EVM everything is a dynamic jump. Languages like Solidity generate most jumps in a static manner (i.e. the destination is pushed to the stack right before, `PUSHn .. JUMP`). Unfortunately, most EVM interpreters cannot take advantage of this, because of the added requirement of validation/analysis. This also restricts them from making optimisations and potentially reducing the cost of jumps.
EIP-4200 introduces static jump instructions, which remove the need for most dynamic jump use cases, but not everything can be solved with them.
This EIP aims to remove the need for dynamic jumps and to disallow them, as it offers the most important feature they are used for: calling into and returning from functions.
Furthermore, it aims to improve analysis opportunities by encoding the number of inputs and outputs for each given function, and isolating the stack of each function (i.e. a function cannot read the stack of the caller/callee).
The type section of EOF containers must adhere to the following requirements:
- The section is comprised of a list of metadata where the metadata index in the type section corresponds to a code section index. Therefore, the type section size MUST be `n * 4` bytes, where `n` is the number of code sections.
- Each metadata item has 3 attributes: a uint8 `inputs`, a uint8 `outputs`, and a uint16 `max_stack_height`. Note: This implies a limit of 255 stack items for the inputs and for the outputs. This is further restricted to 127 stack items, because the upper bit of both the input and output bytes is reserved for future use. `max_stack_height` is further defined in EIP-5450.
- The first code section MUST have 0 inputs and 0 outputs.
Refer to EIP-3540 to see the full structure of a well-formed EOF bytecode.
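
The type section requirements above can be illustrated with a minimal Python sketch. The names `FunctionType` and `parse_type_section` are hypothetical and not part of this EIP, and the sketch assumes that `max_stack_height` is stored big-endian.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionType:
    inputs: int            # uint8, number of stack inputs
    outputs: int           # uint8, number of stack outputs
    max_stack_height: int  # uint16, defined in EIP-5450


def parse_type_section(data: bytes, num_code_sections: int) -> list[FunctionType]:
    """Decode and check a type section (hypothetical helper, not from the EIP)."""
    if len(data) != num_code_sections * 4:
        raise ValueError("type section size must be n * 4 bytes")
    types = []
    for i in range(num_code_sections):
        # Assumption: the uint16 max_stack_height is stored big-endian.
        inputs, outputs, max_height = struct.unpack_from(">BBH", data, i * 4)
        if inputs > 127 or outputs > 127:
            raise ValueError("upper bit of inputs/outputs is reserved")
        types.append(FunctionType(inputs, outputs, max_height))
    if types and (types[0].inputs != 0 or types[0].outputs != 0):
        raise ValueError("first code section must have 0 inputs and 0 outputs")
    return types
```

For example, `parse_type_section(bytes([0, 0, 0, 2, 2, 1, 0, 3]), 2)` describes two code sections, the second taking 2 inputs and returning 1 output.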
A return stack is introduced, separate from the operand stack. It is a stack of items representing execution state to return to after function execution is finished. Each item is comprised of: code section index, offset in the code section (PC value), calling function stack height.
Note: Implementations are free to choose a particular encoding for a stack item. In the specification below we assume that the representation is three unsigned integers: `code_section_index`, `offset`, `stack_height`.
The return stack is limited to a maximum of 1024 items.
Additionally, the EVM keeps track of the index of the currently executing section: `current_section_index`.
We introduce two new instructions:
- `CALLF` (`0xe3`) - call a function
- `RETF` (`0xe4`) - return from a function
If the code is legacy bytecode, any of these instructions results in an exceptional halt. (Note: This means no change to behaviour.)
First we define several helper values:
- `caller_stack_height = return_stack.top().stack_height` - stack height value saved in the top item of return stack
- `type[i].inputs = type_section_contents[i * 4]` - number of inputs of ith section
- `type[i].outputs = type_section_contents[i * 4 + 1]` - number of outputs of ith section
If the code is valid EOF1, the following execution rules apply.
`CALLF`:
- Has one immediate argument, `code_section_index`, encoded as a 16-bit unsigned big-endian value.
- EOF validation guarantees that operand stack has at least `caller_stack_height + type[code_section_index].inputs` items.
- If operand stack size exceeds `1024 - type[code_section_index].max_stack_height` (i.e. if the called function may exceed the global stack height limit), execution results in exceptional halt. This also guarantees that the stack height after the call is within the limits.
- If return stack already has `1024` items, execution results in exceptional halt.
- Charges 5 gas.
- Pops nothing and pushes nothing to operand stack.
- Pushes to return stack an item:
`(code_section_index = current_section_index, offset = PC_post_instruction, stack_height = data_stack.height - type[code_section_index].inputs)`
By `PC_post_instruction` we mean the PC position after the entire immediate argument of `CALLF`. Operand stack height is saved as it was before function inputs were pushed.
Note: Code validation rules of EIP-5450 guarantee there is always an instruction following `CALLF` (since a terminating instruction or unconditional jump is required to be the final one in the section), therefore `PC_post_instruction` always points to an instruction inside section bounds.
- Sets `current_section_index` to `code_section_index` and `PC` to `0`, and execution continues in the called section.
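
The `CALLF` rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: `State`, `ReturnFrame`, `ExceptionalHalt` and `callf` are hypothetical names, gas is only counted (out-of-gas handling is omitted), and the type metadata is assumed to be available as `(inputs, outputs, max_stack_height)` tuples.

```python
from dataclasses import dataclass, field

STACK_LIMIT = 1024
RETURN_STACK_LIMIT = 1024
CALLF_GAS = 5


class ExceptionalHalt(Exception):
    """Raised when execution must halt exceptionally."""


@dataclass
class ReturnFrame:
    code_section_index: int
    offset: int
    stack_height: int


@dataclass
class State:
    types: list  # types[i] = (inputs, outputs, max_stack_height)
    stack: list = field(default_factory=list)
    return_stack: list = field(default_factory=lambda: [ReturnFrame(0, 0, 0)])
    current_section_index: int = 0
    pc: int = 0
    gas_used: int = 0


def callf(state: State, code: bytes) -> None:
    """Execute the CALLF located at state.pc in the current section's bytes."""
    # The 16-bit unsigned big-endian immediate follows the opcode byte.
    target = int.from_bytes(code[state.pc + 1:state.pc + 3], "big")
    inputs, _outputs, max_stack_height = state.types[target]
    if len(state.stack) > STACK_LIMIT - max_stack_height:
        raise ExceptionalHalt("callee may exceed the operand stack limit")
    if len(state.return_stack) >= RETURN_STACK_LIMIT:
        raise ExceptionalHalt("return stack is full")
    state.gas_used += CALLF_GAS
    # Save where to resume: the PC after the immediate, and the stack
    # height as it was before the function inputs were pushed.
    state.return_stack.append(
        ReturnFrame(state.current_section_index, state.pc + 3,
                    len(state.stack) - inputs))
    state.current_section_index = target
    state.pc = 0
```

The operand stack itself is left untouched: the callee finds its inputs on top of the stack, and only the return stack and the execution position change.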
`RETF`:
- Does not have immediate arguments.
- EOF validation guarantees that operand stack has exactly `caller_stack_height + type[current_section_index].outputs` items.
- Charges 3 gas.
- Pops nothing and pushes nothing to operand stack.
- Pops an item from return stack and sets `current_section_index` and `PC` to values from this item.
- If return stack is empty after this, execution halts with success.
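
A corresponding sketch for `RETF`, again only illustrative: the function name `retf`, the tuple encoding of return stack items and the returned `halted` flag are assumptions made for this example, and gas is only counted.

```python
RETF_GAS = 3


def retf(return_stack: list, gas_used: int):
    """Return (current_section_index, pc, gas_used, halted) after RETF.

    Each return stack item is assumed to be a tuple
    (code_section_index, offset, stack_height).
    """
    code_section_index, offset, _stack_height = return_stack.pop()
    gas_used += RETF_GAS
    # An empty return stack after the pop means the top frame returned:
    # execution halts with success.
    halted = not return_stack
    return code_section_index, offset, gas_used, halted
```

With the return stack initialized to a single item, a `RETF` executed in the first code section pops that item and halts successfully rather than failing.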
In addition to container format validation rules above, we extend code section validation rules (as defined in EIP-3670).
- Code validation rules of EIP-3670 are applied to every code section.
- Code section is invalid in case an immediate argument of any `CALLF` is greater than or equal to the total number of code sections.
- `RJUMP`, `RJUMPI` and `RJUMPV` immediate argument value (jump destination relative offset) validation:
  - Code section is invalid in case offset points to a position outside of section bounds.
  - Code section is invalid in case offset points to one of two bytes directly following `CALLF` instruction.
Dynamic jump instructions `JUMP` (`0x56`) and `JUMPI` (`0x57`) are invalid and their opcodes are undefined.
`JUMPDEST` (`0x5b`) instruction is renamed to `NOP` (“no operation”) without a change in behaviour: it pops nothing and pushes nothing to operand stack and has no other effects except for `PC` increment and charging 1 gas.
`PC` (`0x58`) instruction becomes invalid and its opcode is undefined.
Note: This change implies that JUMPDEST analysis is no longer required for EOF code.
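
Part of the validation rules above can be sketched as a check over already decoded instructions. The function `validate_instructions` and the `(opcode, immediate)` pair representation are hypothetical. Decoding the section bytes and validating relative jump offsets are assumed to happen elsewhere.

```python
CALLF = 0xE3
# Opcodes that become undefined in EOF code under this proposal.
INVALID_IN_EOF = {0x56: "JUMP", 0x57: "JUMPI", 0x58: "PC"}


def validate_instructions(instructions, num_code_sections: int) -> None:
    """Check decoded (opcode, immediate) pairs of one code section.

    Splitting the section bytes into pairs is assumed to be done by the caller.
    """
    for opcode, immediate in instructions:
        if opcode in INVALID_IN_EOF:
            raise ValueError(f"{INVALID_IN_EOF[opcode]} is undefined in EOF code")
        if opcode == CALLF and immediate >= num_code_sections:
            raise ValueError("CALLF targets a non-existent code section")
```

Because these checks run once at deploy time, the interpreter does not need to repeat them during execution.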
- Execution starts at the first byte of the first code section, and PC is set to 0.
- Return stack is initialized to contain one item: `(code_section_index = 0, offset = 0, stack_height = 0)`
- If any instruction accesses an operand stack item below `caller_stack_height`, execution results in exceptional halt. This rule replaces the old stack underflow check.
- No change in stack overflow check: if any instruction causes the operand stack height to exceed `1024`, execution results in exceptional halt.
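
The two stack rules above can be sketched as a single check. The function `check_stack` and its parameters are hypothetical: `required` is the number of items from the top of the stack that an instruction reads, and `delta` is its net effect on the stack height.

```python
STACK_LIMIT = 1024


def check_stack(stack_height: int, caller_stack_height: int,
                required: int, delta: int) -> None:
    """Raise if an instruction would violate the operand stack rules."""
    # Underflow: the instruction would read below the current function's
    # portion of the operand stack.
    if stack_height - required < caller_stack_height:
        raise RuntimeError("exceptional halt: access below caller_stack_height")
    # Overflow: the global limit of 1024 items is unchanged.
    if stack_height + delta > STACK_LIMIT:
        raise RuntimeError("exceptional halt: stack overflow")
```

In the top frame `caller_stack_height` is 0, so the first check reduces to the usual stack underflow check.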
An alternative logic for executing `RETF` in the top frame could be to exceptionally halt execution, because there is arguably no caller for the starting function. This would mean that the return stack is initialized as empty, and `RETF` exceptionally aborts when the return stack is empty.
We have decided in favor of always having at least one item in the return stack, because it avoids a special case for an empty stack in the interpreter loop's stack underflow check. We keep the stack underflow rule general by having `caller_stack_height = 0` in the top frame.
The number of code sections is limited to 1024. This requires a 2-byte immediate for `CALLF` and leaves room for increasing the limit in the future. A limit of 256 (1-byte immediate) was discussed, and concerns were raised that it might not be sufficient.
Instead of deprecating `JUMPDEST` we repurpose it as the `NOP` instruction, because `JUMPDEST` effectively was a “no-operation” instruction and was already used as such in various contexts. It can be useful for some off-chain tooling, e.g. benchmarking EVM implementations (the performance of the `NOP` instruction is the performance of the EVM interpreter loop), as padding to force code alignment, or as a placeholder in dynamic code composition.
The purpose of `JUMPDEST` analysis was to find in code the valid `JUMPDEST` bytes that do not happen to be inside `PUSH` immediate data. Only dynamic jump instructions (`JUMP` and `JUMPI`) required the destination to be a `JUMPDEST` instruction. Relative static jumps (`RJUMP` and `RJUMPI`) do not have this requirement and are validated once at deploy-time EOF instruction validation. Therefore, without dynamic jump instructions, `JUMPDEST` analysis is not required.
This change poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).
The new execution state and multi-section control flow pose no risk to backwards compatibility, because they are a generalization of executing a single code section. Executing existing contracts (both legacy and EOF1) has no user-observable changes.
Copyright and related rights waived via CC0.
Please cite this document as:
Andrei Maiboroda (@gumb0), Alex Beregszaszi (@axic), Paweł Bylica (@chfast), "EIP-4750: EOF - Functions [DRAFT]," Ethereum Improvement Proposals, no. 4750, January 2022. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-4750.