⚠️ This EIP is not recommended for general use or implementation as it is likely to change.

# EIP-4750: EOF - Functions Source

### Individual sections for functions with CALLF and RETF instructions

Author Andrei Maiboroda, Alex Beregszaszi, Paweł Bylica https://ethereum-magicians.org/t/eip-4750-eof-functions/8195 Draft Standards Track Core 2022-01-10 3540, 3670

## Abstract

Introduce the ability to have several code sections in EOF-formatted (EIP-3540) bytecode, each one representing a separate subroutine/function. Two new opcodes,CALLF and RETF, are introduced to call and return from such a function.

## Motivation

Currently in the EVM everything is a dynamic jump. Languages like Solidity generate most jumps in a static manner (i.e. the destination is pushed to the stack right before, PUSHn .. JUMP). Unfortunately however this cannot be used by most EVM interpreters, because of added requirement of validation/analysis. This also restricts them from making optimisations and potentially reducing the cost of jumps.

EIP-4200 introduces static jump instructions, which remove the need for most dynamic jump use cases, but not everything can be solved with them.

This EIP aims to remove the need for dynamic jumps as it offers the most important feature those are used for: calling into and returning from functions. While it removes the need, it does not disallow those instructions.

Furthermore it aims to improve analysis opportunities by encoding the number of inputs and outputs for each given function, and isolating the stack of each function (i.e. a function cannot read the stack of the caller/callee).

## Specification

### EOF container changes

1. The requirement of EIP-3540 “Exactly one code section MUST be present.” is relaxed to “At least one code section MUST be present.”, i.e. multiple code sections (kind = 1) are allowed.
2. Total number of code sections MUST NOT exceed 1024.
3. All code sections MUST precede a data section, if data section is present.
4. New section with kind = 3 is introduced called the type section.
5. Exactly one type section MAY be present.
6. The type section, if present, MUST directly precede all code sections.
7. The type section, if present, contains a sequence of pairs of bytes: first byte in a pair encodes number of inputs, and second byte encodes number of outputs of the code section with the same index. Note: This implies that there is a limit of 256 stack for the input and in the output.
8. Therefore type section size MUST be n * 2 bytes, where n is the number of code sections.
9. First code section MUST have 0 inputs and 0 outputs.
10. Type section MAY be omitted if only a single code section is present. In that case it implicitly defines 0 inputs and 0 outputs for this code section.

To summarize, a well-formed EOF bytecode will have the following format:

bytecode := magic, version, [type_section_header], (code_section_header)+, [data_section_header], 0, [type_section_contents], (code_section_contents)+, [data_section_contents]

type_section_header := 3, number_of_code_sections * 2 # section kind and size
type_section_contents := 0, 0, code_section_1_inputs, code_section_1_outputs, code_section_2_inputs, code_section_2_outputs, ..., code_section_n_inputs, code_section_n_outputs


### New execution state in EVM

A return stack is introduced, separate from the data stack. It is a stack of items representing execution state to return to after function execution is finished. Each item is comprised of: code section index, offset in the code section (PC value), calling function stack height.

Note: Implementations are free to choose particular encoding for a stack item. In the specification below we assume that representation is three unsigned integers: code_section_index, offset, stack_height.

The return stack is limited to a maximum 1024 items.

Additionally, EVM keeps track of the index of currently executing section - current_section_index.

### New instructions

We introduce two new instructions:

1. CALLF (0x5e)
2. RETF (0x5f)

If the code is legacy bytecode, both of these instructions result in an exceptional halt. (Note: This means no change to behaviour.)

First we define several helper values:

• caller_stack_height = return_stack.top().stack_height - stack height value saved in the top item of return stack
• type[i].inputs = type_section_contents[i * 2] - number of inputs of ith section
• type[i].outputs = type_section_contents[i * 2 + 1] - number of outputs of ith section

If the code is valid EOF1, the following execution rules apply:

#### CALLF

1. Has one immediate argument,code_section_index, encoded as a 16-bit unsigned big-endian value.
2. If data stack has less than caller_stack_height + type[code_section_index].inputs, execution results in exceptional halt.
3. If return stack already has 1024 items, execution results in exceptional halt.
4. Pops nothing and pushes nothing to data stack.
5. Pushes to return stack an item:
(code_section_index = current_section_index,
offset = PC_post_instruction,
stack_height = data_stack.height - types[code_section_index].inputs)


Under PC_post_instruction we mean the PC position after the entire immediate argument of CALLF. Data stack height is saved as it was before function inputs were pushed.

Note: Code validation rules of EIP-3670 guarantee there is always an instruction following CALLF (since terminating instruction is required to be final one in the section), therefore PC_post_instruction always points to an instruction inside section bounds.

1. Sets current_section_index to code_section_index and PC to 0, and execution continues in the called section.

#### RETF

1. Does not have immediate arguments.
2. If data stack has less than caller_stack_height + types[code_section_index].outputs, execution results in exceptional halt.
3. Pops nothing and pushes nothing to data stack.
4. Pops an item from return stack and sets current_section_index and PC to values from this item.
1. If return stack is empty after this, execution halts with success.

### Code Validation

In addition to container format validation rules above, we extend code section validation rules (as defined in EIP-3670).

1. Code validation rules of EIP-3670 are applied to every code section.
2. List of allowed terminating instructions in EIP-3670 is extended to include RETF. (Note that CALLF, like other instructions with immediates, cannot be truncated.)
3. Code section is invalid in case an immediate argument of any CALLF is greater than or equal to the total number of code sections.
4. RJUMP and RJUMPI immediate argument value (jump destination relative offset) validation:
1. Code section is invalid in case offset points to a position outside of section bounds.
2. Code section is invalid in case offset points to one of two bytes directly following CALLF instruction.

### Execution

1. Execution starts at the first byte of the first code section, and PC is set to 0.
2. Return stack is initialized to contain one item: (code_section_index = 0, offset = 0, stack_height = 0)
3. Destinations of jumps are allowed only to be inside current code section. JUMP and JUMPI result in exceptional halt when destination is outside of current section bounds.
4. If any instruction would access a data stack item below caller_stack_height, execution results in exceptional halt. This rule replaces the old stack underflow check.

#### Implications on the JUMPDEST analysis

• Analysis is done separately for each section, i.e. output of entire analysis is number_of_code_sections lists of possible jump destinations.
• Analysis is extended to consider 2 bytes directly following CALLF to be invalid jump destination

## Rationale

### RETF in the top frame ends execution vs exceptionally halts

Alternative logic for executing RETF in the top frame could be to exceptionally halt execution, because there is arguably no caller for the starting function. This would mean that return stack is initialized as empty, and RETF exceptionally aborts when return stack is empty.

We have decided in favor of always having at least one item in the return stack, because it allows to avoid having a special case for empty stack in the interpreter loop stack underflow check. We keep the stack underflow rule general by having caller_stack_height = 0 in the top frame.

## Backwards Compatibility

This change poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).

The new execution state and multi-section control flow pose no risk to backwards compatibility, because it is a generalization of executing a single code section. Executing existing contracts (both legacy and EOF1) has no user-observable changes.

## Reference Implementation

MAGIC = b'\xEF\x00'
VERSION = 0x01
S_TERMINATOR = 0x00
S_CODE = 0x01
S_DATA = 0x02
S_TYPE = 0x03

# The ranges below are as specified in the Yellow Paper.
# Note: range(s, e) excludes e, hence the +1
valid_opcodes = [
*range(0x00, 0x0b + 1),
*range(0x10, 0x1d + 1),
0x20,
*range(0x30, 0x3f + 1),
*range(0x40, 0x48 + 1),
*range(0x50, 0x5d + 1),
*range(0x60, 0x6f + 1),
*range(0x70, 0x7f + 1),
*range(0x80, 0x8f + 1),
*range(0x90, 0x9f + 1),
*range(0xa0, 0xa4 + 1),
# Note: 0xfe is considered assigned.
*range(0xf0, 0xf5 + 1), *range(0xfa, 0xff + 1),
]

# STOP, RETURN, RETF, REVERT, INVALID, SELFDESTRUCT
terminating_opcodes = [ 0x00, 0xf3, 0xfc, 0xfd, 0xfe, 0xff ]

immediate_sizes = 256 * [0]
immediate_sizes[0x5c] = 2  # RJUMP
immediate_sizes[0x5d] = 2  # RJUMPI
immediate_sizes[0xfb] = 2  # CALLF
for opcode in range(0x60, 0x7f + 1):  # PUSH1..PUSH32
immediate_sizes[opcode] = opcode - 0x60 + 1

class ValidationException(Exception):
pass

# Validate EOF code.
# Raises ValidationException on invalid code
def validate_eof(code: bytes):
# Check version
if len(code) < 3 or code[2] != VERSION:
raise ValidationException("invalid version")

section_sizes = {S_TYPE: [], S_CODE: [], S_DATA: []}
pos = 3
while True:
if pos >= len(code):
raise ValidationException("no section terminator")

section_id = code[pos]
pos += 1
if section_id == S_TERMINATOR:
break

# Disallow unknown sections
if not section_id in section_sizes:
raise ValidationException("invalid section id")

# Data section preceding code section (i.e. code section following data section)
if section_id == S_CODE and len(section_sizes[S_DATA]) != 0:
raise ValidationException("data section preceding code section")

# Code section or data section preceding type section
if section_id == S_TYPE and (len(section_sizes[S_CODE]) != 0 or len(section_sizes[S_DATA]) != 0):
raise ValidationException("code or data section preceding type section")

# Multiple type or data sections
if section_id == S_TYPE and len(section_sizes[S_TYPE]) != 0:
raise ValidationException("multiple type sections")
if section_id == S_DATA and len(section_sizes[S_DATA]) != 0:
raise ValidationException("multiple data sections")

# Truncated section size
if (pos + 1) >= len(code):
raise ValidationException("truncated section size")
section_sizes[section_id].append((code[pos] << 8) | code[pos + 1])
pos += 2

# Empty section
if section_sizes[section_id][-1] == 0:
raise ValidationException("empty section")

# Code section cannot be absent
if len(section_sizes[S_CODE]) == 0:
raise ValidationException("no code section")

# Not more than 1024 code sections
if len(section_sizes[S_CODE]) > 1024:
raise ValidationException("more than 1024 code sections")

# Type section can be absent only if single code section is present
if len(section_sizes[S_TYPE]) == 0 and len(section_sizes[S_CODE]) != 1:
raise ValidationException("no obligatory type section")

# Type section, if present, has size corresponding to number of code sections
if len(section_sizes[S_TYPE]) != 0 and section_sizes[S_TYPE][0] != len(section_sizes[S_CODE]) * 2:
raise ValidationException("invalid type section size")

# The entire container must be scanned
if len(code) != (pos + sum(section_sizes[S_TYPE]) + sum(section_sizes[S_CODE]) + sum(section_sizes[S_DATA])):
raise ValidationException("container size not equal to sum of section sizes")

# First type section, if present, has 0 inputs and 0 outputs
if len(section_sizes[S_TYPE]) > 0 and (code[pos] != 0 or code[pos + 1] != 0):
raise ValidationException("invalid type of section 0")

# Raises ValidationException on invalid code
def validate_code_section(code: bytes, num_code_sections: int):
# Note that EOF1 already asserts this with the code section requirements
assert len(code) > 0

opcode = 0
pos = 0
rjumpdests = set()
immediates = set()
while pos < len(code):
# Ensure the opcode is valid
opcode = code[pos]
pos += 1
if not opcode in valid_opcodes:
raise ValidationException("undefined instruction")

if opcode == 0x5c or opcode == 0x5d:
if pos + 2 > len(code):
raise ValidationException("truncated relative jump offset")
offset = int.from_bytes(code[pos:pos+2], byteorder = "big", signed = True)

rjumpdest = pos + 2 + offset
if rjumpdest < 0 or rjumpdest >= len(code):
raise ValidationException("relative jump destination out of bounds")

elif opcode == 0xfb:
if pos + 2 > len(code):
raise ValidationException("truncated CALLF immediate")
section_id = int.from_bytes(code[pos:pos+2], byteorder = "big", signed = False)

if section_id >= num_code_sections:
raise ValidationException("invalid section id")

# Save immediate value positions
immediates.update(range(pos, pos + immediate_sizes[opcode]))
# Skip immediates
pos += immediate_sizes[opcode]

# Ensure last opcode's immediate doesn't go over code end
if pos != len(code):
raise ValidationException("truncated immediate")

# opcode is the *last opcode*
if not opcode in terminating_opcodes:
raise ValidationException("no terminating instruction")

# Ensure relative jump destinations don't target immediates
if not rjumpdests.isdisjoint(immediates):
raise ValidationException("relative jump destination targets immediate")


## Security Considerations

TBA

Copyright and related rights waived via CC0.