đź“– This EIP is in the review stage. It is subject to changes and feedback is appreciated.

EIP-3670: EOF - Code Validation Source

Validate EOF bytecode for correctness at the time of deployment.

AuthorAlex Beregszaszi, Andrei Maiboroda, Paweł Bylica
Discussions-Tohttps://ethereum-magicians.org/t/eip-3670-eof-code-validation/6693
StatusReview
TypeStandards Track
CategoryCore
Created2021-06-23
Requires 3540, 3860

Abstract

Introduce code validation at contract creation time for EOF formatted (EIP-3540) contracts. Reject contracts which contain truncated PUSH-data or undefined instructions. Legacy bytecode (code which is not EOF formatted) is unaffected by this change.

Motivation

Currently existing contracts require no validation of correctness and EVM implementations can decide how they handle truncated bytecode or undefined instructions. This change aims to bring code validity into consensus, so that it becomes easier to reason about bytecode. Moreover, EVM implementations may require less paths to decide which instruction is valid in the current execution context.

If it will be desired to introduce new instructions without bumping EOF version, having undefined instructions already deployed would mean such contracts potentially can be broken (since some of the instructions are changing their behaviour). Rejecting to deploy undefined instructions allows introducing new instructions with or without bumping the EOF version.

Specification

Remark: We rely on the notation of initcode, code and creation as defined by EIP-3540.

This feature is introduced on the very same block EIP-3540 is enabled, therefore every EOF1-compatible bytecode MUST be validated according to these rules.

At contract creation time both initcode and code are iterated instruction-by-instruction (the same process is used to perform “JUMPDEST-analysis”). Bytecode is deemed invalid if any of these conditions is true:

  • it contains opcodes which are not currently assigned to an instruction (for the sake of assigned instructions, we count INVALID (0xfe) as assigned),
  • the last opcode (terminating instruction) is not STOP (0x00),RETURN (0xf3), REVERT (0xfd), INVALID (0xfe) or SELFDESTRUCT (0xff).

Notice that due to the requirement of the terminating instruction, it is implicitly stated that truncated instructions (e.g. data portion of a PUSHn) are disallowed.

For a create transaction, if initcode or code is invalid, the contract creation results in an exceptional abort. Such a transaction is valid and may be included in a block.

For the CREATE and CREATE2 instructions, if initcode or code is invalid, instructions’ execution ends with the result 0 pushed on stack.

In case initcode is invalid, gas for its execution is not deducted. In case code is invalid, all creation gas is deducted, similar to exceptional abort during initcode execution.

Notice that since EOF1 disallows 0-length code section, a valid contract must contain at least a single byte, which must be a terminating instruction.

Rationale

Terminating instructions

An efficient interpreter loop would only need to rely on checking if a terminating instruction has been encountered, and if so stopping execution. Currently this is not possible in the EVM, because of the lack of requirement for a proper termination as well as allowing for truncated instructions, an interpreter must track and check these various conditions.

Possibility for deprecation

The deprecated CALLCODE (0xf2) opcode may be dropped from the valid_opcodes list to prevent use of this instruction in future. Likewise SELFDESTRUCT (0xff) could also be rejected. Yet we decided not to mix such changes in.

Reference Implementation

# The ranges below are as specified in the Yellow Paper.
# Note: range(s, e) excludes e, hence the +1
valid_opcodes = [
    *range(0x00, 0x0b + 1),
    *range(0x10, 0x1d + 1),
    0x20,
    *range(0x30, 0x3f + 1),
    *range(0x40, 0x48 + 1),
    *range(0x50, 0x5b + 1),
    *range(0x60, 0x6f + 1),
    *range(0x70, 0x7f + 1),
    *range(0x80, 0x8f + 1),
    *range(0x90, 0x9f + 1),
    *range(0xa0, 0xa4 + 1),
    # Note: 0xfe is considered assigned.
    *range(0xf0, 0xf5 + 1), 0xfa, 0xfd, 0xfe, 0xff
]

# STOP, RETURN, REVERT, INVALID, SELFDESTRUCT
terminating_opcodes = [ 0x00, 0xf3, 0xfd, 0xfe, 0xff ]

# Only for PUSH1..PUSH32
immediate_sizes = 256 * [0]
for opcode in range(0x60, 0x7f + 1):  # PUSH1..PUSH32
    immediate_sizes[opcode] = opcode - 0x60 + 1

# Fails with assertion on invalid code
def validate_code(code: bytes):
    # Note that EOF1 already asserts this with the code section requirements
    assert len(code) > 0

    opcode = 0
    pos = 0
    while pos < len(code):
        # Ensure the opcode is valid
        opcode = code[pos]
        pos += 1
        assert opcode in valid_opcodes

        # Skip immediates
        pos += immediate_sizes[opcode]

    # Ensure last opcode's immediate doesn't go over code end
    assert pos == len(code)

    # opcode is the *last opcode*
    assert opcode in terminating_opcodes

Test Cases

Contract creation

Each case should be tested for creation transaction, CREATE and CREATE2.

  • Invalid initcode
  • Valid initcode returning invalid code
  • Valid initcode returning valid code

Valid codes

  • EOF code containing INVALID
  • EOF codes ending with any of the terminating instructions
  • EOF code with data section containing bytes that are undefined instructions
  • Legacy code containing undefined instruction
  • Legacy code ending with incomplete PUSH instruction
  • Legacy code ending with any valid non-terminating instruction

Invalid codes

  • EOF code containing undefined instruction
  • EOF code ending with incomplete PUSH instruction
    • This can include PUSH instruction unreachable by execution, e.g. after STOP
  • EOF code ending with any defined instruction other than terminating ones
    • This can include codes where non-terminating instruction is not reachable, e.g. 0030 (STOP ADDRESS).

Backwards Compatibility

This change poses no risk to backwards compatibility, as it is introduced at the same time EIP-3540 is. The validation does not cover legacy bytecode (code which is not EOF formatted).

Security Considerations

The validation of initcode adds additional overhead. We think that the charge introduced by EIP-3860 is sufficient to account for the overhead introduced in this EIP.

Copyright and related rights waived via CC0.

Citation

Please cite this document as:

Alex Beregszaszi, Andrei Maiboroda, Paweł Bylica, "EIP-3670: EOF - Code Validation," Ethereum Improvement Proposals, no. 3670, June 2021. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-3670.