đź“– This EIP is in the review stage. It is subject to changes and feedback is appreciated.

EIP-3540: EOF - EVM Object Format v1 Source

EOF is an extensible and versioned container format for EVM bytecode with a once-off validation at deploy time.

AuthorAlex Beregszaszi, Paweł Bylica, Andrei Maiboroda
Discussions-Tohttps://ethereum-magicians.org/t/evm-object-format-eof/5727
StatusReview
TypeStandards Track
CategoryCore
Created2021-03-16
Requires 3541, 3860

Abstract

We introduce an extensible and versioned container format for the EVM with a once-off validation at deploy time. The version described here brings the tangible benefit of code and data separation, and allows for easy introduction of a variety of changes in the future. This change relies on the reserved byte introduced by EIP-3541.

To summarise, EOF bytecode has the following layout:

magic, version, (section_kind, section_size)+, 0, <section contents>

Motivation

On-chain deployed EVM bytecode contains no pre-defined structure today. Code is typically validated in clients to the extent of JUMPDEST analysis at runtime, every single time prior to execution. This poses not only an overhead, but also a challenge for introducing new or deprecating existing features.

Validating code during the contract creation process allows code versioning without an additional version field in the account. Versioning is a useful tool for introducing or deprecating features, especially for larger changes (such as significant changes to control flow, or features like account abstraction).

The format described in this EIP introduces a simple and extensible container with a minimal set of changes required to both clients and languages, and introduces validation.

The first tangible feature it provides is separation of code and data. This separation is especially beneficial for on-chain code validators (like those utilised by layer-2 scaling tools, such as Optimism), because they can distinguish code and data (this includes deployment code and constructor arguments too). Currently, they a) require changes prior to contract deployment; b) implement a fragile method; or c) implement an expensive and restrictive jump analysis. Code and data separation can result in ease of use and significant gas savings for such use cases. Additionally, various (static) analysis tools can also benefit, though off-chain tools can already deal with existing code, so the impact is smaller.

A non-exhaustive list of proposed changes which could benefit from this format:

  • Including a JUMPDEST-table (to avoid analysis at execution time) and/or removing JUMPDESTs entirely.
  • Introducing static jumps (with relative addresses) and jump tables, and disallowing dynamic jumps at the same time.
  • Requiring code section(s) to be terminated by STOP. (Assumptions like this can provide significant speed improvements in interpreters, such as a speed-up of ~7% seen in evmone (ethereum/evmone#295).
  • Multibyte opcodes without any workarounds.
  • Representing functions as individual code sections instead of subroutines.
  • Introducing special sections for different use cases, notably Account Abstraction.

Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 and RFC 8174.

In order to guarantee that every EOF-formatted contract in the state is valid, we need to prevent already deployed (and not validated) contracts from being recognized as such format. This is achieved by choosing a byte sequence for the magic that doesn’t exist in any of the already deployed contracts.

Remarks

The initcode is the code executed in the context of the create transaction, CREATE, or CREATE2 instructions. The initcode returns code (via the RETURN instruction), which is inserted into the account. See section 7 (“Contract Creation”) in the Yellow Paper for more information.

The opcode 0xEF is currently an undefined instruction, therefore: It pops no stack items and pushes no stack items, and it causes an exceptional abort when executed. This means initcode or already deployed code starting with this instruction will continue to abort execution.

Code validation

We introduce code validation for new contract creation. To achieve this, we define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.

At block.number == HF_BLOCK new contract creation is modified:

  • if initcode or code starts with the MAGIC, it is considered to be EOF formatted and will undergo validation specified in the following sections,
  • else if code starts with 0xEF, creation continues to result in an exceptional abort (the rule introduced in EIP-3541),
  • otherwise code is considered legacy code and the following rules do not apply to it.

For a create transaction, if initcode or code is invalid, the contract creation results in an exceptional abort. Such a transaction is valid and may be included in a block.

For the CREATE and CREATE2 instructions, if initcode or code is invalid, instructions’ execution ends with the result 0 pushed on stack.

In case initcode is invalid, gas for its execution is not deducted. In case code is invalid, all creation gas is deducted, similar to exceptional abort during initcode execution.

Container specification

EOF container is a binary format with the capability of providing the EOF version number and a list of EOF sections.

The container starts with the EOF header:

description length value  
magic 2-bytes 0xEF00  
version 1-byte 0x01–0xFF EOF version number

The EOF header is followed by at least one section header. Each section header contains two fields, section_kind and section_size.

description length value  
section_kind 1-byte 0x01–0xFF Encoded as a 8-bit unsigned number.
section_size 2-bytes 0x0001–0xFFFF Encoded as a 16-bit unsigned big-endian number.

The list of section headers is terminated with the section headers terminator byte 0x00.

Container validation rules

  1. version MUST NOT be 0.[^1](#EOF-version-range-start-with-1)
  2. section_kind MUST NOT be 0. The value 0 is reserved for section headers terminator byte.
  3. section_size MUST NOT be 0. If a section is empty its section header MUST be omitted.
  4. There MUST be at least one section (and therefore section header).
  5. Section data size MUST be equal to section_size declared in its header.
  6. Stray bytes outside of sections MUST NOT be present. This includes trailing bytes after the last section.

EOF version 1

Section kinds

The section kinds for EOF version 1 are defined as follows. The list may be extended in future versions.

section_kind meaning
0 reserved for section headers terminator byte
1 code
2 data

EOF version 1 validation rules

  1. In addition to general validation rules above, EOF version 1 bytecode conforms to the rules specified below:
    • Exactly one code section MUST be present.
    • The code section MUST be the first section.
    • A single data section MAY follow the code section.
  2. Any other version is invalid.

(Remark: Contract creation code SHOULD set the section size of the data section so that the constructor arguments fit it.)

Changes to execution semantics

For clarity, the container refers to the complete account code, while code refers to the contents of the code section only.

  1. JUMPDEST-analysis is only run on the code.
  2. Execution starts at the first byte of the code, and PC is set to 0.
  3. If PC goes outside the code section bounds, execution aborts with failure.
  4. PC returns the current position within the code.
  5. JUMP/JUMPI uses an absolute offset within the code.
  6. CODECOPY/CODESIZE/EXTCODECOPY/EXTCODESIZE/EXTCODEHASH keeps operating on the entire container.
  7. The input to CREATE/CREATE2 is still the entire container.

Changes to contract creation semantics

For clarity, the EOF prefix together with a version number n is denoted as the EOFn prefix, e.g. EOF1 prefix.

  1. If initcode’s container has EOF1 prefix it must be valid EOF1 code.
  2. If code’s container has EOF1 prefix it must be valid EOF1 code.

Rationale

EVM and/or account versioning has been discussed numerous times over the past years. This proposal aims to learn from them. See “Ethereum account versioning” on the Fellowship of Ethereum Magicians Forum for a good starting point.

Execution vs. creation time validation

This specification introduces creation time validation, which means:

  • All created contracts with EOFn prefix are valid according to version n rules. This is very strong and useful property. The client can trust that the deployed code is well-formed.
  • In the future, this allows to serialize JUMPDEST map in the EOF container and eliminate the need of implicit JUMPDEST analysis required before execution.
  • Or to completely remove the need for JUMPDEST instructions.
  • This helps with deprecating EVM instructions and/or features.
  • The biggest disadvantage is that deploy-time validation of EOF code must be enabled in two hard-forks. However, the first step (EIP-3541) is already deployed in London.

The alternative is to have execution time validation for EOF. This is performed every single time a contract is executed, however clients may be able to cache validation results. This alternative approach has the following properties:

  • Because the validation is consensus-level execution step, it means the execution always requires the entire code. This makes code merkleization impractical.
  • Can be enabled via a single hard-fork.
  • Better backwards compatibility: data contracts starting with the 0xEF byte or the EOF prefix can be deployed. This is a dubious benefit, however.

Contract creation restrictions

The Changes to contact creation semantics section defines minimal set of restrictions related to the contract creation: if initcode or code has the EOF1 container prefix it must be validated. This adds two validation steps in the contract creation, any of it failing will result in contract creation failure.

Since initcode and code are evaluated for EOF1 independently, number of interesting combinations are allowed:

  • Create transaction with EOF1 initcode can deploy legacy contract,
  • EOF1 contract can execute CREATE instruction with legacy initcode to create new legacy contract,
  • Legacy contract can execute CREATE instruction with EOF1 initcode to create new EOF1 contract,
  • Legacy contract can execute CREATE instruction with EOF1 initcode to create new legacy contract,
  • etc.

To limit the number of exotic bytecode version combinations, additional restrictions are considered, but currently are not part of the specification:

  1. The EOF version of initcode must much the version of code.
  2. An EOF1 contract must not create legacy contracts.

Finally, create transaction must be allowed to contain legacy initcode and deploy legacy code because otherwise there is no transition period allowing upgrading transaction signing tools. Deprecating such transactions may be considered in the future.

The MAGIC

  1. The first byte 0xEF was chosen because it is reserved for this purpose by EIP-3541.

  2. The second byte 0x00 was chosen to avoid clashes with three contracts which were deployed on Mainnet:
    • 0xca7bf67ab492b49806e24b6e2e4ec105183caa01: EFF09f918bf09f9fa9
    • 0x897da0f23ccc5e939ec7a53032c5e80fd1a947ec: EF
    • 0x6e51d4d9be52b623a3d3a2fa8d3c5e3e01175cd0: EF
  3. No contracts starting with 0xEF bytes exist on public testnets: Goerli, Ropsten, Rinkeby, Kovan and Sepolia at their London fork block.

EOF version range start with 1

The version number 0 will never be used in EOF, so we can call legacy code EOF0. Also, implementations may use APIs where 0 version number denotes legacy code.

Section structure

We have considered different questions for the sections:

  • Streaming headers (i.e. section_header, section_data, section_header, section_data, ...) are used in some other formats (such as WebAssembly). They are handy for formats which are subject to editing (adding/removing sections). That is not a useful feature for EVM. One minor benefit applicable to our case is that they do not require a specific “header terminator”. On the other hand they seem to play worse with code chunking / merkleization, as it is better to have all section headers in a single chunk.
  • Whether to have a header terminator or to encode number_of_sections or total_size_of_headers. Both raise the question of how large of a value these fields should be able to hold. While today there will be only two sections, in case each “EVM function” would become a separate code section, a fixed 8-bit field may not be big enough. A terminator byte seems to avoid these problems.
  • Whether to encode section_size as a fixed 16-bit value or some kind of variable length field (e.g. LEB128). We have opted for fixed size, because it simplifies client implementations, and 16-bit seems enough, because of the currently exposed code size limit of 24576 bytes (see EIP-170 and EIP-3860). Should this be limiting in the future, a new EOF version could change the format. Besides simplifying client implementations, not using LEB128 also greatly simplifies on-chain parsing.

Data-only contracts

The EOF prevents deploying contracts with arbitrary bytes (data-only contracts: their purpose is to store data not execution). EOF1 requires presence of a code section therefore the minimal overhead EOF data contract consist of a data section and one code section with single instruction. We recommend to use INVALID instruction in this case. In total there are 11 additional bytes required.

EF0001 010001 02<data-size> 00 FE <data>

PC starts with 0 at the code section

The values for PC and JUMP/JUMPI start with 0 and are within the code section. We considered keeping PC/JUMP/JUMPI values to operate on the whole container and be consistent with CODECOPY/EXTCODECOPY but in the end decided otherwise. It looks to be much easier to propose EOF extensions that affect jumps and jumpdests when JUMP/JUMPI already operates on indexes within code section only. This also feels more natural and easier to implement in EVM: the new EOF EVM should only care about traversing code and accessing other parts of the container only on special occasions (e.g. in CODECOPY instruction).

Backwards Compatibility

This is a breaking change given that any code starting with 0xEF was not deployable before (and resulted in exceptional abort if executed), but now some subset of such codes can be deployed and executed successfully.

The choice of MAGIC guarantees that none of the contracts existing on the chain are affected by the new rules.

Test Cases

EOF validation

Valid cases

  • Code section without data section
  • Code section with data section

Invalid cases

Bytecode Validation error
EF Incomplete magic
EFFF0101000302000400600000AABBCCDD Invalid magic
EF00 No version
EF000001000302000400600000AABBCCDD Invalid version
EF000201000302000400600000AABBCCDD Invalid version
EF00FF01000302000400600000AABBCCDD Invalid version
EF0001 No header
EF000100 No code section
EF000101 No code section size
EF00010100 Code section size incomplete
EF0001010003 No section terminator
EF0001010003600000 No section terminator
EF000101000200 No code section contents
EF00010100020060 Code section contents incomplete
EF000101000300600000DEADBEEF Trailing bytes after code section
EF000101000301000300600000600000 Multiple code sections
EF000101000000 Empty code section
EF000101000002000200AABB Empty code section (with non-empty data section)
EF000102000401000300AABBCCDD600000 Data section preceding code section
EF000102000400AABBCCDD Data section without code section
EF000101000202 No data section size
EF00010100020200 Data section size incomplete
EF0001010003020004 No section terminator
EF0001010003020004600000AABBCCDD No section terminator
EF000101000302000400600000 No data section contents
EF000101000302000400600000AABBCC Data section contents incomplete
EF000101000302000400600000AABBCCDDEE Trailing bytes after data section
EF000101000302000402000400600000AABBCCDDAABBCCDD Multiple data sections
EF000101000101000102000102000100FEFEAABB Multiple code and data sections
EF000101000302000000600000 Empty data section
EF0001010002030004006000AABBCCDD Unknown section (id = 3)

Contract creation

All cases should be checked for creation transaction, CREATE and CREATE2.

  • Legacy init code
    • Returns legacy code
    • Returns valid EOF1 code
    • Returns invalid EOF1 code
    • Returns 0xEF not followed by EOF1 code
  • Valid EOF1 init code
    • Returns legacy code
    • Returns valid EOF1 code
    • Returns invalid EOF1 code
    • Returns 0xEF not followed by EOF1 code
  • Invalid EOF1 init code

Contract execution

  • Valid EOF code containing JUMP/JUMPI - offsets relative to code section start are used
  • JUMP/JUMPI to 5B (JUMPDEST) byte outside of code section - exceptional abort
  • EOF code containing PC opcode - offset inside code section is returned
  • EOF code containing CODECOPY/CODESIZE - works as in legacy code
    • CODESIZE returns the size of entire container
    • CODECOPY can copy from code section
    • CODECOPY can copy from data section
    • CODECOPY can copy from the EOF header
    • CODECOPY can copy entire container
  • EXTCODECOPY/EXTCODESIZE/EXTCODEHASH with the EOF target contract - works as with legacy target contract
    • EXTCODESIZE returns the size of entire target container
    • EXTCODEHASH returns the hash of entire target container
    • EXTCODECOPY can copy from target’s code section
    • EXTCODECOPY can copy from target’s data section
    • EXTCODECOPY can copy from target’s EOF header
    • EXTCODECOPY can copy entire target container
    • Results don’t differ when executed inside legacy or EOF contract

Reference Implementation

MAGIC = b'\xEF\x00'
VERSION = 0x01
S_TERMINATOR = 0x00
S_CODE = 0x01
S_DATA = 0x02


# Determines if code is in EOF format of any version.
def is_eof(code: bytes) -> bool:
    return code.startswith(MAGIC)

class ValidationException(Exception):
    pass

# Raises ValidationException on invalid code
def validate_eof(code: bytes):
    # Check version
    if len(code) < 3 or code[2] != VERSION:
        raise ValidationException("invalid version")

    # Process section headers
    section_sizes = {S_CODE: 0, S_DATA: 0}
    pos = 3
    while True:
        # Terminator not found
        if pos >= len(code):
            raise ValidationException("no section terminator")            
        
        section_id = code[pos]
        pos += 1
        if section_id == S_TERMINATOR:
            break

        # Disallow unknown sections
        if not section_id in section_sizes:
            raise ValidationException("invalid section id")

        # Data section preceding code section
        if section_id == S_DATA and section_sizes[S_CODE] == 0:
            raise ValidationException("data section preceding code section")

        # Multiple sections with the same id
        if section_sizes[section_id] != 0:
            raise ValidationException("multiple sections with same id")

        # Truncated section size
        if (pos + 1) >= len(code):
            raise ValidationException("truncated section size")
        section_sizes[section_id] = (code[pos] << 8) | code[pos + 1]
        pos += 2

        # Empty section
        if section_sizes[section_id] == 0:
            raise ValidationException("empty section")

    # Code section cannot be absent
    if section_sizes[S_CODE] == 0:
        raise ValidationException("no code section")

    # The entire container must be scanned
    if len(code) != (pos + section_sizes[S_CODE] + section_sizes[S_DATA]):
        raise ValidationException("container size not equal to sum of section sizes")

Security Considerations

With the anticipated EOF extensions, the validation is expected to have linear computational and space complexity. We think that the validation cost is sufficiently covered by:

  • EIP-3860 for initcode,
  • high per-byte cost of deploying code.

Copyright and related rights waived via CC0.

Citation

Please cite this document as:

Alex Beregszaszi, Paweł Bylica, Andrei Maiboroda, "EIP-3540: EOF - EVM Object Format v1 [DRAFT]," Ethereum Improvement Proposals, no. 3540, March 2021. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-3540.