Introduce a new, separate metadata section to the Ethereum Object Format (EOF) that is unreachable by the code and whose changes do not affect the code.
Motivation
It is desirable to include metadata in a contract’s bytecode for various reasons. For instance, both the Solidity and Vyper compilers by default include the language and compiler version used to compile. Vyper (with 0.4.1) appends an integrity hash to the initcode in CBOR encoding. Solidity additionally includes the IPFS or the Swarm hash of the Solidity contract metadata.json file, and the experimental Solidity flag. The current (pre-EOF) practice is to append this CBOR-encoded metadata section to the contract’s runtime bytecode, followed by the 2-byte length of the CBOR-encoded bytes.
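The trailing-length convention described above can be sketched as follows. This is a minimal illustration, not a verifier's actual implementation: the function name `split_trailing_metadata` is hypothetical, and the length is assumed to be a big-endian integer that counts only the CBOR bytes, as Solidity emits it.

```python
def split_trailing_metadata(runtime_bytecode: bytes) -> tuple[bytes, bytes]:
    """Split pre-EOF runtime bytecode into (code, cbor_metadata).

    Assumed layout: <code> <CBOR bytes> <2-byte big-endian CBOR length>.
    """
    if len(runtime_bytecode) < 2:
        return runtime_bytecode, b""
    cbor_length = int.from_bytes(runtime_bytecode[-2:], "big")
    # The length field counts only the CBOR bytes, not its own 2 bytes.
    if cbor_length + 2 > len(runtime_bytecode):
        # Implausible length: treat the bytecode as having no metadata.
        return runtime_bytecode, b""
    metadata_start = len(runtime_bytecode) - 2 - cbor_length
    return runtime_bytecode[:metadata_start], runtime_bytecode[metadata_start:-2]


# Illustrative bytes only: a short code prefix plus a made-up CBOR blob.
code = bytes.fromhex("6080604052")
cbor = bytes.fromhex("a164736f6c6343000814")
blob = code + cbor + len(cbor).to_bytes(2, "big")
code_part, metadata_part = split_trailing_metadata(blob)
```

Note that this is only a heuristic: nothing in unstructured bytecode guarantees that the last two bytes really are a metadata length, and nested bytecodes carry their own trailing metadata that this function does not find.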
This poses a problem for source code verification where the onchain bytecode is compared to the compiled bytecode of the given source code. During a contract verification, metadata sections, in particular the IPFS hash, need to be ignored and only the executional bytecode should be compared. Since pre-EOF bytecode is not structured, it is not possible to distinguish the metadata section from the executional bytecode easily. This gets even trickier in the case of factory contracts with multiple nested bytecodes, each having its own metadata section. Verifiers need to implement their own heuristics and workarounds to find the metadata sections and ignore them.
The EOF brings structure to the bytecode by separating the code from the data, and placing the code of each contract in its respective container. In its current form, this makes the data easier to find than in pre-EOF bytecode. However, the current spec does not describe a metadata section. Compilers currently need to place the contract metadata inside the data section, which poses several problems:
It is not straightforward to distinguish the metadata part in the data_section, which poses the same problem as the pre-EOF bytecode.
Any change to the metadata’s size within the data section will change the executional bytecode, e.g. through shifting DATALOADN offsets. With that, two identical contracts with different metadata sizes will not match during source code verification, since the code will be different.
The metadata can theoretically be reached by the code, e.g. via manipulating the DATALOADN instructions.
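The offset-shifting problem can be illustrated with a small sketch. It assumes a hypothetical data section layout of metadata followed by a 32-byte value, and takes the DATALOADN encoding (opcode 0xD1 with a 16-bit big-endian immediate offset) from EIP-7480; the helper names are made up for this example.

```python
DATALOADN = 0xD1  # opcode value assumed from EIP-7480


def dataloadn(offset: int) -> bytes:
    """Encode DATALOADN with its 16-bit big-endian immediate offset."""
    return bytes([DATALOADN]) + offset.to_bytes(2, "big")


def code_reading_value_after(metadata: bytes) -> bytes:
    """Code that loads the value stored right after the metadata.

    Hypothetical layout: data_section = metadata || value.
    """
    return dataloadn(len(metadata))


# Same contract logic, compiled twice with metadata of different sizes.
code_a = code_reading_value_after(bytes(10))
code_b = code_reading_value_after(bytes(53))
codes_match = code_a == code_b  # False: the immediate offset differs
```

Under this layout the two code sections differ only in the DATALOADN immediate, which is enough for a byte-for-byte comparison of the executional bytecode to fail.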
Specification
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 and RFC 8174.
Extending the format introduced in EIP-3540, this EIP proposes to add a new OPTIONAL section in the body called metadata_section before the data_section, and to add two new OPTIONAL fields kind_metadata (value: 0x05) and metadata_size to the header before the kind_data and data_size fields.
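| name | length | value | description |
|------|--------|-------|-------------|
| … | … | … | … |
| metadata_size | 2 bytes | … | 16-bit unsigned big-endian integer denoting the length of the metadata section content |
| kind_data | 1 byte | 0x04 | kind marker for data size section |
| data_size | 2 bytes | 0x0000-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the data section content (*) |
| terminator | 1 byte | 0x00 | marks the end of the header |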
Body
| name | length | value | description |
|------|--------|-------|-------------|
| … | … | … | … |
| metadata_section | variable | n/a | arbitrary sequence of bytes |
| data_section | variable | n/a | arbitrary sequence of bytes |
The structure and the encoding of the metadata_section are not defined by this EIP. It is left to the compilers, tooling, or the contract developers to define the encoding and the content. The current practice by the Solidity and Vyper compilers is to use CBOR encoding.
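As a rough illustration of the header and body changes in this section, the sketch below encodes only the tail of the header (from kind_metadata to the terminator) and the tail of the body. The function names are hypothetical, the earlier header fields and body sections from EIP-3540 are deliberately left out, and the metadata pair is omitted whenever the metadata is empty, which is one possible reading of OPTIONAL.

```python
KIND_METADATA = 0x05
KIND_DATA = 0x04
TERMINATOR = 0x00


def encode_header_tail(metadata: bytes, data_size: int) -> bytes:
    """Encode the header fields from kind_metadata up to the terminator."""
    out = bytearray()
    if metadata:
        # OPTIONAL pair: written only when there is metadata to describe.
        out.append(KIND_METADATA)
        out += len(metadata).to_bytes(2, "big")  # metadata_size
    out.append(KIND_DATA)
    out += data_size.to_bytes(2, "big")  # data_size, always present
    out.append(TERMINATOR)
    return bytes(out)


def encode_body_tail(metadata: bytes, data: bytes) -> bytes:
    """The metadata_section, if any, comes directly before the data_section."""
    return metadata + data


with_metadata = encode_header_tail(b"\xa1\x00\x01", data_size=4)
without_metadata = encode_header_tail(b"", data_size=0)
```

Both size fields are written with `to_bytes(2, "big")`, so a section longer than 0xFFFF bytes raises an `OverflowError` instead of silently producing a malformed header.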
Rationale
The metadata_section in the body, as well as the kind_metadata and metadata_size fields in the header, are OPTIONAL. This way, the compilers can avoid additional bytes in the container if they don’t want to write any metadata. The data_section can change in size and content during deployment, so it needs to be REQUIRED even if the data is empty. The metadata_section is not expected to change during deployment.
The reason for placing the metadata_section before the data_section, and assigning kind_metadata the value 0x05 (and not 0x04), is to make it easier for the existing EOF tooling to adapt to the changes. Additionally, if the metadata_section were placed after the data_section, changes to the data_section at deploy time would cause the metadata_section to shift. Placing the metadata_section before the data_section avoids this.
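The ordering argument can be checked with a little arithmetic. In the sketch below, `preceding_size` is a hypothetical stand-in for the combined size of the header and all body sections that come before the metadata and data sections; the function name and the numbers are illustrative only.

```python
def metadata_start(preceding_size: int, data_size: int, metadata_first: bool) -> int:
    """Byte offset of the metadata_section within the container."""
    if metadata_first:
        # Proposed order: metadata_section, then data_section.
        return preceding_size
    # Alternative order: data_section, then metadata_section.
    return preceding_size + data_size


# Data section grows from 0 to 64 bytes at deploy time.
proposed = (metadata_start(120, 0, True), metadata_start(120, 64, True))
alternative = (metadata_start(120, 0, False), metadata_start(120, 64, False))
```

With the proposed order the metadata offset stays at 120 before and after deployment, while with the alternative order it moves from 120 to 184 once 64 bytes of data are appended.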
Backwards Compatibility
No backward compatibility issues are expected since EIP-3540 is not implemented yet.
Security Considerations
No security concerns are expected, since the metadata section is not meant to be executed.