Four new instructions are introduced, that allow to read EOF container’s data section: DATALOAD loads 32-byte word to stack, DATALOADN loads 32-byte word to stack where the word is addressed by a static immediate argument, DATASIZE loads data section size and DATACOPY copies a segment of data section to memory.
Motivation
Clear separation between code and data is one of the main features of EOF1. Data section may contain anything, e.g. compiler’s metadata, but to make it useful for smart contracts, EVM has to have instructions that allow to read from data section. Previously existing instructions for bytecode inspection (CODECOPY, CODESIZE etc.) are deprecated in EOF1 and cannot be used for this purpose.
The DATALOAD, DATASIZE, DATACOPY instruction pattern follows the design of existing instructions for reading other kinds of data (i.e. returndata and calldata).
DATALOADN is an optimized version of DATALOAD, where data offset to read is set at compilation time, and therefore need not be validated at run-time, which makes the instruction cheaper.
Specification
We introduce four new instructions on the same block number EIP-3540 is activated on:
If the code is legacy bytecode, all of these instructions result in an exceptional halt. (Note: This means no change to behaviour.)
If the code is valid EOF1, the following execution rules apply:
DATALOAD
Pops one value, offset, from the stack.
Reads [offset:offset+32] segment from the data section and pushes it as 32-byte value to the stack.
If offset + 32 is greater than the data section size, bytes after the end of data section are set to 0.
Deducts 4 gas.
DATALOADN
Has one immediate argument,offset, encoded as a 16-bit unsigned big-endian value.
Pops nothing from the stack.
Reads [offset:offset+32] segment from the data section and pushes it as 32-byte value to the stack.
Deducts 3 gas.
[offset:offset+32] is guaranteed to be within data bounds by code validation.
DATASIZE
Pops nothing from the stack.
Pushes the size of the data section of the active container to the stack.
Deducts 2 gas.
DATACOPY
Pops three values from the stack: mem_offset, offset, size.
Performs memory expansion to mem_offset + size and deducts memory expansion cost.
Deducts 3 + 3 * ((size + 31) // 32) gas for copying.
Reads [offset:offset+size] segment from the data section and writes it to memory starting at offset mem_offset.
If offset + size is greater than data section size, 0 bytes will be copied for bytes after the end of the data section.
Code Validation
We extend code section validation rules (as defined in EIP-3670).
Code section is invalid in case an immediate argument offset of any DATALOADN is such that offset + 32 is greater than data section size, as indicated in the container header before deployment.
RJUMP, RJUMPI and RJUMPV immediate argument value (jump destination relative offset) validation: code section is invalid in case offset points to one of two bytes directly following DATALOADN instruction.
Rationale
Zero-padding on out of bounds access
Existing instructions for reading other kinds of data implicitly pad with zeroes on out of bounds access, with the only exception of return data copying.
It is beneficial to avoid exceptional failures, because compilers can employ optimizations like removing a code that copies data, but never accesses this copy afterwards, but such optimization is possible only if instruction never has other side effects like exceptional abort.
Lack of EXTDATACOPY
EXTCODECOPY instruction is deprecated and rejected in EOF contracts and does not copy contract code when being called in legacy with an EOF contract as target. A replacement instruction EXTDATACOPY has been considered, but decided against in order to reduce the scope of changes.
Data-only contracts which previously relied on EXTCODECOPY are thereby discouraged, but if there is a strong need, support for them can be easily brought back by introducing EXTDATACOPY in a future upgrade.
Backwards Compatibility
This change poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).