Alert Source Discuss
⚠️ Draft Standards Track: Core

EIP-7748: State conversion to Verkle Tree

Describes a state conversion procedure to migrate key-values from the Merkle Patricia Tree to the Verkle Tree.

Authors Guillaume Ballet (@gballet), Ignacio Hagopian (@jsign), Gajinder Singh (@g11tech), Ansgar Dietrichs (@adietrichs), Gottfried Herold (@GottfriedHerold), Jamie Lokier (@jlokier), Tanishq Jasoria (@tanishqjasoria), Parithosh Jayanthi (@parithosh), Gabriel Rocheleau (@gabrocheleau), Karim Taam (@matkt)
Created 2024-07-23
Discussion Link https://ethereum-magicians.org/t/eip-7748-state-conversion-to-verkle-tree/20625
Requires EIP-7612

Abstract

This EIP proposes a procedure to convert, on each block, a fixed number of key-values from the existing Merkle Patricia Tree (MPT) to the Verkle Tree (VKT).

Motivation

The accounts state is too large to wait for transactions to organically move all of them to the VKT through the Overlay Tree. Thus, we need a strategy to convert all the state within a reasonable time. The state conversion completion allows removing the Overlay Tree abstraction introduced in EIP-7612 and use directly the VKT for all state access.

Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 and RFC 8174.

Constants

Parameter value Description
CONVERSION_START_TIMESTAMP TBD Timestamp at which the conversion starts.
CONVERSION_STRIDE TBD Maximum number of conversion units to be converted per block

A conversion unit is:

  • A contract storage slot.
  • A contract code. (i.e. all the code is a single conversion unit)
  • An account data. (e.g. balance, nonce, code-hash)

Changes to the execution spec

Include the following code in the existing apply_body(...) function:

def apply_body(state: State, ...) -> Tuple[Uint, Root, Root, Bloom, State, Root]:
    ...
    # <new_code>
    if block_time >= CONVERSION_START_TIMESTAMP and not state._conversion_finished:
        block_state_conversion(state, CONVERSION_STRIDE)
    # </new_code>
    
    for i, tx in enumerate(map(decode_transaction, transactions)):
        ...
    ...

Before executing txs, it calls block_state_conversion(...) (described below) which performs a state conversion step for this block.

In state.py, add the following code:

@dataclass
class StoragePhase:
    """
    The account next conversion step continues converting the 
    storage-slot with key greater or equal next_key. 
    If there isn't such storage-slot, the account must move to
    AccountDataPhase.
    """
    next_key: Bytes   
   
@dataclass
class AccountDataPhase:
    """
    The account next conversion step continues migrating the account 
    code (if any) and basic data. After processing, the account must 
    move to the next account in the trie (or finish if it was the 
    last one).
    """
    pass

@dataclass
class CurrentConvertingAccount:
    """
    Contains the state conversion next step.
    """
    address: Address
    phase : StoragePhase | AccountDataPhase

These new structures allows State to track where we’re in the conversion process.

Modify the State class by adding the following attributes:

@dataclass
class State:
    # <new_code>
    _conversion_curr_account: Optional[CurrentConvertingAccount] = None
    _conversion_finished: bool = False
    # </new_code>
    ...
    

Define a function with the following signature:

def trie_get_next_at_key(trie: Trie[K, V], key_seek: Bytes) -> (K, V, Optional[Bytes]):
    # Returns the first (key, value) in the trie-key is >= key_seek.
    # This method must only be used on Tries with secured=True,
    # since key_seek is the keccak256(K).
    # 
    # Returns:
    # - K, V: the key and value (e.g: Address/Value, StorageSlot/Value)
    # - next_key: The smallest trie-key present in the trie greater 
    # than key_seek, or None if there isn't one.
    #
    # Is up to the implementator to decide the best implementation
    # considering its client architecture.

Add or modify the following functions:

# New function.
def get_conversion_account(state: State) -> CurrentConvertingAccount:
    # When starting the conversion, initialize with the first account
    # in the MPT.
    if state._conversion_curr_account is None:
        # Initialize with the first account in the account trie.
        first_account = trie_get_next_at_key("0x0")
        # Accounts conversion starts with storage-slots conversion.
        phase = StoragePhase("0x0") # Starts with the lowest storage-slot key.
        state._conversion_curr_account = CurrentConvertingAccount(first_account, phase)
    
    return state._conversion_curr_account

# New function.
def conversion_move_to_next_account(state: State):
    curr_account = state.get_conversion_account()
    address, _, next_key = trie_get_next_at_key(state._main_trie, curr_account.phase.next_key)
    if next_key is None:
        # We finished the conversion
        state._conversion_finished = True
    else:
        # Move to the next account
        state._conversion_curr_account.address = address
        state._conversion_curr_account.phase = StoragePhase("0x00")

# Modified function: add new only_if_empty optional parameter.
def set_storage(
    state: State, addr: Address, key: Bytes, value: U256, only_if_empty: bool = True
) -> None:
    # <new_code>
    if only_if_empty:
        value = state._overlay_tree.get(get_tree_key_for_storage_slot(addr, key))
        if value is not None:
            return
    # </new_code>
    
    state._overlay_tree.set(get_tree_key_for_storage_slot(addr, key), value)

As mentioned previously, the next function is called by apply_body(...) to perform the conversion step for a block:

# Note the following function is optimized for readability, not for performance.
def state_convert(state: State, stride: int):
    n = 0    
    while n < stride and not state._conversion_finished:
        curr_account = state.get_conversion_account()
        
        # EIP-161 should not be converted.
        if account_exists_and_is_empty(state, curr_account.address):
            state.conversion_move_to_next_account()
            continue
        
        # Account storage.
        if curr_account.phase is StoragePhase:
            # Get the storage-slot from _storage_tries which is MPT data.
            trie = state._storage_tries.get(curr_account.address)
            
            if trie is not None:
                slot_num, slot_value, next_key = trie_get_next_at_key(trie, curr_account.phase.next_key)
                # The Overlay Tree will write in the VKT. We use the new
                # only_if_empty parameter to avoid writing stale values.
                set_storage(state, curr_account.address, slot_num, slot_value, only_if_empty=True)
                n += 1
    
                if next_key is not None:
                    # There're more storage-slots to be converted, continue in this phase.
                    state.conversion_curr_account.phase.next_key = next_key
                else:
                    # No more storage-slots. Move to the account data migration.
                    state.conversion_curr_account.phase = AccountDataPhase()
            else:
                # There's no storage trie for the account, move directly to
                # migrating code (if any).
                state.conversion_curr_account.phase = AccountDataPhase()
        # Account code and basic data.
        else:
            # Getting the code from the Overlay Tree is fine since promises returning
            # the Account full code which would come from the MPT or a separate code database.
            account = get_account(state, curr_account.address)
            chunked_code = chunkify_code(account.code)
            
            for chunk_num in range(len(chunked_code)):
                state_set_codechunk(state, address, chunk_num, chunked_code[chunk_num])
                n += 1
                
            # If the account data (i.e: nonce, balance, code-size, code-hash) lives in MPT, 
            # get_account will pull from MPT and then we write to the VKT. If the account 
            # data already lives in the VKT (i.e: it was indirectly converted by a tx), then 
            # it will return it from the VKT and write it again (i.e: it's a noop).
            # Thus, this operation is correct under both scenarios. That is, it won't
            # write stale data.
            account = get_account(state, curr_account.address)
            set_account(state, curr_account.address, account)
            n += 1
    
            state.conversion_move_to_next_account()

Rationale

State conversion step position in block execution

Performing the conversion step before the block txs execution has some benefits:

  • If the state conversion step is done after txs execution, there’s a possibility that txs execution writes overlap with converted key-values, having to care about them becoming stale in the same block. With the proposed ordering, they can only become stale by writes of previous blocks.
  • It can reduce the complexity of optimizations, such as frontrunning the state conversion for the next block before it arrives.

CONVERSION_STRIDE proposed value

Performance benchmarks were done to achieve the right balance between:

  • Don’t overload the clients with too much extra work per block.
  • Don’t create an unmanageable load in clients during feasible long reorgs.
  • Finish the conversion as fast as possible.

Account code chunking done in a single step

If an account has code, this is chunked and inserted in the VKT in one go. An alternative is including a CodePhase and let each inserted chunk consume one unit of CONVERSION_STRIDE.

We decided to not do this to reduce the algorithm complexity. Considering the current maximum code size, the wost case scenario for a block could overflow the CONVERSION_STRIDE limit by 24k/31~=793 units.

Expected time for the conversion to finish

TODO: We have an estimation, but it might be worth recalculating it closer to the proposed fork having a more up to date state size estimate.

Missed slots

The conversion logic runs at the start of each block, so missed slots don’t create special situations.

Accounts storage->account-data order

The proposed order synergizes with many EL client flat-db architectures, minimizing random disk-IO.

Not counting EIP-161 accounts for CONVERSION_STRIDE limit

The CONVERSION_STRIDE parameter tries to limit the load of effective writes. These special accounts are skipped since we try to perform a bulk EIP-158 deletion of the remaining accounts.

This might sound dangerous since if there were 1k of these accounts and all corresponded to be converted in the same block, we’d be forcing the clients to iterate 1k accounts without counting any quota from CONVERSION_STRIDE. The number of remaining accounts to delete is very low (i.e.: dozen) and also not contiguous, so this shouldn’t be a concern.

MPT preimage resolving

EL clients are expected to satisfy at least one of these conditions:

  • They have a proper flat-db design, which doesn’t require preimage resolving.
  • They have a full preimage database which can resolve trie_key->preimage (but this can have poor performance).
  • They have downloaded the preimage database image that will be distributed before the conversion starts.

Backwards Compatibility

No backward compatibility issues found.

Test Cases

TODO: currently described in an external document.

Reference Implementation

  • transition-post-genesis branch in github.com/gballet/go-ethereum implements this when setting --override.overlay-stride to a non-zero value on the command line.

Security Considerations

Needs discussion.

Copyright and related rights waived via CC0.

Citation

Please cite this document as:

Guillaume Ballet (@gballet), Ignacio Hagopian (@jsign), Gajinder Singh (@g11tech), Ansgar Dietrichs (@adietrichs), Gottfried Herold (@GottfriedHerold), Jamie Lokier (@jlokier), Tanishq Jasoria (@tanishqjasoria), Parithosh Jayanthi (@parithosh), Gabriel Rocheleau (@gabrocheleau), Karim Taam (@matkt), "EIP-7748: State conversion to Verkle Tree [DRAFT]," Ethereum Improvement Proposals, no. 7748, July 2024. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-7748.