This EIP will enable the BLAKE2b hash function and other higher-round 64-bit BLAKE2 variants to run cheaply on the EVM, allowing easier interoperability between Ethereum and Zcash as well as other Equihash-based PoW coins.
Abstract
This EIP introduces a new precompiled contract which implements the compression function F used in the BLAKE2 cryptographic hashing algorithm, for the purpose of allowing interoperability between the EVM and Zcash, as well as introducing more flexible cryptographic hash primitives to the EVM.
Motivation
Besides being a useful cryptographic hash function and the successor to the SHA-3 finalist BLAKE, BLAKE2 allows for efficient verification of the Equihash PoW used in Zcash, making a BTC Relay-style SPV client possible on Ethereum. A single Equihash PoW verification requires 512 iterations of the hash function, making verification of Zcash block headers prohibitively expensive if a Solidity implementation of BLAKE2 is used.
BLAKE2b, the common 64-bit BLAKE2 variant, is highly optimized and faster than MD5 on modern processors.
Interoperability with Zcash could enable contracts like trustless atomic swaps between the chains, which could provide a much needed aspect of privacy to the very public Ethereum blockchain.
The precompile requires 6 inputs, tightly encoded and taking exactly 213 bytes, as explained below. The encoded inputs correspond to the ones specified in Section 3.2 of the BLAKE2 RFC:
rounds - the number of rounds - 32-bit unsigned big-endian word
h - the state vector - 8 unsigned 64-bit little-endian words
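m - the message block vector - 16 unsigned 64-bit little-endian words
t_0, t_1 - the offset counters - 2 unsigned 64-bit little-endian words
f - the final block indicator flag - 8-bit word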
[4 bytes for rounds][64 bytes for h][128 bytes for m][8 bytes for t_0][8 bytes for t_1][1 byte for f]
The boolean f parameter is considered as true if set to 1.
The boolean f parameter is considered as false if set to 0.
All other values yield an invalid encoding of f error.
The precompile should compute the F function as specified in the RFC and return the updated state vector h with unchanged encoding (little-endian).
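The encoding and the computation above can be sketched in Python. This is an illustrative reference under stated assumptions, not the implementation shipped in any client: the function names (`blake2_f_precompile`, `compress`, `mix`) are hypothetical, the constants follow RFC 7693, and the usage part assumes the standard BLAKE2b-512 setup (12 rounds, unkeyed, 64-byte digest) so that the result can be compared against Python's `hashlib`.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1
INPUT_LEN = 213  # 4 (rounds) + 64 (h) + 128 (m) + 8 (t_0) + 8 (t_1) + 1 (f)

IV = [
    0x6A09E667F3BCC908, 0xBB67AE8584CAA73B, 0x3C6EF372FE94F82B, 0xA54FF53A5F1D36F1,
    0x510E527FADE682D1, 0x9B05688C2B3E6C1F, 0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179,
]
SIGMA = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
]

def rotr64(x, n):
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64

def mix(v, a, b, c, d, x, y):
    """The G mixing function of RFC 7693, BLAKE2b rotation constants."""
    v[a] = (v[a] + v[b] + x) & MASK64
    v[d] = rotr64(v[d] ^ v[a], 32)
    v[c] = (v[c] + v[d]) & MASK64
    v[b] = rotr64(v[b] ^ v[c], 24)
    v[a] = (v[a] + v[b] + y) & MASK64
    v[d] = rotr64(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK64
    v[b] = rotr64(v[b] ^ v[c], 63)

def compress(rounds, h, m, t0, t1, final):
    """Compression function F with a caller-chosen number of rounds."""
    v = list(h) + IV
    v[12] ^= t0
    v[13] ^= t1
    if final:
        v[14] ^= MASK64
    for r in range(rounds):
        s = SIGMA[r % 10]
        mix(v, 0, 4, 8, 12, m[s[0]], m[s[1]])
        mix(v, 1, 5, 9, 13, m[s[2]], m[s[3]])
        mix(v, 2, 6, 10, 14, m[s[4]], m[s[5]])
        mix(v, 3, 7, 11, 15, m[s[6]], m[s[7]])
        mix(v, 0, 5, 10, 15, m[s[8]], m[s[9]])
        mix(v, 1, 6, 11, 12, m[s[10]], m[s[11]])
        mix(v, 2, 7, 8, 13, m[s[12]], m[s[13]])
        mix(v, 3, 4, 9, 14, m[s[14]], m[s[15]])
    return [h[i] ^ v[i] ^ v[i + 8] for i in range(8)]

def blake2_f_precompile(data: bytes) -> bytes:
    """Decode the 213-byte input, run F, return h as 8 little-endian words."""
    if len(data) != INPUT_LEN:
        raise ValueError("input length for BLAKE2 F precompile should be exactly 213 bytes")
    if data[212] not in (0, 1):
        raise ValueError("invalid encoding of f")
    rounds = struct.unpack(">I", data[:4])[0]    # big-endian round count
    words = struct.unpack("<26Q", data[4:212])   # 8 h + 16 m + t_0 + t_1
    h, m, t0, t1 = words[:8], words[8:24], words[24], words[25]
    return struct.pack("<8Q", *compress(rounds, h, m, t0, t1, data[212] == 1))

# Usage: unkeyed BLAKE2b-512 of "abc" is one final compression of a padded block.
h0 = [IV[0] ^ 0x01010040] + IV[1:]               # IV xor default parameter block
block = b"abc".ljust(128, b"\x00")
call = (struct.pack(">I", 12) + struct.pack("<8Q", *h0) + block
        + struct.pack("<2Q", 3, 0) + b"\x01")    # t_0 = 3 bytes hashed, f = final
digest = blake2_f_precompile(call)
expected = hashlib.blake2b(b"abc").digest()
```

Because the round count is an input, the same routine covers reduced-round or higher-round variants; only the caller-side setup of `h`, `t` and `f` changes.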
Example Usage in Solidity
The precompile can be wrapped easily in Solidity to provide a more development-friendly interface to F.
Each operation will cost GFROUND * rounds gas, where GFROUND = 1. Detailed benchmarks are presented in the benchmarks appendix section.
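As a small worked example of the pricing rule, the sketch below reads the round count from the first 4 bytes of the input and multiplies it by GFROUND. The helper name `f_gas_cost` is hypothetical, and the sketch covers only the per-call charge described above, not any generic call overhead.

```python
import struct

GFROUND = 1  # gas per round, as stated in the text

def f_gas_cost(data: bytes) -> int:
    """Gas for one F call: GFROUND * rounds, rounds being a big-endian uint32."""
    rounds = struct.unpack(">I", data[:4])[0]
    return GFROUND * rounds

# A 213-byte input requesting the 12 rounds used by standard BLAKE2b.
call_12 = struct.pack(">I", 12) + bytes(209)
cost_12 = f_gas_cost(call_12)

# The largest encodable round count bounds the cost of a single call.
max_cost = f_gas_cost(b"\xff" * 4 + bytes(209))
```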
Rationale
BLAKE2 is an excellent candidate for precompilation. BLAKE2 is heavily optimized for modern 64-bit CPUs, specifically utilizing 24 and 63-bit rotations to allow parallelism through SIMD instructions and little-endian arithmetic. These characteristics provide exceptional speed on native CPUs: 3.08 cycles per byte, or 1 gibibyte per second on an Intel i5.
In contrast, the big-endian 32 byte semantics of the EVM are not conducive to efficient implementation of BLAKE2, and thus the gas cost associated with computing the hash on the EVM is disproportionate to the true cost of computing the function natively.
An obvious implementation would be a direct BLAKE2b hash function precompile. At first glance, a BLAKE2b precompile satisfies most hashing and interoperability requirements on the EVM. Once we started digging in, however, it became clear that any BLAKE2b implementation would need specific features and internal modifications based on different projects’ requirements and libraries.
The minimal thing that is necessary for a working ZEC-ETH relay is an implementation of BLAKE2b Compression F in a precompile.
A BLAKE2b Compression Function F precompile would also suffice for the Filecoin and Handshake interop goals.
A full BLAKE2b precompile would suffice for a ZEC-ETH relay, provided that the implementation provided the parts of the BLAKE2 API that we need (personalization, maybe something else—I’m not sure).
I’m not 100% certain if a full BLAKE2b precompile would also suffice for the Filecoin and Handshake goals. It almost certainly could, provided that it supports all the API that they need.
BLAKE2s — whether the Compression Function F or the full hash — is only a nice-to-have for the purposes of a ZEC-ETH relay.
From this and other conversations with teams in the space, we believe we should focus first on the F precompile as a strictly necessary piece for interoperability projects. A BLAKE2b precompile is a nice-to-have, and we support any efforts to add one, but it’s unclear whether complete requirements and a flexible API can be found in time for Istanbul.
Implementation of only the core F compression function also allows substantial flexibility and extensibility while keeping changes at the protocol level to a minimum. This will allow functions like tree hashing, incremental hashing, and keyed, salted, and personalized hashing as well as variable length digests, none of which are currently available on the EVM.
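To illustrate where this flexibility comes from: in BLAKE2b, the digest length, key length, salt, and personalization only affect the initial state vector h, which is the IV XORed with a 64-byte parameter block. A caller of F can therefore build these variants without protocol changes. The following is a minimal sketch assuming the sequential-mode parameter block layout of BLAKE2b (fanout and depth set to 1); the helper name `initial_state` and the personalization string are hypothetical.

```python
import struct

IV = [
    0x6A09E667F3BCC908, 0xBB67AE8584CAA73B, 0x3C6EF372FE94F82B, 0xA54FF53A5F1D36F1,
    0x510E527FADE682D1, 0x9B05688C2B3E6C1F, 0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179,
]

def initial_state(digest_size=64, key_len=0, salt=b"", person=b""):
    """Return the 8-word state h to pass to F for the first message block."""
    if not 1 <= digest_size <= 64 or not 0 <= key_len <= 64:
        raise ValueError("digest_size must be 1..64 and key_len 0..64")
    if len(salt) > 16 or len(person) > 16:
        raise ValueError("salt and person are at most 16 bytes each")
    # digest length, key length, fanout, depth, leaf length, node offset,
    # node depth, inner length, 14 reserved bytes, salt, personalization
    params = struct.pack("<BBBBIQBB14s16s16s",
                         digest_size, key_len, 1, 1, 0, 0, 0, 0, b"", salt, person)
    return [iv ^ p for iv, p in zip(IV, struct.unpack("<8Q", params))]

h_default = initial_state()                                   # plain BLAKE2b-512
h_custom = initial_state(digest_size=32, person=b"example-person")
```

With a variable-length digest, the caller additionally truncates the final state to `digest_size` bytes after the last call to F.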
Backwards Compatibility
There is very little risk of breaking backwards-compatibility with this EIP, the sole issue being if someone were to build a contract relying on the address at 0x09 being empty. The likelihood of this is low, and should specific instances arise, the address could be chosen to be any arbitrary value with negligible risk of collision.
Test Cases
Test vector 0
input: (empty)
output: error “input length for BLAKE2 F precompile should be exactly 213 bytes”
An initial implementation of the F function in Go, adapted from the standard library, can be found in our Golang BLAKE2 library fork. There’s also an implementation of the precompile in our fork of go-ethereum.
References
For reference, further discussion on this EIP also occurred in the following PRs and issues
Assuming the ecRecover precompile is perfectly priced, we executed a set of benchmarks comparing the BLAKE2b F compression function precompile with the ecRecover precompile. For the benchmarks, we used a 3.1 GHz Intel Core i7 64-bit machine.
$ sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
12 rounds
Compared to ecRecover, the average gas price of an F precompile call with 12 rounds should have been 6.74153, which gives 0.5618 gas per round.
Compared to ecRecover, the average gas price of an F precompile call with 1 round should have been 2.431701. However, in this scenario the call cost would totally overshadow the dynamic cost anyway.
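The per-round figure is plain division of the benchmark-derived price by the round count. A short sketch of that arithmetic, using only the numbers quoted above and the GFROUND value of 1 (the variable names are illustrative):

```python
GFROUND = 1                  # proposed gas per round

gas_12_rounds = 6.74153      # benchmark-derived price of a 12-round call
gas_1_round = 2.431701       # benchmark-derived price of a 1-round call

# Per-round price implied by the 12-round measurement.
per_round_12 = gas_12_rounds / 12

# Ratio between the proposed per-round price and the measured one.
margin = GFROUND / per_round_12

# The 1-round call costs far more per round than the 12-round call,
# which reflects fixed per-call overhead rather than the rounds themselves.
fixed_overhead_dominates = gas_1_round > per_round_12
```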