Add a four step block rule to Clique that should reduce block production deadlocks
Abstract
The current specification of Clique allows for multiple competing blocks from producers but does not
provide any strategies to pick blocks aside from the current “highest total difficulty” rule. This
EIP proposes a four step choice rule of highest total difficulty, shortest chain, most recently
in-turn, and lowest hash. This would prevent deadlocks that have occurred in production systems.
Motivation
There has been more than one deadlock in the Goerli multi-client Clique network. The number of
active validators was greater than 1/2 of the available validators so a chain halt should not have
occurred. The halt was resolved by an inactive validator coming back on line. The state of the chain
was in one of two configurations of 8 validators that can result in a chain halt. Three of the four
clients observed a choice sequence of lowest total difficulty followed by first observed block. Geth
added one extra rule of preferring the shortest chain before preferring the first observed block.
This fork would have resolved itself with Geth’s rule, but there is still a configuration where the
chain can halt with a shortest chain rule.
Specification
When a Clique validator is arbitrating the canonical status between two different chain head blocks,
they should choose the canonical block with the following ordered priorities.
Choose the block with the most total difficulty.
Then choose the block with the lowest block number.
Then choose the block whose validator had the least recent in-turn block assignment.
Then choose the block with the lowest hash.
When resolving rule 3 clients should use the following formula, where validator_index is the integer
index of the validator that signed the block when sorted as per epoch checkpointing,
header_number is the number of the header, and validator_count is the count of the current
validators. Clients should choose the block with the largest value. Note that an in-turn block
is considered to be the most recent in-turn block.
When resolving rule 4 the hash should be converted into an unsigned 256 bit integer.
Rationale
Two scenarios of a halted chain are known based on the current total difficulty then first observed
rule. One of the scenarios is also resistant to the shortest chain rule.
For the first scenario where chains of different lengths can halt consider a block with 8
validators, whose addresses sort to the same order as their designation in this example. A fully
in-order chain exists and validator number 8 has just produced an in-turn block and then validators
5, 7 and 8 go offline, leaving validators 1 to 6 to produce blocks. Two forks form, one with an
in-order block from validator 1 and then an out of order block from validator 3. The second fork
forms from validators 2, 4, and 6 in order. Both have a net total difficulty of 3 more than the
common ancestor. So in this case if both forks become aware of the other fork then both are
considered equally viable and neither set of validators should switch to the newly observed fork. In
this case, adding a shortest chain rule would break the deadlock as the even numbered validators
would adopt the shorter chain.
For the second scenario with the same validator set and in-order chain with validator 7 having just
produced an in order block, then validators 7 and 8 go offline. Two forks form, 1,3,5 on one side
and 2,4,6 on the other. Both forks become aware of the other fork after producing their third block.
In this case both forks have equal total difficulty and equal length. So Geth’s rule would not break
the tie and only the arrival of one of the missing validators fix the chain. In a worst case
scenario the odd and even chains would produce a block for 7 and 8 respectively, and chain halt
would result with no validators that have not chosen a fork. Only a manual rollback would fix this.
One consideration when formulating the rules is that the block choice should be chosen so that it
would encourage the maximum amount of in-order blocks. Selecting a chain based on shortest chain
implicitly prefers the chain with more in-order blocks. When selecting between competing out of
order chains the validator who is closest to producing an in-order block in the future should have
their chain declined so that they are available to produce an in-order block sooner.
At least one client has been observed producing multiple blocks at the same height with the same
difficulty, so a final catch-all standard of lowest block hash should break any remaining ties.
Backwards Compatibility
The current block choice rules are a mix of most total difficulty and most total difficulty plus
shortest chain.
As long as the majority of the active validators implement the block choice rules then a client who
only implements the existing difficulty based rule will eventually align with the chain preferred by
these rules. If less than a majority implement these rules then deadlocks can still occur, and
depend on the first observation of problematic blocks, which is no worse than the current situation.
If clients only partially implement the rule as long as every higher ranked rule is also implemented
then the situation will be no worse than today.
Security Considerations
Malicious and motivated attackers who are participating in the network can force the chain to halt
with well crafted block production. With a fully deterministic choice rule the opportunity to halt
is diminished. Attackers still have the same opportunities to flood the network with multiple blocks
at the same height. A deterministic rule based on the lowest hash reduces the impact of such a
flooding attack. A malicious validator could exploit this deterministic rule to produce a
replacement block. Such an attack exists in current implementations but a deterministic hash rule
makes such replacements more likely. However the impact of such an attack seems low to trivial at
this time.