I explore the technical definition of consensus proofs, review their vulnerabilities, and examine the mitigations for those vulnerabilities. I then show how these mitigations enable consensus proofs to be safely used on-chain, allowing for first-of-their-kind Byzantine fault-tolerant cross-chain bridges.
In my previous article, we explored the idea of decoupling the consensus mechanism and state machine in blockchain protocols. By doing so, we enable the existence of "consensus clients," popularly known as "light clients," that only track consensus proofs of state transitions instead of the full state transitions. This idea is not just theoretical, as the Ethereum beacon chain (consensus client) and state machine (execution client) are currently two different pieces of software.
However, tracking only consensus proofs leaves consensus clients vulnerable to sophisticated attacks if proper mitigations are not in place. To understand these attacks, it is necessary to first understand the nature of consensus proofs in distributed systems.
The mechanism by which consensus can be enforced has been a topic of research in distributed systems for over two decades. This research has shown that there are two requirements for reliable consensus in distributed systems: liveness and safety$^{[1]}$.
Liveness is a crucial property of consensus algorithms: it ensures that a distributed system can continuously produce new state transitions. Consensus algorithms are responsible for selecting, from a set of nodes, a leader that will produce the next state transition. This process is called "leader election."
It is worth noting that leader election protocols do not necessarily produce a single leader per round. Those that do are referred to as "single-leader election protocols." To maintain blockchain functionality in a distributed system where nodes may act unpredictably, leader election mechanisms must be able to produce multiple leaders. This way, if one node is offline, other nodes can produce the required state transitions and keep the blockchain network functioning without slowdowns.
This means that the leader election mechanisms must be designed to handle various types of node failures, including hardware failures and network failures. For example, a robust leader election mechanism should be able to detect when a node has failed and remove it from the pool of potential leaders. Additionally, it should be able to recover quickly from such failures by electing a new leader to continue producing state transitions.
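The fallback behaviour described above can be sketched in a few lines. This is a minimal illustration, not the election scheme of any particular protocol: the function names `ranked_leaders` and `pick_producer`, the hash-based ranking, and the `offline` set are all assumptions made for the example. A real protocol would also weight candidates by stake and draw its seed from a verifiable randomness source.

```python
import hashlib
from typing import List, Optional, Set


def ranked_leaders(round_seed: bytes, nodes: List[str], count: int = 3) -> List[str]:
    """Return up to `count` candidate leaders for a round, in priority order.

    Hypothetical scheme: each node is scored by hashing the round seed
    together with its identifier, and the lowest scores win. Every honest
    node can recompute the same ranking from the same seed.
    """
    def score(node: str) -> bytes:
        return hashlib.sha256(round_seed + node.encode()).digest()

    return sorted(nodes, key=score)[:count]


def pick_producer(
    round_seed: bytes, nodes: List[str], offline: Set[str], count: int = 3
) -> Optional[str]:
    """Pick the highest-priority elected leader that is not known to be offline.

    Returns None if every elected leader for this round has failed, in
    which case the round produces no state transition.
    """
    for candidate in ranked_leaders(round_seed, nodes, count):
        if candidate not in offline:
            return candidate
    return None
```

Calling `pick_producer(b"round-1", ["a", "b", "c", "d"], offline=set())` returns the first-ranked leader. If that node is added to `offline`, the same call falls through to the second-ranked leader, so the round can still produce a state transition.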
Safety refers to the property that distributed systems always reach consensus on a single history of events. Unlike liveness, safety requires a game-theoretic proof to support a single history of events.
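To make "a single history of events" concrete, here is a small sketch of the condition that safety is meant to guarantee. It only checks the property for two nodes' views and says nothing about the game-theoretic argument that enforces it. The function name `is_consistent` and the representation of a history as a list of block identifiers are assumptions made for this example.

```python
from typing import Sequence


def is_consistent(history_a: Sequence[str], history_b: Sequence[str]) -> bool:
    """Check that two views of the chain describe a single history.

    Two nodes may be at different heights, so the views are considered
    consistent when the shorter one is a prefix of the longer one. Any
    disagreement at a shared height means the histories have forked.
    """
    shared = min(len(history_a), len(history_b))
    return list(history_a[:shared]) == list(history_b[:shared])
```

Under this definition, a node that has seen blocks `["g", "b1"]` is consistent with one that has seen `["g", "b1", "b2"]`, while views `["g", "b1"]` and `["g", "x1"]` represent a safety violation.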