Demystifying Byzantine Fault Tolerance & Consensus Mechanisms — Part 1

Yogeshwar Tanwar
4 min readMar 23, 2020

--

What is consensus mechanism in context of distributed computing / blockchains.

The basic issue in distributed computing and blockchains systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires processes to agree on some data value that is needed during computation

These processes are described as consensus.

  • In case of malicious actor , what happens when it decided to tamper with state of records & ledger ?
  • What will happen if these actors are in majority & large part of the network ?

A secure consensus protocol must be fault tolerant.

In this article first we will talk about Two Generals Problem. After that Byzantine Generals Problem & then discuss about Byzantine Fault Tolerance in distributed and blockchain systems

Two generals Problem

Two generals Problem

In this scenario there are two generals attacking a common enemy. General A is considered the leader and other one is the follower. Both generals need to coordinate & attack simultaneously to defeat the enemy. This seems like a simple scenario, but there is catch here:

For both of them to be in sync & decide on a time, General A has to send a messenger who will deliver time of the attack to General B. However, there is a chance , the messenger will be captured by enemies and thus the message won’t be delivered. This results in General A attacking while General B’s army holding back.

Even in case of first message goes through, General B has to acknowledge the receipt of message, so General B sends messenger back, thus repeating the s scenario where the messenger can be captured. This extends to infinite ACK’s and thus the generals are unable to reach an agreement.

  • Since the probability of the message not getting through is always > 0, the generals can never agree with 100% accuracy.
  • It can be proved that there is no way of solving the two generals problem thus there is no perfect solution.
  • In this situation, the network is our byzantine opponent.
  • The Two General’s Problem demonstrates that if your byzantine failure results in the failure of your entire communication network, there is no way that you can get consensus between your nodes in your distributed system.

The Byzantine Generals Problem

Byzantine Generals Problem
  • A Byzantine army is trying to attack an enemy
  • There are five generals who are leading armies to attack a fortress
  • They need to decide what they are going to do tomorrow morning: attack or retreat
  • The votes that generals come up with are Attack, Retreat, Retreat, Attack, Attack
  • Majority rules, so they see that they will attack, as a consensus is reached
  • What if one of the generals is a traitor → that traitor’s mission is to disrupt consensus such that the generals do not agree on what they intend to do tomorrow morning
  • What the traitor decides is negated, however what needs to be considered is what the other generals think the traitor said
  • In this case, they currently all think the traitor said attack (or retreat)
  • It would be bad if half of loyal generals thought that the traitor said attack and half thought the traitor said retreat
  • If that were to happen, the traitor would be happy because his mission of creating chaos and corrupting consensus would be achieved

TAKE AWAYS

The Byzantine Generals Problem in particular is the problem of communicating one decision from one general to all the other generals.

We are simply trying to get the loyal generals to agree, come to consensus on a single fact.

Theorem: For any n, Algorithm O(n) reaches consensus if there are more than 3n generals and at most n traitors.

This means that the algorithm can reach consensus as long as 2/3 of the actors are honest. If the traitors are more than 1/3, consensus is not reached, the armies do not coordinate their attack and the enemy wins.

Byzantine Fault Tolerance

Byzantine Fault Tolerance is characteristic of a distributed computer network to function as desired and correctly reach a sufficient consensus despite malicious actors of the system failing or propagating incorrect information to other peers.

In other words, Byzantine fault tolerance (BFT) is the property of a system that is able to resist the class of failures derived from the Byzantine Generals Problem. This means that a BFT system is able to continue operating even if some of the nodes fail or act maliciously.

Summary

In this article, we summarised basic concepts of consensus in distributed computing.

In Part 2 , we will compare & discuss algorithms which are used in distributed systems in order to achieve BFT.

References

This article is a short summary of the research paper

The Byzantine Generals Problem by LESLIE LAMPORT, ROBERT SHOSTAK, and MARSHALL PEASE

--

--