April 28, 2018 Written by

Blockchain technologies have recently proliferated as a future emerging technology. A key feature of blockchains is the ability to let a group of nodes agree upon the next block. The task for the agreement is known as consensus. In this article, we discuss two fundamental consensus protocols, namely Paxos and Zyzzyva.

 


1 Introduction

In modern network systems, multiple participants often have to harmonize with each other, to find a common conclusion or make a global valid decision. In most cases this works fine, by just sending messages in between the participants. But what happens, if one ore more participants in the network are acting against the common sense and with this behaviour try to jeopardize the harmony or the common conclusion? For this kind of networks, so called consensus protocols exists. A Consensus protocol defines a method, that is able to get to a valid common conclusion, even if some participants in the network try to avoid it. In the following document, two of these consensus protocols will be presented and compared. Both do work in different ways, with differentiating tricks, and therefore they have different assets and drawbacks, which we want to have a closer look at.

2 Global prerequisits

To understand how consensus protocols work, a basic agreement about the wording needs to be set. In this case the basic definitions are as the following.

Definition 1. (node) A network consists of different nodes. A node is an independent working entity that can forward messages as calculation results based on any input values. The total number of nodes in the network is defined as n.

Definition 2. (message passing) the nodes in a network can communicate with each other by sending messages to each other. The message can contain any content, depending on the requirement of the protocol.

Definition 3. (message loss) the messages that are interchanges, can get lost. The message loss cannot be forecasted and happens completly random.

Of course, in worst case these definitions can also cause a completely broken network by saying, that any message gets delivered. But this should not be the usual case. Basically this should define, that most of the communication of the nodes is going to work, but sometimes messages just drop.

3 Paxos

Paxos is one of the most famous consensus protocols, since Leslie Lamport presented it in 1998 in the part-time parliament (https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf). It’s containing the main features of modern network architecture and gives the basic workwise for the current IP protocols. Also Paxos is one of the first consensus protocols, that tried to prevent exclusive locks for some clients, so the system won’t stop working in case of a client fallout.

3.1 Workwise

So how does this magic trick work? Basically Paxos is a three phase protocol, that is working with tickets instead of locks. The main intention on paxos is based on the idea, that one node want to write some data, that is probably based on previous data, to a kind of database. To realize this, paxos splits the nodes in the network into two groups with differential roles.

Definition 4. (server) The group of servers is storing the data and persisting the information.

Definition 5. (client) A client is allways a node in the network, that did some kind of calculation and wants to persist its information in the database of the servers.

The paxos protocol starts with the intention of a client, to write some data the the database. To this, the protocol has three phases. In the first phase, the client needs to receive a ticket from all servers. This ticket is used in the second phase, to transfer the command that should be executed to all servers. The third phase is finally used, to execute the stored command on the servers.

In the first step of the protocol, the client has to ask all servers for a ticket t to use. The ticket in this case is nothing more than a number. It’s not having any timeout or additional information. The ticket number itself is defined on the client side, but should only increase in the smallest possible value. This ticket will be used, to store and execute a command in further steps. So for this, the client needs to have a majority of the servers he asked to confirm the ticket number as valid. A communication flow is shown in the next two images.


(a) Ticket Request
(b) Confirmation
 
 
Figure 1: Step one - Ticket Communication

 


In Figure 1, a sample flow is shown, on which the client gets a response from two of the three servers, what is a majority. In conclusion he can use the ticket, he requested. If two or more of the servers would have declined the request, the client has had to request with the next higher ticket. (eg. t=9). As a result of this first step, the client now has a ticket, that he can use to store and execute his command.

The second step is used, for the client to transfer its command c to all servers, but not execute it. This is used as an additional step, in preparement of the execution of the command itself. For the transfer of the command, the client has to pass over the ticket he requested again, so the server knows about the validity of the storing of the command. If the ticket, which the client uses to store its command, is still valid (still the highest requested ticket), the server stores the command and responds with an OK. If the majority of the servers, does not respond with a positive result, the client has to start with phase one again.


(a) Command Proposal
(b) Confirmation
 
 
Figure 2: Step two - Command Transfer

 


In the last step of this procotol, the client calls all servers to execute the command which was just stored before. So in that moment, the majority of servers executed the command and the data is up to date.

In case of a server crash, or partial data loss, the server would not write the data and get into an invalid state. This situation will be solved on the next ticket request. In the next ticket Request the client receives a higher ticket no, from all valid servers, than from the one with the invalid state. In conclusion, the server receives the next write command with a ticket no, that is higher than his current state. From this state he can begin to sync his own state with the other servers by simply downloading the previous transactions.

3.2 Benefits

This three phase paxos protocol has two large benefits. The first benefit is, that the protocol is very fault tolerant. This means, that even if a few servers crash at no time, the client can still transfer and execute its command. As long as the majority of the servers is available. The second benefit is, that even if a client crashes, there are no locks that need to be broken. In a case like that, the client just needs to take a new ticket and the protocol continues as if nothing has happened.

3.3 Challenges

Of course this protocol has a high fault tolerance, but on the other hand it has two major restrictions. The first limitation is based on the power of a client. If one of the clients wants to bother the network, it just needs to request ticket by ticket and so nobody can write at anymore. This means the network can only work on trusted environments without any byzantine influences (in this pure implementation).

The second restriction is based on the third step of the protocol. The pure implementation just offers the command ”execute last stored command”, what of course seems to work, if only one client is interacting in the network. But if multiple clients are in the network and another one requests a new ticket, and stores his command in the same amount of servers, while the first client may just needs a little longer to send the ”execute” command, the execution command of the first client, might executes the just updated command of the second client.

4 ZYZZYVA

The Zyzzyva protocol is also a three phase protocol, which was presented in the CACM a few years later, but it works completely different (http://www.cs.utexas.edu/users/dahlin/papers/Zyzzyva-CACM.pdf). With Zyzzyva, the client also wants to store data in the servers database, but it does not access them directly. Instead it communicates only with one primary server, as long as everything works fine.

4.1 Workwise

The Zyzzyva protocol involves the same layers as the paxos protocol (servers and clients) but it adds an additional separation into the servers. The servers in Zyzzyva are divided into two roles again.

Definition 6. (primary server) One of the servers that are used to store the information is called the primary one. The primary is the server, that receives the commands from the clients and orders them. That’s why he can also be seen as a serializer.

Definition 7. (replica server) The replica server, are all servers, that are not the primary one. The replica servers are getting the commands they have to execute from the primary server.

This differentiation of servers, results in a different three phase protocol. In the first phase of the protocol, the client sends the command he wants to execute to the primary server. Therefore he does not need to require any ticket or lock. The primary server does not confirm the receipt or the acceptance of the command. Instead the primary creates a list of transactions internally and stores this command at the end of the list (basically the serializer function). The primary server is one of all the servers and needs to be selected during the process, maybe also multiple times. The selection process, that results in a new primary can be realised in multiple different ways. This process won’t be focused in this paper.


(a) Command Transfer
(b) Command Spread
 
 
Figure 3: Step one - Step two Command Spread

 


This transfer command from the client and the complete history, that was build up internally, will be spread by the primary server to all other replica servers in the second step. With the complete history of commands, the replicas can identify, wether they have missed a command and need to re-execute it, or if they are up to date and can directly execute the new command.

When the replica servers completed their work, they obtain a result, which they transfer to the client afterward. This result contains the command, they just executed and the data state at the end.


Figure 4: Result notification

 


After this third step, the client has an amount of result, within that it can identify, if the command he wanted to execute was execute by the majority of servers. If he notices, that is was not executed, he needs to start from the scratch.

These three steps are the normal behavior of the Zyzzyva protocol. As long as all participants are trustful and the network works fine (=¿ no constant data loss). If the client does not get any valid result from all replicas, this can have different reasons. Either it’s just a message loss or the primary server does not forward the information correct or all replicas are broken. In this case, the client can start a kind of ”error search”. This is done by contacting all replica servers directly and transfer them the command he wanted to execute before, but that was not executed at all. The replicas take this command and try to pass it to the primary as if they were client, so they should get a command (see step 2 of the protocol).

If they don’t receive a valid result at all, they know that something is wrong with the primary (he is identified as byzantine) so they can start another part of the protocol, which at the end leads to a new primary server. The process of the reselection is called ”viewchange” can be started by any replica server. Therefor the replica, that wants to dethrone the primary, needs to send an ”i Hate the primary” command to all other replicas. These other replicas need to approve the reselection, so it can begin. The reselection of a new primary server is not part of this protocol, so lets just say, a new primary gets selected. After the new primary got throned, this server is doing the same job as the other one before and the protocol can continue.

4.2 Benefits

The big benefit of this protocol is the fact, that the amount of messages that need to be send by the client, is pretty small, what in conclusion leads to a very fast protocol. Also the protocol has a good error handling because the whole history is always spread to all replicas, so the order of all commands can be guaranteed. There is no chance to confuse the replicas by sending different commands to them directly from the client. Also this protocol is safe against byzantine participants on server and on client side, because there is a single point of access for the client, and the distributed access in case of a failure will be validated by the replica servers.

4.3 Challenges

The reselection process can be instantiated by any replica server in the system. In that case, the system is not responsive for the time until the new primary server get selected. This could lead to a performance issue in the network and may cause useless re selections. In addition, the primary is the only point of access for the client, so he can execute the commands in an order that he prefers, what is a high rate of power.

5 Comparison

After we looked at the two consensus protocols, it’s easy to see, that they both have completely different approaches, but lead to the same result at the end. The differences they have can be seen in different categories. The categories, that we want to have a closer look at, are the performance and the error handling of the protocols.

5.1 Performance

The performance is directly associated to the amount of messages that need to be transferred in the network. In the Paxos consensus protocol, the amount of messages for a successfully transferred command is very high. Because a command needs to request a ticket per server, transfer the command with the ticket, per server, and execute it per server. To have it in numbers, we define S.

S = Amount of Servers in the Network (1)

So in worst case, if all servers answer and the command can be processed, there are six times as many communications, as servers in the system exists.

O(S) = S * 2 + S * 2 + S * 2 (2)
  = 6S (3)

In comparison the amount of messages in a zyzzyva protocol environment consists of one message to the primary and one message from the primary to all replicas (S-1), and the return messages to the client.

O(S) = 1 + S - 1 + S - 1 (4)
  = 2S - 1 (5)

So with a pure look at the amount of messages in the network, Paxos has around three times as many messages, as zyzzyva, as long as the participants in the network are honest. The amount of communications in Zyzzyva, to identify an byzantine primary and reselect a new one, is much higher. In that case, the client needs to contact all replicas, they need to forward the command to the primary and validate the result. Afterwards the broadcast the ”i Hate the primary” and do the reselection.

O(S) = S - 1 + S - 1 + S - 1 + reselection itself (6)
  = 3S - 3 (7)

So what happens here is a high amount of messages. But, this has only to be done, if the primary is byzantine. As long as the primary is an honest node, the protocol itself is faster than paxos.

5.2 Error Handling

Another interesting point in compare of consensus protocols is the fault tolerance. Or in other words, how much and what kind of negative influence in the network does the protocol tolerate. The message loss symptom can be handled by both protocols. The next higher level of an influence is the crash of a node. None of the protocols works with exclusive locks for any client, so the crash of a client is not interesting. The crash of a server on the other hand is having a hard influence. In the Paxos protocol, servers can fall of and just don’t respond anymore. Paxos handles this challenge by always interacting with the majority of servers. This means, that as long as 51 percent of the servers are available, the protocol works. In Zyzzyvar, the fall out of the primary would lead to a reselection of the primary. If a replica server falls out, the synchronisation works fine as well. As an additional benefit, the replica receives the whole history after he recovers, so there is no additional synchronisation or recover protocol nessesary. Conclusion, both protocols are able to tolerate unreachable servers.

The next step in negative influence, are byzantine participants. In the Paxos protocol this is not caught up. If a byzantine client would participate here, he could easily block the whole system. If a server is byzantine, on the other hand, it’s not that bad, because the servers don’t communicate among each other. The complete different way round happens on the Zyzzyva protocol. In that protocol, all requests from the client are serialized by the primary server, so he cannot influence the data into different states on different replicas. If the primary itself is byzantine, the data can be manipulated by him, for example by changing the content of some messages he receives and forwarding false information. This is mainly caught up by signing messages in the network, so he would be identified as a byzantine very fast.

The next step would be something like spoofing wrong addresses and double sending messages, but these kinds of negative influences, are not caught up by any of the two protocols. So in conclusion, both protocols are having very comparable intruder detection borders.

5.3 Equality

Also if those two protocols have a few differences, they do also have many things in common. One of the mayor prerequisits of both protocols, is that any node needs to be able to communicate with any other node. This leads to the further conclusion, that the network participants need to register somehow and need to be known among each other. If a network becomes very dynamic, and have a high exchange of participants, none of the above protocols could handle this, because both need communication between all participants.

6 Conclusion & Perspective

As a result of this compare, the two protocols differentiate mainly in the solution of identifying 2and correcting the fallout of some nodes. As long as the amount of nodes, that are expected to fall out, is small, the Zyzzyva protocol seems to be the better choice, but in case of frequently appearing crashes among the servers, the Paxos protocol has a constant lower amount of messages, that are send, in between the different nodes.In addition the Zyzzyva protocol depends in its performance directly on the performance of the selection of a new primary. So both protocols are having their pros and cons, but the selection of the protocol itself depends on the environment, where it should be working. In an environment with less crashing nodes, the Zyzzyva protocol should be better and faster, in more unstable networks the Paxos protocol should be used.

Last modified on Saturday, 28 April 2018 13:01