Distributed Computing
CENG 532 - ANSWERS OF FINAL EXAM
by SEVAL ÇAPRAZ
QUESTION 1)
ANSWER 1)
1.a) The problem can be solved with totally ordered multicast using physical clocks. Messages are
delivered to all players, and the delivery order is guaranteed to be the same at all players. Physical
clocks can be synchronized by a time server with a centralized algorithm like Cristian's algorithm.
In the scenario of Question 1, the players randomly send messages to the other players. The network
has a non-deterministic delay, bounded by Δ. To ensure that all messages are delivered in the same
order at all players:
Sender: Timestamp every message with local time.
Receiver:
1. For a message with timestamp t, transfer it to the receive queue at time t + Δ.
2. Deliver the messages in the receive queue in the order of their timestamps.
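As a concrete illustration of these two receiver rules, here is a minimal Python sketch; DELTA, the class name, and the method names are illustrative choices for this sketch, not part of the original scenario.

```python
import heapq
import time

DELTA = 1.0  # assumed upper bound (in seconds) on the network delay

class DelayedDeliveryReceiver:
    """Buffers each message until its timestamp plus DELTA, then delivers in timestamp order."""

    def __init__(self):
        self._queue = []  # min-heap of (timestamp, message)

    def on_receive(self, timestamp, message):
        # Rule 1: the message only becomes deliverable at time timestamp + DELTA.
        heapq.heappush(self._queue, (timestamp, message))

    def poll(self, now=None):
        # Rule 2: deliver every message whose holding period has expired,
        # in timestamp order (the heap keeps the smallest timestamp on top).
        now = time.time() if now is None else now
        delivered = []
        while self._queue and self._queue[0][0] + DELTA <= now:
            delivered.append(heapq.heappop(self._queue)[1])
        return delivered
```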
Algorithm of totally ordered multicast:
1. The update message is timestamped with the sender's logical time.
2. The update message is multicast (including to the sender itself).
3. When a message is received:
3.1. it is put into the local queue,
3.2. ordered according to its timestamp,
3.3. and an acknowledgement is multicast.
A message is delivered to the application only when:
• it is at the head of the queue, and
• it has been acknowledged by all involved processes.
Pi sends an acknowledgement to Pj if:
• Pi has not made an update request, or
• Pi's identifier is greater than Pj's identifier, or
• Pi's update has been processed.
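A minimal sketch of this queue-and-acknowledgement rule, assuming three players and reliable message transport; the class and field names are illustrative:

```python
from dataclasses import dataclass, field

N_PLAYERS = 3  # players A, B, C (an assumption for this sketch)

@dataclass(order=True)
class Update:
    timestamp: int                  # sender's logical or physical time
    sender: int                     # sender id breaks timestamp ties
    payload: str = field(compare=False, default="")
    acks: set = field(compare=False, default_factory=set)

class TotalOrderQueue:
    """One process's local queue implementing the delivery conditions above."""

    def __init__(self):
        self.queue = []             # kept sorted by (timestamp, sender)

    def on_update(self, update):
        self.queue.append(update)
        self.queue.sort()           # rule 3.2: order by timestamp

    def on_ack(self, timestamp, sender, acker):
        for u in self.queue:
            if (u.timestamp, u.sender) == (timestamp, sender):
                u.acks.add(acker)   # record who has acknowledged this update

    def pop_deliverable(self):
        # Deliver only from the head, and only once every process has acked it.
        delivered = []
        while self.queue and len(self.queue[0].acks) == N_PLAYERS:
            delivered.append(self.queue.pop(0))
        return delivered
```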
It is assumed that the physical clocks of all players (A, B, C) are synchronized. An example
following the scenario of Question 1 is given below. Timestamps are formatted as date and time
according to ISO 8601 [1]. Each message is timestamped before sending, and messages are ordered
by their timestamps. Once a message has been acknowledged by all others, it is processed. The
physical-clock trace is as follows:
A sends M1 to A, B and C: Send (M1, 2016-01-19T16:07:37+00:00)
Queue of A: [M1]
B receives M1 from A: Receive(M1, 2016-01-19T16:08:00+00:00)
Queue of B: [M1]
B sends ACK(M1) to A and C: Send (ACK(M1), 2016-01-19T16:08:37+00:00)
A receives ACK(M1) from B: Receive(ACK(M1), 2016-01-19T16:09:00+00:00)
C receives ACK(M1) from B: Receive(ACK(M1), 2016-01-19T16:09:00+00:00)
B sends M2 to A, B and C: Send (M2, 2016-01-19T16:10:00+00:00)
Queue of B: [M1, M2]
A receives M2 from B: Receive(M2, 2016-01-19T16:11:00+00:00)
Queue of A: [M1, M2]
C receives M2 from B: Receive(M2, 2016-01-19T16:11:30+00:00)
Queue of C: [M2]
C sends ACK(M2) to A and B: Send(ACK(M2), 2016-01-19T16:12:00+00:00)
A receives ACK(M2) from C: Receive(ACK(M2), 2016-01-19T16:12:30+00:00)
B receives ACK(M2) from C: Receive(ACK(M2), 2016-01-19T16:12:40+00:00)
C sends M3 to A, B and C: Send (M3, 2016-01-19T16:13:15+00:00)
Queue of C: [M2, M3]
A receives M3 from C: Receive(M3, 2016-01-19T16:13:30+00:00)
Queue of A: [M1, M2, M3]
B receives M3 from C: Receive(M3, 2016-01-19T16:13:45+00:00)
Queue of B: [M1, M2, M3]
C receives M1 from A: Receive(M1, 2016-01-19T16:13:46+00:00)
Queue of C: [M1, M2, M3]
C sends ACK(M1) to A and B: Send(ACK(M1), 2016-01-19T16:13:50+00:00)
C processes M1, Queue of C: [M2, M3]
A receives ACK(M1) from C: Receive(ACK(M1), 2016-01-19T16:14:00+00:00)
A processes M1, Queue of A: [M2, M3]
B receives ACK(M1) from C: Receive(ACK(M1), 2016-01-19T16:14:15+00:00)
B processes M1, Queue of B: [M2, M3]
A sends ACK(M2) to B and C: Send(ACK(M2), 2016-01-19T16:14:30+00:00)
A processes M2, Queue of A: [M3]
C receives ACK(M2) from A: Receive(ACK(M2), 2016-01-19T16:15:15+00:00)
C processes M2, Queue of C: [M3]
B receives ACK(M2) from A: Receive(ACK(M2), 2016-01-19T16:15:30+00:00)
B processes M2, Queue of B: [M3]
A sends ACK(M3) to B and C: Send(ACK(M3), 2016-01-19T16:16:01+00:00)
B sends ACK(M3) to A and C: Send(ACK(M3), 2016-01-19T16:16:05+00:00)
A receives ACK(M3) from B: Receive(ACK(M3), 2016-01-19T16:16:15+00:00)
A processes M3, Queue of A: []
B receives ACK(M3) from A: Receive(ACK(M3), 2016-01-19T16:16:20+00:00)
B processes M3, Queue of B: []
C receives ACK(M3) from B: Receive(ACK(M3), 2016-01-19T16:16:30+00:00)
C receives ACK(M3) from A: Receive(ACK(M3), 2016-01-19T16:16:45+00:00)
C processes M3, Queue of C: []
The algorithm works as follows:
- A will not multicast an acknowledgement for C until M1 has been processed.
- B will not multicast an acknowledgement for C until M2 has been processed.
Since an operation can't proceed until acknowledgements from all processes have been received, C
will not proceed until M1 and M2 have finished. This protects the players from ordering anomalies.
1.b) It is possible to design a solution based on Lamport clocks. Lamport's logical clocks [2] can be
used to implement totally ordered multicast in a completely distributed fashion. We need totally
ordered multicast, that is, a multicast operation by which all messages are delivered in the same
order to each receiver, so that concurrent updates are seen in the same order at every player.
For reference, let's recall the three rules of Lamport's algorithm [3]:
1. At process i, increment Li before each event.
2. To send message m at process i, apply rule 1 and then include the current local time in the
message, i.e., send(m, Li).
3. To receive a message (m, t) at process j, set Lj = max(Lj, t) and then apply rule 1 before
timestamping the receive event.
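A minimal Python sketch of these three rules (the class and method names are illustrative):

```python
class LamportClock:
    """Lamport's three rules: tick before each event, stamp sends, merge on receives."""

    def __init__(self):
        self.time = 0

    def tick(self):
        self.time += 1                  # rule 1: increment before each event
        return self.time

    def stamp_send(self, message):
        return (message, self.tick())   # rule 2: attach the local time to the message

    def on_receive(self, message, t):
        self.time = max(self.time, t)   # rule 3: take the max, then tick
        return self.tick()
```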
The totally ordered multicast algorithm and its delivery conditions are exactly the same as in 1.a;
only the timestamps change, from synchronized physical-clock times to Lamport logical times.
The war-games scenario illustrated in Question 1 can be solved by a totally ordered multicast
implementation with Lamport clocks. The Lamport algorithm, extended for total order, ensures a
total ordering of events. At the beginning, all three players start with their logical clocks set to
zero. The timestamps attached to the messages at each step drive the total ordering. After receiving
each message, each player multicasts an acknowledgement to the others. An example trace is given
below.
The Lamport-clock trace is as follows:
A sends M1 to A, B and C: Send (M1, 1)
Queue of A: [M1]
B receives M1 from A: Receive(M1, 2)
Queue of B: [M1]
B sends ACK(M1) to A and C: Send (ACK(M1), 3)
A receives ACK(M1) from B: Receive(ACK(M1), 4)
C receives ACK(M1) from B: Receive(ACK(M1), 4)
B sends M2 to A, B and C: Send (M2, 4)
Queue of B: [M1, M2]
A receives M2 from B: Receive(M2, 5)
Queue of A: [M1, M2]
C receives M2 from B: Receive(M2, 5)
Queue of C: [M2]
C sends ACK(M2) to A and B: Send(ACK(M2), 6)
A receives ACK(M2) from C: Receive(ACK(M2), 7)
B receives ACK(M2) from C: Receive(ACK(M2), 7)
C sends M3 to A, B and C: Send (M3, 7)
Queue of C: [M2, M3]
A receives M3 from C: Receive(M3, 8)
Queue of A: [M1, M2, M3]
B receives M3 from C: Receive(M3, 8)
Queue of B: [M1, M2, M3]
C receives M1 from A: Receive(M1, 8)
Queue of C: [M1, M2, M3]
C sends ACK(M1) to A and B: Send(ACK(M1), 9)
C processes M1, Queue of C: [M2, M3]
A receives ACK(M1) from C: Receive(ACK(M1), 10)
A processes M1, Queue of A: [M2, M3]
B receives ACK(M1) from C: Receive(ACK(M1), 10)
B processes M1, Queue of B: [M2, M3]
A sends ACK(M2) to B and C: Send(ACK(M2), 11)
A processes M2, Queue of A: [M3]
C receives ACK(M2) from A: Receive(ACK(M2), 12)
C processes M2, Queue of C: [M3]
B receives ACK(M2) from A: Receive(ACK(M2), 12)
B processes M2, Queue of B: [M3]
A sends ACK(M3) to B and C: Send(ACK(M3), 12)
B sends ACK(M3) to A and C: Send(ACK(M3), 13)
A receives ACK(M3) from B: Receive(ACK(M3), 14)
A processes M3, Queue of A: []
B receives ACK(M3) from A: Receive(ACK(M3), 14)
B processes M3, Queue of B: []
C receives ACK(M3) from B: Receive(ACK(M3), 14)
C receives ACK(M3) from A: Receive(ACK(M3), 15)
C processes M3, Queue of C: []
The algorithm works as follows:
- A will not multicast an acknowledgement for C until M1 has been processed.
- B will not multicast an acknowledgement for C until M2 has been processed.
Since an operation can't proceed until acknowledgements from all processes have been received, C
will not proceed until M1 and M2 have finished. This protects the players from ordering anomalies.
1.c) Vector clocks can be totally ordered by ordering events with concurrent vector clocks according
to the machine where they occur. The algorithm is the same as in the answers to 1.a and 1.b; the
timestamps are now vector clocks, and the queue holds the messages in order of vector time.
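A small sketch of the vector clock used in the trace below. Following that trace, a receive merges component-wise without incrementing the local entry, while a send increments it (some textbook formulations also tick on receive); all names here are illustrative:

```python
N = 3  # players A, B, C mapped to indices 0, 1, 2

class VectorClock:
    def __init__(self, pid):
        self.pid = pid
        self.clock = [0] * N

    def on_send(self):
        self.clock[self.pid] += 1          # tick own entry for the send event
        return list(self.clock)            # a copy travels with the message

    def on_receive(self, stamp):
        # Merge only: component-wise maximum of local clock and message stamp.
        self.clock = [max(a, b) for a, b in zip(self.clock, stamp)]

def before(stamp_a, stamp_b):
    """True if stamp_a causally precedes stamp_b (<= everywhere, differs somewhere)."""
    return all(a <= b for a, b in zip(stamp_a, stamp_b)) and stamp_a != stamp_b

def total_order(stamp_a, pid_a, stamp_b, pid_b):
    """Concurrent stamps are ordered by the machine where they occur."""
    if before(stamp_a, stamp_b):
        return True
    if before(stamp_b, stamp_a):
        return False
    return (stamp_a, pid_a) < (stamp_b, pid_b)   # concurrent: break ties by pid
```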
A: Send (M1, [ 1, 0, 0 ] )
B: Receive(M1, [ 1, 0, 0 ]) => Updates its time table to [1, 0, 0]
B: Send (ACK(M1), [ 1, 1, 0 ] )
A: Receive(ACK(M1), [ 1, 1, 0 ] ) => Updates its time table to [1, 1, 0]
C: Receive(ACK(M1), [ 1, 1, 0 ] ) => Updates its time table to [1, 1, 0]
B: Send (M2, [ 1, 2, 0 ] )
A: Receive(M2, [ 1, 2, 0 ] ) => Updates its time table to [1, 2, 0]
C: Receive(M2, [ 1, 2, 0 ] ) => Updates its time table to [1, 2, 0]
C: Send (ACK(M2), [ 1, 2, 1 ] )
A: Receive(ACK(M2), [ 1, 2, 1 ] ) => Updates its time table to [1, 2, 1]
B: Receive(ACK(M2), [ 1, 2, 1 ] ) => Updates its time table to [1, 2, 1]
C: Send (M3, [ 1, 2, 2 ] )
A: Receive(M3, [ 1, 2, 2 ] ) => Updates its time table to [1, 2, 2]
B: Receive(M3, [ 1, 2, 2 ] ) => Updates its time table to [1, 2, 2]
C: Receive(M1, [ 1, 0, 0 ]) => Does not update its time table; it is still [1, 2, 2]
C: Send (ACK(M1), [ 1, 2, 3 ] )
A: Receive(ACK(M1), [ 1, 2, 3 ] ) => Updates its time table to [1, 2, 3]
B: Receive(ACK(M1), [ 1, 2, 3 ] ) => Updates its time table to [1, 2, 3]
A: Send (ACK(M2), [ 2, 2, 3 ] )
B: Receive(ACK(M2), [ 2, 2, 3 ] ) => Updates its time table to [2, 2, 3]
C: Receive(ACK(M2), [ 2, 2, 3 ] ) => Updates its time table to [2, 2, 3]
A: Send (ACK(M3), [ 3, 2, 3 ] )
B: Send (ACK(M3), [ 2, 3, 3 ] )
A: Receive(ACK(M3), [ 2, 3, 3 ] ) => Updates its time table to [3, 3, 3]
B: Receive(ACK(M3), [ 3, 2, 3 ] ) => Updates its time table to [3, 3, 3]
C: Receive(ACK(M3), [ 2, 3, 3 ] ) => Updates its time table to [2, 3, 3]
C: Receive(ACK(M3), [ 3, 2, 3 ] ) => Updates its time table to [3, 3, 3]
QUESTION 2)
ANSWER 2) I declare that my exam answers submitted here are original, except for source
material explicitly and properly cited. I also acknowledge that I am aware of the University policy
and regulations on honesty in academic work, and of the disciplinary guidelines and procedures
applicable to breaches of such policy and regulations as contained in the University website
http://oidb.metu.edu.tr/en/academic-rules-and-regulations.
The Ricart-Agrawala Algorithm is an algorithm for mutual exclusion on a distributed system. This
algorithm is an extension and optimization of Lamport's Distributed Mutual Exclusion Algorithm,
by removing the need for release messages [4]. A round in a distributed mutual exclusion
implementation refers to the acquisition and subsequent release of a lock. For n nodes, 2(n-1)
messages are sent in a round using Ricart & Agrawala's algorithm.
Ricart & Agrawala's algorithm [5]
Idea:
• To enter the critical section, the requesting process has to get the approval of all processes
(including itself).
• Processes only give approval if they themselves are not in the critical section.
• If a process wants to enter the critical section itself and has received a request from another
process, it approves the process with the lowest process identifier.
A process P1 receiving a request from a process P2:
• If P1 does not want to enter the critical section and is not in the critical section already, it
sends back an ACCEPT message to P2.
• If P1 is already in the critical section, it buffers the incoming request in a FIFO queue.
• If IN_CS is not yet set but P1 wants to enter, it does one of the following:
  • If P2 has a lower process ID than P1, it sends back an ACCEPT message to P2.
  • If P1 has a lower process ID than P2, it queues P2's request in a local FIFO queue and
enters the critical section itself.
Requesting node:
• Sends a request message to all processes pi (including itself).
• Waits for the receipt of an ACCEPT message from every process pi.
• Upon receipt of n ACCEPTs, it enters the critical section.
• After exiting the critical section, it sends an ACCEPT message to the first process in its local
FIFO queue.
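For concreteness, here is a sketch of the per-node decision logic in the classic timestamp-priority formulation (the lecture-note variant above breaks ties purely by process ID); message transport is assumed to be external, and all names are illustrative:

```python
class RANode:
    """Decision logic of one Ricart & Agrawala node; messaging is assumed external."""

    def __init__(self, pid):
        self.pid = pid
        self.in_cs = False
        self.request = None          # (lamport_time, pid) of our pending request
        self.deferred = []           # requests to answer only after leaving the CS

    def request_cs(self, lamport_time):
        self.request = (lamport_time, self.pid)   # stamp our own request

    def on_request(self, stamp):
        # stamp is the requester's (lamport_time, pid) pair.
        if self.in_cs or (self.request is not None and self.request < stamp):
            self.deferred.append(stamp[1])   # defer: our claim has priority
            return None
        return stamp[1]                      # send ACCEPT to the requester now

    def on_exit_cs(self):
        self.in_cs = False
        self.request = None
        replies, self.deferred = self.deferred, []
        return replies                       # send the queued ACCEPT messages
```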
In the literature there exists the Roucairol-Carvalho optimization: once P1 has received a reply
message from site P2, P1 may enter the critical section multiple times without requesting permission
from P2 again, up to the moment when P1 sends a reply message to P2.
I think that, in order to reduce the average number of messages required for each critical section,
we can remove unnecessary ACCEPT messages. This can be achieved by using only one ACCEPT
message, sent by the current user of the resource. If all messages are assumed to be delivered
within Δ time after they are sent, the nodes can wait for one ACCEPT message instead of all
ACCEPT messages. If a node receives a request and is not using the resource, it does not send any
ACCEPT message; it simply ignores the request. If the node currently using the resource receives a
request, it waits until it has finished its job with the resource, then sends an ACCEPT message to
the requestor. In this way, each node waits for only one ACCEPT message, so the algorithm requires
only (n + 1) messages for n nodes in a round (before accounting for the queue handoff discussed
below). Let us call this the "Chapraz algorithm", after my surname.
This algorithm resembles the token-based algorithm later proposed by the authors of the Ricart &
Agrawala algorithm, in which a token grants permission to each requestor. There are also circular
solutions that use a ring token; perhaps they were influenced by the ring-token approach.
There are some problems with the Chapraz algorithm. When more than one node wants to use the
same resource, all of them wait for an ACCEPT message. Each node should keep a list of pending
requests, held with timestamps so that the order of requests is determined by time. The node with
the smallest timestamp should get the ACCEPT response before the others. But how do the other
nodes learn that a node has started using the resource? There should therefore be only one list of
waiting nodes: the node currently using the resource keeps a local FIFO list of requestors. After
releasing the resource, the node sends the ACCEPT message together with the list of waiting nodes
to the first node in the list, so the list is handed from holder to holder. This avoids having to
multicast to all nodes to announce the release, and the method remains fully distributed, just as
before. The Chapraz algorithm may still have problems, so it should be investigated further, as
sketched below.
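The following is a rough sketch of my reading of this scheme. It deliberately leaves open how the very first holder is chosen and how ties between simultaneous requests are resolved, and every name is hypothetical:

```python
class ChaprazNode:
    """Single-ACCEPT mutual exclusion: only the current holder ever replies."""

    def __init__(self, pid, holding=False):
        self.pid = pid
        self.holding = holding       # how the first holder is chosen is left open
        self.waiting = []            # FIFO list of requestors, kept by the holder

    def on_request(self, requester):
        if self.holding:
            self.waiting.append(requester)   # reply is deferred until release
        # Non-holders ignore the request entirely; no message is sent.

    def on_release(self):
        self.holding = False
        if not self.waiting:
            return None
        nxt, rest = self.waiting[0], self.waiting[1:]
        self.waiting = []
        return ("ACCEPT", nxt, rest)         # one ACCEPT plus the handed-over queue

    def on_accept(self, queue):
        self.holding = True
        self.waiting = list(queue)           # inherit the still-waiting requestors
```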
Assuming the queue is sent in one message, the Chapraz algorithm requires (n + 2) messages per
round: n request messages, one ACCEPT message, and one message carrying the queue. The Ricart
& Agrawala algorithm requires 2(n - 1) messages. So for more than 4 nodes, the Chapraz algorithm
uses fewer messages than the Ricart & Agrawala algorithm:
If n = 2, the Chapraz algorithm requires 4 messages on average; Ricart & Agrawala requires 2.
If n = 3, the Chapraz algorithm requires 5 messages on average; Ricart & Agrawala requires 4.
If n = 4, the Chapraz algorithm requires 6 messages on average; Ricart & Agrawala requires 6.
If n = 5, the Chapraz algorithm requires 7 messages on average; Ricart & Agrawala requires 8.
If n = 6, the Chapraz algorithm requires 8 messages on average; Ricart & Agrawala requires 10.
QUESTION 3)
ANSWER 3)
3.a) In 2PC, when the coordinator has crashed, participants may not be able to reach a final decision
[6]. Therefore 3PC was proposed by Skeen in 1981. 3PC avoids blocking processes in the presence
of fail-stop crashes. For example, if a participant is in the ready state waiting for a response from
the coordinator, it can abort the transaction after a timeout. All other participants will abort too,
because the coordinator crashed in the wait state. If a participant is in the precommit state, it can
commit the transaction safely after a timeout, because the final decision to commit has already been
made. If a participant is in the init state, it will eventually transition to the abort state after a
timeout; all other participants must be in the init state too, so they also transition to the abort state.
With 3PC, surviving processes can always come to a final decision, and they can also elect a new
coordinator to conclude the transaction. It therefore provides more fault tolerance than 2PC.
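A compact sketch of the participant-side timeout transitions just described (state names follow the usual 3PC presentation; this is an illustration, not a full protocol):

```python
def on_coordinator_timeout(state):
    """3PC participant's action when the coordinator stops responding."""
    if state == "INIT":
        return "ABORT"       # never even voted: abort locally
    if state == "READY":
        return "ABORT"       # no PRECOMMIT seen, so nobody can have committed yet
    if state == "PRECOMMIT":
        return "COMMIT"      # the global commit decision was already made
    return state             # already in a final state (COMMIT or ABORT)
```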
3.b.i) It is possible to adapt and use the Paxos algorithm instead of 2PC/3PC. If the coordinator
fails, 2PC blocks; 3PC solves this blocking problem. Paxos does not block as long as a majority of
processes (coordinators) are correct. Fault-tolerant consensus
algorithms also reach agreement, but do not block whenever any majority of the processes are
working. The Paxos Commit algorithm runs a Paxos consensus algorithm on the commit/abort
decision of each participant to obtain a transaction commit protocol that uses 2F + 1 coordinators
and makes progress if at least F +1 of them are working properly [7]. Paxos Commit has the same
stable-storage write delay, and can be implemented to have the same message delay in the fault-free
case, as 2PC, but it uses more messages. In 2PC, for n nodes, (3n) messages are exchanged. In
comparison to 2PC, it requires more messages, but it is resilient to coordinator failures. In
comparison to most 3PC algorithms, Paxos renders a simpler, more efficient algorithm (minimal
message delay), and has been proved to be correct. 3PC is fail-stop resilient, but not fail-recover
resilient. The key difference of Paxos from 2PC is that, unlike 2PC, where all nodes need to agree,
only a majority needs to agree. With 2F + 1 acceptors, F + 1 acceptors are enough to agree on
commit or abort. In the Paxos algorithm, any node can become a leader, so there can be more than
one leader in the system.
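A toy illustration of the quorum rule only (not Paxos Commit itself): with 2F + 1 acceptors, any F + 1 matching acceptances choose the outcome, so progress survives up to F acceptor failures.

```python
def chosen_outcome(acceptances, f):
    """acceptances: list of 'COMMIT' / 'ABORT' votes accepted so far."""
    quorum = f + 1                          # majority of 2F + 1 acceptors
    for outcome in ("COMMIT", "ABORT"):
        if acceptances.count(outcome) >= quorum:
            return outcome
    return None                             # no quorum yet; keep waiting

# e.g. F = 1 (three acceptors): two matching acceptances decide the outcome.
assert chosen_outcome(["COMMIT", "COMMIT"], f=1) == "COMMIT"
assert chosen_outcome(["COMMIT"], f=1) is None
```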
3.b.ii) Paxos can solve the drawbacks of 2PC. Paxos is more failure tolerant than 2PC:
• Leader fails → another leader can take over the protocol by issuing its own proposal.
• Original leader recovers → two leaders can coexist, thanks to the rules of agreeing only to
higher-numbered proposals and committing only previously accepted values.
"Our Paxos Commit algorithm goes further in essentially eliminating the TM's role in making the
decision. In Two-Phase Commit, the TM can unilaterally decide to abort. In Paxos Commit, a leader
can make an abort decision only for an RM that does not decide for itself. The leader does this by
initiating a ballot with number greater than 0 for that RM's instance of Paxos. (The leader must be
able to do this to prevent blocking by a failed RM.)" [7]
QUESTION 4)
ANSWER 4)
Answer the following questions regarding the consistency models:
4.a) The processes P1-P4 are explained below.
4.a.i)
Sequentially consistent: Yes
Process P1 first performs W(x)a on x. Later (in absolute time), process P2 performs a write
operation, setting the value of x to b. Both P3 and P4 first read value b; P4 later reads value a. The
write operation of process P2 appears to have taken place before that of P1. Therefore it does not
violate sequential consistency.
Causally consistent: Yes
Process P1 writes data item x, and P2 also writes data item x. Then P3 and P4 read data item x.
Concurrent writes may be seen in a different order on different machines. Therefore it does not
violate causal consistency.
4.a.ii)
Sequentially consistent: No
Not all processes see the same interleaving of write operations. To process P3, it appears as if the
data item has first been changed to c and later to a. However, no process performs W(x)c first and
then W(x)a, and P4 will conclude that the final value is b. Therefore it violates sequential
consistency.
Causally consistent: No
Having read c (R(x)c), P3 must continue to read c or some newer value (perhaps b), but it cannot go
back to a, because W(x)c was conditional upon W(x)a having finished. Therefore it violates causal
consistency.
4.a.iii)
Sequentially consistent: No
Not all processes see the same interleaving of write operations. To process P3, it appears as if the
data item has first been changed to b and later to a. However, P4 will conclude that the final value
is b. Therefore it violates sequential consistency.
Causally consistent: Yes
Process P1 writes data item x, so the data item is first changed to a. P2 also writes data item x.
Then P3 and P4 read data item x. Concurrent writes may be seen in a different order on different
machines. Therefore it does not violate causal consistency.
4.a.iv)
Sequentially consistent: No
Not all processes see the same interleaving of write operations. To process P3, it appears as if the
data item has first been changed to b and later to c. However, P4 will conclude that the final value
is b. No single global ordering can explain these results. Therefore it violates sequential
consistency.
Causally consistent: Yes
W(x)b is causally related to R(x)a, which is causally related to W(x)a. Therefore, the system must
enforce the ordering W(x)a < W(x)b. P3 and P4 do not violate that ordering, because they read a
before reading b and c. Therefore it does not violate causal consistency.
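The case analyses above can be checked mechanically. Below is a brute-force sequential-consistency checker; the exam's figures are not reproduced here, so the example history is an illustrative stand-in in which two readers disagree on the order of the writes, which no single interleaving can explain:

```python
from itertools import permutations

def sequentially_consistent(histories):
    """True iff some single interleaving respects every process's program order
    and makes each read R(x)v return the most recent preceding write W(x)v."""
    ops = [(p, i) for p, h in enumerate(histories) for i in range(len(h))]
    for order in permutations(ops):
        next_op = [0] * len(histories)
        last_write, ok = {}, True
        for p, i in order:
            if i != next_op[p]:           # would break program order
                ok = False
                break
            next_op[p] += 1
            kind, var, val = histories[p][i]
            if kind == "W":
                last_write[var] = val
            elif last_write.get(var) != val:
                ok = False                # read returned a stale or unwritten value
                break
        if ok:
            return True
    return False

h = [[("W", "x", "a")],
     [("W", "x", "b")],
     [("R", "x", "b"), ("R", "x", "a")],
     [("R", "x", "a"), ("R", "x", "b")]]
print(sequentially_consistent(h))  # False: the readers disagree on the write order
```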
4.b) In sequential consistency, all processes see all shared accesses in the same order; accesses are
not ordered in time. In causal consistency, all processes see causally related shared accesses in the
same order. Causal consistency is a weaker consistency model than sequential consistency: it
distinguishes between operations that are causally related and those that are not. For example, if an
event b is caused by an earlier event a, causal consistency guarantees that all processes see event b
after event a. If a system provides sequential consistency, it provides causal consistency too;
conversely, if a system is not causally consistent, it is not sequentially consistent. Therefore we
cannot give an example of a system that is sequentially consistent but not causally consistent. When
there is a deadlock, an execution that is sequentially but not causally consistent might conceivably
be possible.
References
[1] ISO 8601, Wikipedia, https://en.wikipedia.org/wiki/ISO_8601
[2] Lamport, L. "Time, Clocks, and the Ordering of Events in a Distributed System",
Communications of the ACM, Vol. 21, No. 7, July 1978, pp. 558-565.
[3] Andersen, D. 15-440 Distributed Systems Lecture Notes, School of Computer Science,
Carnegie Mellon University, https://www.cs.cmu.edu/~dga/15-440/S14/lectures/09-time.pdf
[4] Ricart–Agrawala algorithm, Wikipedia,
https://en.wikipedia.org/wiki/Ricart%E2%80%93Agrawala_algorithm
[5] Ghodsi, A. 2g1509 Distributed Systems Lecture Notes, SICS Swedish ICT,
https://www.sics.se/~ali/teaching/ds/ds-ricart.pdf (2004)
[6] Tanenbaum, A. S., van Steen, M. Distributed Systems: Principles and Paradigms, Second
Edition, p. 361, 2006.
[7] Gray, J., Lamport, L. "Consensus on Transaction Commit", ACM Transactions on Database
Systems (TODS) 31.1 (2006): 133-160. http://research.microsoft.com/pubs/64636/tr-2003-96.pdf