Dynamo concepts in depth.

Pavlo Baron, codecentric AG
Friday, August 31, 12
Pavlo Baron
pavlo.baron@codecentric.de
@pavlobaron

The shopping cart case

The 2 AM alarm call case

The Tower of Babel case

The Neo vs. Smiths case

The Pavlo case

So Dynamo isn't about speed.

It's about immediate, reliable writes.

It's about operation relaxation.

It's about distribution and fault tolerance.

It's about almost linear scalability.

Time and timestamps

Clocks

V(i), V(j): competing

Conflict resolution:
  1: siblings, resolved by the client
  2: merge, done by the system
  3: voting, done by the system

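A minimal sketch of options 1 and 2, assuming vector clocks are stored as dicts from node name to counter and a shopping cart is a set of item names (both are illustrative assumptions; `resolve` and `merge_carts` are hypothetical names). If one clock descends from the other, the newer version simply wins; only truly competing versions need resolving. Voting (option 3) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

VClock = Dict[str, int]  # node name -> counter (illustrative representation)


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


@dataclass
class Version:
    value: Set[str]  # e.g. the items in a shopping cart
    clock: VClock


Merge = Callable[[Version, Version], Version]


def resolve(x: Version, y: Version, merge: Optional[Merge] = None) -> List[Version]:
    if descends(x.clock, y.clock):
        return [x]  # x is newer (or identical): y is obsolete
    if descends(y.clock, x.clock):
        return [y]
    # Competing versions: neither clock descends from the other.
    if merge is None:
        return [x, y]  # option 1: keep both as siblings for the client
    return [merge(x, y)]  # option 2: the system merges


def merge_carts(x: Version, y: Version) -> Version:
    """Set-union merge; the merged clock is the entry-wise maximum."""
    nodes = x.clock.keys() | y.clock.keys()
    clock = {n: max(x.clock.get(n, 0), y.clock.get(n, 0)) for n in nodes}
    return Version(x.value | y.value, clock)
```

With a union merge, an item removed in one branch can reappear after merging; the original Dynamo paper describes this behaviour for its shopping cart.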
Vector clocks
Node 1: 1,0,0   2,2,0   3,2,0   4,3,3
Node 2: 1,1,0   1,2,0   1,3,3   4,4,3
Node 3: 1,0,1   1,2,2   1,2,3   4,3,4

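The three-node timeline follows the usual vector clock rules, assuming a receive also counts as an event on the receiving node. A small replay (the function names `tick`, `receive`, `happened_before` and `concurrent` are illustrative) reproduces every clock in that timeline and shows that Node 1's 3,2,0 and Node 2's 1,3,3 are competing versions.

```python
from typing import List

Clock = List[int]


def tick(clock: Clock, me: int) -> Clock:
    """Local event or send: increment this node's own entry."""
    new = list(clock)
    new[me] += 1
    return new


def receive(clock: Clock, me: int, incoming: Clock) -> Clock:
    """Receive: take the entry-wise maximum, then count the receive event."""
    return tick([max(a, b) for a, b in zip(clock, incoming)], me)


def happened_before(a: Clock, b: Clock) -> bool:
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: Clock, b: Clock) -> bool:
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Replay the timeline (index 0 = Node 1, 1 = Node 2, 2 = Node 3).
n1, n2, n3 = [0, 0, 0], [0, 0, 0], [0, 0, 0]
n1 = tick(n1, 0)          # 1,0,0
n2 = receive(n2, 1, n1)   # 1,1,0
n3 = receive(n3, 2, n1)   # 1,0,1
n2 = tick(n2, 1)          # 1,2,0
n1 = receive(n1, 0, n2)   # 2,2,0
n3 = receive(n3, 2, n2)   # 1,2,2
n3 = tick(n3, 2)          # 1,2,3
n2 = receive(n2, 1, n3)   # 1,3,3
n1 = tick(n1, 0)          # 3,2,0
n1_before = n1            # 3,2,0 competes with Node 2's 1,3,3
n1 = receive(n1, 0, n2)   # 4,3,3
n2 = receive(n2, 1, n1)   # 4,4,3
n3 = receive(n3, 2, n1)   # 4,3,4
```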
Vector clocks
Node 1: 1,0,0,0
Node 2: 1,1,0,0   1,2,0,0   1,3,0,3
Node 3: 1,0,1,0   1,0,2,0
Node 4: 1,0,0,1   1,2,0,2   1,2,0,3
O(1) for data lookups / delta tracking
#

Merkle Trees

N, M: nodes
HT(N), HT(M): hash trees

M needs update:
  obtain HT(N)
  calc delta(HT(M), HT(N))
  pull keys(delta)

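The three steps can be sketched in Python, assuming both nodes hash keys into the same fixed number of buckets and each leaf of the tree covers one bucket (the bucket count and all helper names are illustrative assumptions). Because equal hashes mean equal subtrees, M only descends into branches that differ and then pulls the keys of the differing buckets from N.

```python
import hashlib
from typing import Dict, List, Set

NUM_BUCKETS = 8  # assumption: a power of two, identical on both nodes


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str) -> int:
    return int.from_bytes(digest(key.encode())[:4], "big") % NUM_BUCKETS


def build_tree(store: Dict[str, str]) -> List[List[bytes]]:
    """levels[0] holds one leaf hash per bucket; levels[-1] == [root]."""
    buckets: List[List[str]] = [[] for _ in range(NUM_BUCKETS)]
    for key, value in store.items():
        buckets[bucket_of(key)].append(f"{key}={value}")
    level = [digest("\n".join(sorted(b)).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def delta(mine: List[List[bytes]], theirs: List[List[bytes]]) -> List[int]:
    """Buckets whose contents differ, descending only into differing subtrees."""
    pending = [(len(mine) - 1, 0)]  # start at the root
    changed = []
    while pending:
        depth, idx = pending.pop()
        if mine[depth][idx] == theirs[depth][idx]:
            continue  # identical subtree: nothing below needs comparing
        if depth == 0:
            changed.append(idx)
        else:
            pending += [(depth - 1, 2 * idx), (depth - 1, 2 * idx + 1)]
    return sorted(changed)


def keys_to_pull(store: Dict[str, str], buckets: Set[int]) -> Dict[str, str]:
    return {k: v for k, v in store.items() if bucket_of(k) in buckets}
```

For example, if M holds a stale value for `foo`, `delta(build_tree(m), build_tree(n))` reports only the bucket of `foo`, and M pulls just the keys of that bucket.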
Merkle Trees (root to leaves)
Node a.1:  a | ab, ac | abc, abd, acb, acc
Node a.2:  a | ab, ad | abe, abd, ada, adb

Merkle Trees (root to leaves)
Node a.1:  a | ab | abc, abd
Node a.2:  a | ab, ad | abd, ada, adb

Decentralized distribution based on equal nodes

Consensus, agreement, voting, quorum

Consistent hashing - the ring

  X bit integer space: $0 \le N < 2^X$
  or, as an angle: $0 \le A < 2\pi$, with $A = 2\pi N / 2^X$
  $x(N) = \cos(A)$, $y(N) = \sin(A)$

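A minimal Python sketch of the ring, assuming a 32-bit space (X = 32), SHA-256 truncated to 4 bytes as the hash, and 8 virtual nodes per physical node (all illustrative choices; `Ring`, `preference_list` and `to_xy` are hypothetical names). A key belongs to the nodes found by walking clockwise from its position; `to_xy` is the angle mapping from the slide.

```python
import bisect
import hashlib
import math
from typing import Dict, List, Tuple

BITS = 32  # assumption: X = 32, so positions satisfy 0 <= N < 2**32


def position(name: str) -> int:
    """Hash a name onto the ring (first BITS // 8 bytes of SHA-256)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[: BITS // 8], "big")


def to_xy(n: int) -> Tuple[float, float]:
    """Place position N on the unit circle: A = 2*pi*N / 2**X."""
    a = 2 * math.pi * n / 2**BITS
    return math.cos(a), math.sin(a)


class Ring:
    def __init__(self, nodes: List[str], vnodes: int = 8) -> None:
        self.tokens: List[int] = []      # sorted vnode positions
        self.owner: Dict[int, str] = {}  # vnode position -> physical node
        for node in nodes:
            for i in range(vnodes):
                token = position(f"{node}#{i}")
                if token not in self.owner:  # skip an (unlikely) collision
                    bisect.insort(self.tokens, token)
                    self.owner[token] = node

    def preference_list(self, key: str, n: int) -> List[str]:
        """Walk clockwise from the key and collect n distinct physical nodes."""
        start = bisect.bisect_right(self.tokens, position(key))
        nodes: List[str] = []
        for i in range(len(self.tokens)):
            node = self.owner[self.tokens[(start + i) % len(self.tokens)]]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == n:
                    break
        return nodes
```

Skipping vnodes of a physical node that is already in the list keeps the V copies of a key on V different machines.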
Quorum

  V: vnodes holding a key
  W: write quorum
  R: read quorum
  DW: durable write quorum

  W > 0.5 * V   (any two write quorums share a vnode)
  R + W > V     (every read quorum overlaps the last write quorum)

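Both rules can be checked by brute force: W > 0.5 * V holds exactly when any two write quorums share at least one replica, and R + W > V holds exactly when every read quorum shares at least one replica with every write quorum. The helper names below are illustrative.

```python
from itertools import combinations


def valid(v: int, r: int, w: int) -> bool:
    """The two rules from the slide."""
    return w > 0.5 * v and r + w > v


def always_overlap(v: int, a: int, b: int) -> bool:
    """Does every a-subset of the v replicas share a member with every b-subset?"""
    replicas = range(v)
    return all(
        set(x) & set(y)
        for x in combinations(replicas, a)
        for y in combinations(replicas, b)
    )
```

For example, V = 3, R = 2, W = 2 satisfies both rules, while V = 3, R = 1, W = 2 allows a read that misses the latest write.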
Insert key (sloppy quorum)
  Key = foo, # = N, W = 2
  N replicates the key, then answers ok

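A minimal sketch of the sloppy write, assuming the ring is a plain list of node names, `home` is the index of the node at # = N, and writes are applied synchronously (a real coordinator would answer as soon as W acknowledgements arrive). When an intended node is down, the next healthy node further along the ring stores the copy together with a hint naming the node it belongs to. All names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# key -> (value, hint); the hint names the node the copy really belongs to
Store = Dict[str, Tuple[str, Optional[str]]]


def sloppy_write(ring: List[str], home: int, v: int, w: int, up: Set[str],
                 stores: Dict[str, Store], key: str, value: str) -> bool:
    """Write to the V nodes starting at ring[home], substituting healthy
    nodes further along the ring for any that are down."""
    intended = [ring[(home + i) % len(ring)] for i in range(v)]
    spare = [ring[(home + i) % len(ring)] for i in range(v, len(ring))]
    acks = 0
    for target in intended:
        if target in up:
            stores[target][key] = (value, None)
            acks += 1
            continue
        while spare:
            stand_in = spare.pop(0)
            if stand_in in up:
                stores[stand_in][key] = (value, target)  # hinted copy
                acks += 1
                break
    return acks >= w  # success once W nodes acknowledged
```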
Add node
  Each key range is either copied to the new node (copy) or left in place (leave)

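With consistent hashing, a joining node only takes over keys from its clockwise neighbour; everything else stays where it is. A minimal sketch, assuming one token per node, no replication and SHA-256 positions (all simplifications; `owner` and `plan_join` are hypothetical names). With V replicas, the new node would also receive copies for the ranges it now replicates.

```python
import bisect
import hashlib
from typing import Dict, List


def position(name: str) -> int:
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")


def owner(nodes: List[str], key: str) -> str:
    """First node clockwise from the key (one token per node, no vnodes)."""
    by_pos = {position(n): n for n in nodes}
    tokens = sorted(by_pos)
    return by_pos[tokens[bisect.bisect_right(tokens, position(key)) % len(tokens)]]


def plan_join(nodes: List[str], new_node: str, keys: List[str]) -> Dict[str, str]:
    """Label each key 'copy' (moves to the new node) or 'leave' (stays put)."""
    plan = {}
    for key in keys:
        before, after = owner(nodes, key), owner(nodes + [new_node], key)
        plan[key] = "copy" if after != before else "leave"
    return plan
```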
Lookup key (sloppy quorum)
  Key = foo, # = N, R = 2
  N answers with Value = bar

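The read side mirrors the write: ask healthy nodes in preference order until R of them have answered, then return the newest value. A minimal sketch, assuming each replica stores a plain integer version next to the value (a stand-in for vector clocks, which would additionally need sibling handling); all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Stores = Dict[str, Dict[str, Tuple[int, str]]]  # node -> key -> (version, value)


class QuorumNotReached(Exception):
    pass


def sloppy_read(candidates: List[str], up: Set[str], r: int,
                stores: Stores, key: str) -> Optional[str]:
    """Collect R answers from healthy nodes; return the newest value found."""
    answers = 0
    found: List[Tuple[int, str]] = []
    for node in candidates:
        if node not in up:
            continue  # sloppy: simply move on to the next node
        answers += 1
        if key in stores[node]:
            found.append(stores[node][key])
        if answers == r:
            break
    if answers < r:
        raise QuorumNotReached(f"only {answers} of {r} replicas answered")
    return max(found)[1] if found else None  # highest version wins
```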
Remove node
  Key ranges are either copied to the remaining nodes (copy) or left in place (leave)

Gossip: node down/up
  Nodes 1 to 4 exchange updates and reads until all know "4 down", and later "4 up"

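A minimal sketch of how "4 down" and later "4 up" can spread, assuming each node keeps a view mapping members to a (status, version) pair and a higher version always wins (the versioning scheme and all names are illustrative). Since merging only ever keeps newer entries, the nodes converge regardless of the order in which the exchanges happen.

```python
import random
from typing import Dict, Tuple

View = Dict[str, Tuple[str, int]]  # member -> (status, version)


def merge(mine: View, theirs: View) -> View:
    """Keep, per member, the entry with the higher version."""
    merged = dict(mine)
    for member, (status, version) in theirs.items():
        if member not in merged or version > merged[member][1]:
            merged[member] = (status, version)
    return merged


def gossip_round(views: Dict[str, View], rng: random.Random) -> None:
    """Every node exchanges views with one random peer (push-pull)."""
    names = sorted(views)
    for node in names:
        peer = rng.choice([n for n in names if n != node])
        combined = merge(views[node], views[peer])
        views[node] = combined
        views[peer] = dict(combined)
```

Announcing node 4 as up again means publishing ("up", 3), which then replaces ("down", 2) everywhere through the same merge.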
Eventual consistency

BASE

  Basically Available,
  Soft-state,
  Eventually consistent

  The opposite of ACID

Read-your-writes consistency
  FE1: write v2, read v2
  FE2: write v1, read v1
  Data store: v1, v2, v3

Session consistency
  FE, Session 1: write v2, read v2
  FE, Session 2: write v1, read v1
  Data store: v1, v2, v3

Monotonic read consistency
  FE1: read v2, read v2, read v3
  FE2: read v3, read v4
  Data store: v1, v2, v3, v4

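Read-your-writes, session and monotonic read consistency can all be provided on the client side by remembering the newest version seen per key. A minimal sketch, assuming each replica is a dict holding an integer version per key (an illustrative simplification; `Session` is a hypothetical name); a real client would also need a way to wait for, or route to, a fresh replica.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Replica = Dict[str, Tuple[int, str]]  # key -> (version, value)


@dataclass
class Session:
    """Remembers the highest version this client has written or read, per key."""
    seen: Dict[str, int] = field(default_factory=dict)

    def write(self, replica: Replica, key: str, value: str) -> int:
        current = replica.get(key, (0, ""))[0]
        version = max(current, self.seen.get(key, 0)) + 1
        replica[key] = (version, value)
        self.seen[key] = version
        return version

    def read(self, replicas: List[Replica], key: str) -> Optional[str]:
        """Return the first answer at least as new as anything seen before."""
        floor = self.seen.get(key, 0)
        for replica in replicas:
            if key in replica and replica[key][0] >= floor:
                version, value = replica[key]
                self.seen[key] = version
                return value
        return None  # every replica is stale or empty: retry later
```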
Monotonic write consistency
  FE1: write v1, write v2
  FE2: read v3, read v3
  Data store: v1, v2, v3, v4

Eventual consistency
  FE1: read v1, read v2, read v2, read v3
  FE2: write v3
  Data store: v1, v2, v3

Hinted handoff

  N: node, G: group including N

  node(N) is unavailable:
    replicate to G, or
    store data(N) locally
    hint handoff for later
  node(N) is alive:
    handoff data to node(N)

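The local-storage branch can be sketched as follows, assuming a stand-in node that keeps hinted writes grouped by the unavailable node they belong to (class and method names are hypothetical). Once node(N) is alive again, the stand-in delivers the data and drops the hints.

```python
from typing import Dict, List, Set, Tuple


class StandIn:
    """A node holding data on behalf of unavailable nodes."""

    def __init__(self) -> None:
        self.hints: Dict[str, List[Tuple[str, str]]] = {}  # node(N) -> [(key, value)]

    def store_hinted(self, target: str, key: str, value: str) -> None:
        """node(N) is unavailable: keep its data locally with a hint."""
        self.hints.setdefault(target, []).append((key, value))

    def handoff(self, alive: Set[str], stores: Dict[str, Dict[str, str]]) -> int:
        """Deliver hinted data to every target that is alive again; return the count."""
        delivered = 0
        for target in list(self.hints):
            if target in alive:
                for key, value in self.hints.pop(target):
                    stores[target][key] = value  # later writes overwrite earlier ones
                    delivered += 1
        return delivered
```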
Direct replica fails
  Key = foo, # = N -> handoff hint = true
  Key = foo at N: replicate

Replica recovers
  handoff

All replicas fail
  Key = foo, # = N -> handoff hint = true

All replicas recover
  handoff, then replicate

Latency is an adjustment screw

Availability is an adjustment screw

CAP: the variations

  CA: irrelevant (a distributed system cannot rule out partitions)
  CP: eventually unavailable, offering maximum consistency
  AP: eventually inconsistent, offering maximum availability

CAP: the tradeoff

  A  <->  C

CP
  Replica 1: v1 (read), then v2 (write v2)
  Replica 2: v1 (read), then v2

CP (partition)
  Replica 1: v1 (read), then v2 (write v2)
  Replica 2: v1 (read)

AP
  Replica 1: v1, then v2 (write v2, read)
  Replica 2: v1 (read), then v2 (replicate)

AP (partition)
  Replica 1: v1, then v2 (write v2, read)
  hinted handoff holds v2 for Replica 2
  Replica 2: v1 (read)

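The difference between the CP and AP variants under a partition can be modelled with two replicas. A minimal sketch, assuming Replica 1 accepts all client writes and that the CP variant reacts to a partition by rejecting writes (it could equally refuse reads on the cut-off side); the class and attribute names are hypothetical.

```python
from typing import Dict, Optional


class TwoReplicas:
    """Replica 1 takes client writes and copies them to Replica 2."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica1: Dict[str, str] = {}
        self.replica2: Dict[str, str] = {}
        self.hints: Dict[str, str] = {}  # writes held for Replica 2 (AP only)
        self.partitioned = False

    def write(self, key: str, value: str) -> bool:
        if not self.partitioned:
            self.replica1[key] = value
            self.replica2[key] = value  # replicate
            return True
        if self.mode == "CP":
            return False  # refuse rather than let the replicas diverge
        self.replica1[key] = value
        self.hints[key] = value  # AP: accept now, hint handoff for later
        return True

    def read_replica2(self, key: str) -> Optional[str]:
        return self.replica2.get(key)

    def heal(self) -> None:
        self.partitioned = False
        self.replica2.update(self.hints)  # hinted handoff
        self.hints.clear()
```

During the partition the AP variant answers a read on Replica 2 with the stale v1; after `heal()` the hinted v2 arrives and the replicas agree again.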
Frequent structure changes

Thank you

I created many of the graphics myself.
Some images originate from istockphoto.com;
a few were taken from Wikipedia
and product pages.