際際滷

際際滷Share a Scribd company logo
Intra-cluster Replication for
        Apache Kafka
           Jun Rao
About myself
 Engineer at LinkedIn since 2010
 Worked on Apache Kafka and Cassandra
 Database researcher at IBM
Outline
   Overview of Kafka
   Kafka architecture
   Kafka replication design
   Performance
   Q/A
Whats Kafka
 A distributed pub/sub messaging system
 Used in many places
   LinkedIn, Twitter, Box, FourSquare 
 What do people use it for?
   log aggregation
   real-time event processing
   monitoring
   queuing
Example Kafka Apps at LinkedIn
Kafka Deployment at LinkedIn
          Live data center                               Offline data center

 Live              Live           Live
service           service        service


                              interactive data
                              (human, machine)

Monitorin
   g

                  Kafka                                   Kafka
                                                           Kafka               Hadoop
                                                                                Hadoop
                   Kafka
                    Kafka                                   Kafka                Hadoop



                 Per day stats
                  writes: 10+ billion messages (2+TB compressed data)
                  reads: 50+ billion messages
Kafka vs. Other Messaging Systems
   Scale-out from groundup
   Persistence to disks
   High throughput (10s MB/sec per server)
   Multi-subscription
Outline
   Overview of Kafka
   Kafka architecture
   Kafka replication design
   Performance
   Q/A
Kafka Architecture
         Producer                            Producer




Broker              Broker   Zookeeper   Broker         Broker




         Consumer                                 Consumer
Terminologies
 Topic = message stream
 Topic has partitions
   partitions distributed to brokers
 Partition has a log on disk
   message persisted in log
   message addressed by offset
API
 Producer
  messages = new List<KeyedMessage<K,V>>();
  messages.add(newKeyedMessage(topic1, null, msg1);
  send(messages);



 Consumer
  streams[] = Consumer.createMessageStream(topic1, 1);

  for(message: streams[0]) {
     // do something with message
  }
Deliver High Throughput
 Simple storage
                    logs in broker
                                                     msg-1
                                                     msg-2
             topic1:part1      topic2:part1          msg-3
                                                     msg-4   index
             segment-1          segment-1            msg-5
                                                      
                                                      
             segment-2          segment-2            msg-n



                                              read()
             segment-n          segment-n
                                              append()



 Batched writes and reads
 Zero-copy transfer from file to socket
 Compression (batched)
Outline
   Overview of Kafka
   Kafka architecture
   Kafka replication design
   Performance
   Q/A
Why Replication
 Broker can go down
   controlled: rolling restart for code/config push
   uncontrolled: isolated broker failure
 If broker down
   some partitions unavailable
   could be permanent data loss
 Replication  higher availability and
  durability
CAP Theorem
 Pick two from
   consistency
   availability
   network partitioning
Kafka Replication: Pick CA
 Brokers within a datacenter
   i.e., network partitioning is rare
 Strong consistency
   replicas byte-wise identical
 Highly available
   typical failover time: < 10ms
Replicas and Layout
 Partition has replicas
 Replicas spread evenly among brokers


       logs              logs           logs           logs

    topic1-part1      topic1-part2   topic2-part1   topic2-part2

    topic2-part2      topic1-part1   topic1-part2   topic2-part1

    topic2-part1      topic2-part2   topic1-part1   topic1-part2

      broker 1          broker 2       broker 3        broker 4
Maintain Strongly Consistent Replicas
   One of the replicas is leader
   All writes go to leader
   Leader propagates writes to followers in order
   Leader decides when to commit message
Conventional Quorum-based Commit
 Wait for majority of replicas (e.g. Zookeeper)
 Plus: good latency
 Minus: 2f+1 replicas  tolerate f failures
   ideally want to tolerate 2f failures
Commit Messages in Kafka
 Leader maintains in-sync-replicas (ISR)
   initially, all replicas in ISR
   message committed if received by ISR
   follower fails  dropped from ISR
   leader commits using new ISR
 Benefit: f replicas  tolerate f-1 failures
   latency less an issue within datacenter
Data Flow in Replication
     producer
                                                   2
     ack            1
                                    2
                leader                  follower                    follower
                3

                commit
           4
                    topic1-part1           topic1-part1                topic1-part1
consumer

                         broker 1              broker 2                    broker 3




           When producer receives ack Latency                                     Durabilityon failures
           no ack                                      no network delay           some data loss
           wait for leader                             1 network roundtrip        a few data loss
           wait for committed                          2 network roundtrips no data loss

           Only committed messages exposed to consumers
                 independent of ack type chosen by producer
Extend to Multiple Partitions
producer




    leader                           follower             follower
           topic1-part1                  topic1-part1         topic1-part1
                          producer



                                     leader              follower              follower              producer
                                         topic2-part1        topic2-part1          topic2-part1



                                     follower            follower              leader
                                         topic3-part1         topic3-part1          topic3-part1

             broker 1                         broker 2              broker 3              broker 4




 Leaders are evenly spread among brokers
Handling Follower Failures
 Leader maintains last committed offset
   propagated to followers
   checkpointed to disk
 When follower restarts
   truncate log to last committed
   fetch data from leader
   fully caught up  added to ISR
Handling Leader Failure
 Use an embedded controller (inspired by Helix)
   detect broker failure via Zookeeper
   on leader failure: elect new leader from ISR
   committed messages not lost
 Leader and ISR written to Zookeeper
   for controller failover
   expected to change infrequently
Example of Replica Recovery
1. ISR = {A,B,C}; Leader A commits message m1;
                  L (A)   F (B)        F (C)
                  m1       m1           m1
last committed             m2
                  m2
                  m3

2. A fails and B is new leader; ISR = {B,C}; B commits m2, but not m3
                  L (A)   L (B)        F (C)
                  m1       m1           m1
                  m2       m2           m2
                  m3

3. B commits new messages m4, m5
                  L (A)   L (B)        F (C)
                  m1       m1           m1
                  m2       m2           m2
                  m3       m4           m4
                           m5           m5


4. A comes back, truncates to m1 and catches up; finally ISR = {A,B,C}
                  F (A)   L (B)        F (C)                 F (A)       L (B)   F (C)
                  m1       m1           m1                   m1           m1      m1
                           m2           m2                   m2           m2      m2
                           m4           m4                   m4           m4      m4
                           m5           m5                   m5           m5      m5
Outline
   Overview of Kafka
   Kafka architecture
   Kafka replication design
   Performance
   Q/A
Setup
   3 brokers
   1 topic with 1 partition
   Replication factor=3
   Message size = 1KB
Choosing btw Latency and Durability


    When producer      Time to publish Durabilityon
    receives ack       a message (ms) failures
    no ack             0.29            some data loss
    wait for leader    1.05            a few data loss
    wait for committed 2.05            no data loss
Producer Throughput

            varying messages per send                               varying # concurrent producers
       70                                                      70
       60                                                      60
       50                                                      50
MB/s




                                                        MB/s
       40                                                      40
                                            no ack                                              no ack
       30                                                      30
       20                                   leader             20                               leader
       10                                   committed          10                               committed
        0                                                       0
             1     10       100      1000                              1     5        10   20
                 messages per send                                           # producers
Consumer Throughput

               throughput vs fetch size
        100

         80

         60
 MB/s




         40

         20

          0
              1KB      10KB                100KB   1MB
                              fetch size
Q/A
 Kafka 0.8.0 (intra-cluster replication)
   expected to be released in Mar
   various performance improvements in the future
 Checkout more about Kafka
   http://kafka.apache.org/
 Kafka meetup tonight

More Related Content

What's hot (20)

PDF
Kafka 101 and Developer Best Practices
confluent
PPTX
Introduction to Storm
Chandler Huang
PDF
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
PPTX
Kafka 101
Clement Demonchy
PDF
Fundamentals of Apache Kafka
Chhavi Parasher
PPTX
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
PPTX
Introduction to Apache Kafka
Jeff Holoman
PPTX
Introduction to Apache Kafka
AIMDek Technologies
PPTX
Apache kafka
Viswanath J
PPTX
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
PDF
ksqlDB: A Stream-Relational Database System
confluent
PDF
Flink powered stream processing platform at Pinterest
Flink Forward
PDF
Parquet performance tuning: the missing guide
Ryan Blue
PDF
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
PPTX
kafka
Amikam Snir
PDF
Introduction to Kafka Streams
Guozhang Wang
PDF
A Deep Dive into Kafka Controller
confluent
PDF
Introduction to Redis
Dvir Volk
PPTX
Kafka at Peak Performance
Todd Palino
PDF
Apache Kafka Architecture & Fundamentals Explained
confluent
Kafka 101 and Developer Best Practices
confluent
Introduction to Storm
Chandler Huang
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
Kafka 101
Clement Demonchy
Fundamentals of Apache Kafka
Chhavi Parasher
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
Introduction to Apache Kafka
Jeff Holoman
Introduction to Apache Kafka
AIMDek Technologies
Apache kafka
Viswanath J
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
ksqlDB: A Stream-Relational Database System
confluent
Flink powered stream processing platform at Pinterest
Flink Forward
Parquet performance tuning: the missing guide
Ryan Blue
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
kafka
Amikam Snir
Introduction to Kafka Streams
Guozhang Wang
A Deep Dive into Kafka Controller
confluent
Introduction to Redis
Dvir Volk
Kafka at Peak Performance
Todd Palino
Apache Kafka Architecture & Fundamentals Explained
confluent

Similar to Kafka replication apachecon_2013 (20)

PDF
Kafka 10000 feet view
younessx01
PPTX
Introduction to Kafka
Ducas Francis
PPTX
Kafka101
Aparna Pillai
PDF
RabbitMQ vs Apache Kafka - Part 1
Erlang Solutions
PDF
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
Chen-en Lu
PPTX
Kafka overview v0.1
Mahendran Ponnusamy
PPTX
kafka_session_updated.pptx
Koiuyt1
PDF
Zab dsn-2011
Flavio Junqueira
PPTX
Kafka.pptx
Tarun techme
PPTX
Kafka 101
Aparna Pillai
PDF
Ultra-scalable Architectures for Telecommunications and Web 2.0 Services
Mauricio Arango
PPTX
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Christopher Curtin
PDF
Kafka Overview
iamtodor
PDF
An Introduction to Apache Kafka
Amir Sedighi
PPTX
Kafka and ibm event streams basics
Brian S. Paskin
PPTX
Distributed messaging through Kafka
Dileep Kalidindi
PPT
Apache kafka- Onkar Kadam
Onkar Kadam
PDF
Building a Distributed Message Log from Scratch
Tyler Treat
PPTX
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
PPTX
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
Lucas Jellema
Kafka 10000 feet view
younessx01
Introduction to Kafka
Ducas Francis
Kafka101
Aparna Pillai
RabbitMQ vs Apache Kafka - Part 1
Erlang Solutions
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
Chen-en Lu
Kafka overview v0.1
Mahendran Ponnusamy
kafka_session_updated.pptx
Koiuyt1
Zab dsn-2011
Flavio Junqueira
Kafka.pptx
Tarun techme
Kafka 101
Aparna Pillai
Ultra-scalable Architectures for Telecommunications and Web 2.0 Services
Mauricio Arango
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Christopher Curtin
Kafka Overview
iamtodor
An Introduction to Apache Kafka
Amir Sedighi
Kafka and ibm event streams basics
Brian S. Paskin
Distributed messaging through Kafka
Dileep Kalidindi
Apache kafka- Onkar Kadam
Onkar Kadam
Building a Distributed Message Log from Scratch
Tyler Treat
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
Lucas Jellema
Ad

Kafka replication apachecon_2013

  • 1. Intra-cluster Replication for Apache Kafka Jun Rao
  • 2. About myself Engineer at LinkedIn since 2010 Worked on Apache Kafka and Cassandra Database researcher at IBM
  • 3. Outline Overview of Kafka Kafka architecture Kafka replication design Performance Q/A
  • 4. Whats Kafka A distributed pub/sub messaging system Used in many places LinkedIn, Twitter, Box, FourSquare What do people use it for? log aggregation real-time event processing monitoring queuing
  • 5. Example Kafka Apps at LinkedIn
  • 6. Kafka Deployment at LinkedIn Live data center Offline data center Live Live Live service service service interactive data (human, machine) Monitorin g Kafka Kafka Kafka Hadoop Hadoop Kafka Kafka Kafka Hadoop Per day stats writes: 10+ billion messages (2+TB compressed data) reads: 50+ billion messages
  • 7. Kafka vs. Other Messaging Systems Scale-out from groundup Persistence to disks High throughput (10s MB/sec per server) Multi-subscription
  • 8. Outline Overview of Kafka Kafka architecture Kafka replication design Performance Q/A
  • 9. Kafka Architecture Producer Producer Broker Broker Zookeeper Broker Broker Consumer Consumer
  • 10. Terminologies Topic = message stream Topic has partitions partitions distributed to brokers Partition has a log on disk message persisted in log message addressed by offset
  • 11. API Producer messages = new List<KeyedMessage<K,V>>(); messages.add(newKeyedMessage(topic1, null, msg1); send(messages); Consumer streams[] = Consumer.createMessageStream(topic1, 1); for(message: streams[0]) { // do something with message }
  • 12. Deliver High Throughput Simple storage logs in broker msg-1 msg-2 topic1:part1 topic2:part1 msg-3 msg-4 index segment-1 segment-1 msg-5 segment-2 segment-2 msg-n read() segment-n segment-n append() Batched writes and reads Zero-copy transfer from file to socket Compression (batched)
  • 13. Outline Overview of Kafka Kafka architecture Kafka replication design Performance Q/A
  • 14. Why Replication Broker can go down controlled: rolling restart for code/config push uncontrolled: isolated broker failure If broker down some partitions unavailable could be permanent data loss Replication higher availability and durability
  • 15. CAP Theorem Pick two from consistency availability network partitioning
  • 16. Kafka Replication: Pick CA Brokers within a datacenter i.e., network partitioning is rare Strong consistency replicas byte-wise identical Highly available typical failover time: < 10ms
  • 17. Replicas and Layout Partition has replicas Replicas spread evenly among brokers logs logs logs logs topic1-part1 topic1-part2 topic2-part1 topic2-part2 topic2-part2 topic1-part1 topic1-part2 topic2-part1 topic2-part1 topic2-part2 topic1-part1 topic1-part2 broker 1 broker 2 broker 3 broker 4
  • 18. Maintain Strongly Consistent Replicas One of the replicas is leader All writes go to leader Leader propagates writes to followers in order Leader decides when to commit message
  • 19. Conventional Quorum-based Commit Wait for majority of replicas (e.g. Zookeeper) Plus: good latency Minus: 2f+1 replicas tolerate f failures ideally want to tolerate 2f failures
  • 20. Commit Messages in Kafka Leader maintains in-sync-replicas (ISR) initially, all replicas in ISR message committed if received by ISR follower fails dropped from ISR leader commits using new ISR Benefit: f replicas tolerate f-1 failures latency less an issue within datacenter
  • 21. Data Flow in Replication producer 2 ack 1 2 leader follower follower 3 commit 4 topic1-part1 topic1-part1 topic1-part1 consumer broker 1 broker 2 broker 3 When producer receives ack Latency Durabilityon failures no ack no network delay some data loss wait for leader 1 network roundtrip a few data loss wait for committed 2 network roundtrips no data loss Only committed messages exposed to consumers independent of ack type chosen by producer
  • 22. Extend to Multiple Partitions producer leader follower follower topic1-part1 topic1-part1 topic1-part1 producer leader follower follower producer topic2-part1 topic2-part1 topic2-part1 follower follower leader topic3-part1 topic3-part1 topic3-part1 broker 1 broker 2 broker 3 broker 4 Leaders are evenly spread among brokers
  • 23. Handling Follower Failures Leader maintains last committed offset propagated to followers checkpointed to disk When follower restarts truncate log to last committed fetch data from leader fully caught up added to ISR
  • 24. Handling Leader Failure Use an embedded controller (inspired by Helix) detect broker failure via Zookeeper on leader failure: elect new leader from ISR committed messages not lost Leader and ISR written to Zookeeper for controller failover expected to change infrequently
  • 25. Example of Replica Recovery 1. ISR = {A,B,C}; Leader A commits message m1; L (A) F (B) F (C) m1 m1 m1 last committed m2 m2 m3 2. A fails and B is new leader; ISR = {B,C}; B commits m2, but not m3 L (A) L (B) F (C) m1 m1 m1 m2 m2 m2 m3 3. B commits new messages m4, m5 L (A) L (B) F (C) m1 m1 m1 m2 m2 m2 m3 m4 m4 m5 m5 4. A comes back, truncates to m1 and catches up; finally ISR = {A,B,C} F (A) L (B) F (C) F (A) L (B) F (C) m1 m1 m1 m1 m1 m1 m2 m2 m2 m2 m2 m4 m4 m4 m4 m4 m5 m5 m5 m5 m5
  • 26. Outline Overview of Kafka Kafka architecture Kafka replication design Performance Q/A
  • 27. Setup 3 brokers 1 topic with 1 partition Replication factor=3 Message size = 1KB
  • 28. Choosing btw Latency and Durability When producer Time to publish Durabilityon receives ack a message (ms) failures no ack 0.29 some data loss wait for leader 1.05 a few data loss wait for committed 2.05 no data loss
  • 29. Producer Throughput varying messages per send varying # concurrent producers 70 70 60 60 50 50 MB/s MB/s 40 40 no ack no ack 30 30 20 leader 20 leader 10 committed 10 committed 0 0 1 10 100 1000 1 5 10 20 messages per send # producers
  • 30. Consumer Throughput throughput vs fetch size 100 80 60 MB/s 40 20 0 1KB 10KB 100KB 1MB fetch size
  • 31. Q/A Kafka 0.8.0 (intra-cluster replication) expected to be released in Mar various performance improvements in the future Checkout more about Kafka http://kafka.apache.org/ Kafka meetup tonight