Storm at Outbrain
Rob Doherty
Senior Backend Engineer
rdoherty@outbrain.com
@robdoherty2
What is Outbrain?
Before Storm
 Custom distributed processing system
 Python and ZMQ
 Advantages:
 Simple components
 Well-understood
 Disadvantages:
 Did not scale
 Batch-processing
Kafka + Storm
 Kafka: high-throughput distributed messaging
 Storm: distributed, real-time computation
Why Kafka?
 Need method to buffer clicks into stream
 Kafka + Storm common pattern for click tracking
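The buffering idea can be illustrated with a toy append-only log. This is only a sketch of the concept and not Kafka's API: `ClickLog`, `append`, and `read` are hypothetical names, and everything lives in memory.

```python
class ClickLog:
    """Toy in-memory append-only log; illustrates buffering, not Kafka's API."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=100):
        """Return up to max_messages starting at offset, plus the next offset."""
        batch = self._messages[offset:offset + max_messages]
        return batch, offset + len(batch)


log = ClickLog()
for click_id in range(5):
    log.append({"click_id": click_id})

# The consumer tracks its own offset, so it can fall behind and catch up,
# or rewind and re-read after a failure.
first_batch, next_offset = log.read(0, max_messages=3)
second_batch, next_offset = log.read(next_offset, max_messages=3)
```

Because producers only append and consumers only advance an offset, a burst of clicks accumulates in the log instead of overwhelming the consumer.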
Why Storm?
 Real time (15s latency requirements)
 Fault tolerance
 Easy to manage parallelism
 Stream grouping
 Active community
 Open-source project
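Stream grouping decides which task of a bolt receives each tuple. A fields grouping sends tuples with the same key to the same task, which makes per-key counting possible. The sketch below shows the general hash-and-modulo idea only; it is not Storm's implementation, and `fields_grouping` and the customer keys are hypothetical.

```python
import zlib


def fields_grouping(key, num_tasks):
    """Pick a task index for a key; equal keys always map to the same task."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


clicks = ["customer-a", "customer-b", "customer-a", "customer-c", "customer-a"]

# Every "customer-a" click lands on one task, so that task can keep
# a complete running count for the customer without coordination.
assignments = [(customer, fields_grouping(customer, 4)) for customer in clicks]
```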
Architecture
[Diagram labels: Customer Traffic, Elastic Load Balancer, Nginx Servers, Kafka Cluster, Storm Topology, MongoDB, Redis, Algo, API, AWS]
Kafka Cluster
 40 Producers (8 m1.large instances)
 Python brod
 4 Brokers (4 m1.large instances)
10k Clicks per second (peak)
14B Clicks per month
Kafka v0.7.2
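As a rough check on these figures, the monthly volume can be converted to an average rate and compared with the stated peak. The 30-day month is an assumption made for the arithmetic.

```python
CLICKS_PER_MONTH = 14e9            # from the slide
PEAK_CLICKS_PER_SECOND = 10_000    # from the slide
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

average_clicks_per_second = CLICKS_PER_MONTH / SECONDS_PER_MONTH
peak_to_average = PEAK_CLICKS_PER_SECOND / average_clicks_per_second

print(f"average: {average_clicks_per_second:,.0f} clicks/s")
print(f"peak/average: {peak_to_average:.2f}")
```

Under that assumption the average is roughly 5,400 clicks per second, so the stated peak is a little under twice the average.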
Storm Topology
 40 Supervisors (c1.xlarge instances)
 35 Bolts, 1 Kafka spout
 250+ Executors (worker threads)
160k+ tuples executed per second
Storm v0.8.2
Leiningen v1.7
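The topology figures can be put in per-executor terms with some simple arithmetic. The slide gives lower bounds ("160k+", "250+"), and the tuple rate may not have been measured at the 10k clicks-per-second peak, so the results are only approximate.

```python
TUPLES_PER_SECOND = 160_000      # "160k+" on the slide
EXECUTORS = 250                  # "250+" on the slide
PEAK_CLICKS_PER_SECOND = 10_000  # from the Kafka Cluster slide

# Average load carried by one executor thread.
tuples_per_executor = TUPLES_PER_SECOND / EXECUTORS

# Tuples executed across all bolts for each incoming click,
# assuming both rates describe the same moment.
tuples_per_click = TUPLES_PER_SECOND / PEAK_CLICKS_PER_SECOND

print(tuples_per_executor, tuples_per_click)
```

That works out to about 640 tuples per second per executor and about 16 tuples executed per click.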
Storm Topology
[Diagram labels: Customer Traffic, Kafka Spout, Aggregate 15s, Aggregate 5m, Position, Customer, Social, Arrangement, Front Page, @Handle]
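The diagram names two aggregation stages, 15 seconds and 5 minutes, but does not say how they aggregate. One common approach is to count events per key in fixed, non-overlapping windows. The sketch below assumes that approach; `aggregate` and the sample events are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate(events, window_seconds):
    """Count events per (window start, key) in fixed, non-overlapping windows."""
    counts = defaultdict(Counter)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {start: dict(counter) for start, counter in counts.items()}


# (timestamp in seconds, key) pairs; the keys stand in for whatever is counted.
events = [(0, "a"), (3, "b"), (14, "a"), (15, "a"), (29, "b"), (301, "a")]

per_15s = aggregate(events, 15)
per_5m = aggregate(events, 300)
```

The same function serves both stages; only the window length changes.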
Challenges
 Shell Bolts
 Anchor Bolts/Replaying Stream
 Acking Tuples
 Monitoring
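Shell bolts, anchoring, and acking come together in Storm's multilang protocol, where a bolt written in another language exchanges JSON messages with the worker over stdin and stdout. The sketch below builds the messages a Python shell bolt might send for one input tuple. The message layout follows the multilang protocol as commonly documented, so check it against the Storm version in use; `encode`, `process`, and the sample tuple are hypothetical, and the startup handshake and the stdin loop are omitted.

```python
import json


def encode(message):
    """Serialize one message: a JSON document followed by a line reading 'end'."""
    return json.dumps(message) + "\nend\n"


def process(tuple_message):
    """Handle one input tuple: emit an anchored tuple, then ack the input."""
    tuple_id = tuple_message["id"]
    (click_url,) = tuple_message["tuple"]  # hypothetical single-field tuple
    # Anchoring ties the new tuple to its input, so a downstream failure
    # causes the original tuple to be replayed from the spout.
    emit = {"command": "emit", "anchors": [tuple_id], "tuple": [click_url.lower()]}
    # Acking tells Storm this bolt is done with the input tuple.
    ack = {"command": "ack", "id": tuple_id}
    return [emit, ack]


incoming = {
    "id": "42",
    "comp": "kafka-spout",
    "stream": "default",
    "task": 1,
    "tuple": ["HTTP://EXAMPLE.COM/A"],
}
outgoing = process(incoming)
wire = "".join(encode(message) for message in outgoing)
```

Forgetting the ack leaves the tuple pending until it times out and is replayed, while emitting without anchors means downstream failures are never replayed.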
Monitoring
 Scribe Logging
 Munin + Nagios
 JMX-JMXTrans + Ganglia
 Storm UI
 Thrift interface into Nimbus + D3
Nyc storm meetup_robdoherty
Future Plans
 Load testing
 Break topology into smaller pieces
 Move from AWS to private data center
Thank you
Rob Doherty
Senior Backend Engineer
rdoherty@outbrain.com
@robdoherty2
