This deck summarizes the technical challenges of powering the buzzer for Britain's Got Talent, which was designed to handle 50,000 requests/s and 10,000 buzzes/s during live shows. It covers load testing with Tsung, replacing Amazon's Elastic Load Balancer with HAProxy, using Memcached with a two-layer cache and write-behind data to keep load off MySQL, storing buzz records in Cassandra, and automating the configuration of 100+ servers with Chef.
Scaling the Britain's Got Talent Buzzer
1. 1
Powering the Britain's Got Talent buzzer*
*And Big Data
Big Data Meetup, London 25/5/2011
Thursday, 26 May 2011 1
5. 5
The challenge
10 million+ viewers
Design goal of 50,000 requests/s and 10,000 buzzes/s
Equivalent to 130 billion requests/month
But just on Saturday night
And four weeks to build
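As a rough check on the "130 billion requests/month" figure, the sketch below multiplies the 50,000 requests/s design goal out over a month. The 30-day month and the constant names are assumptions made for illustration; the slide does not say how the figure was derived.

```python
# Back-of-the-envelope check of the monthly equivalent of the design goal.
REQUESTS_PER_SECOND = 50_000      # design goal from the slide
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
DAYS_PER_MONTH = 30               # assumption: a 30-day month

requests_per_month = REQUESTS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_MONTH
print(f"{requests_per_month:,} requests/month")  # 129,600,000,000
```

That is about 130 billion, assuming the peak rate were sustained all month rather than just on Saturday night.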
6. 6
The challenge
Where do 130 billion requests fit?
Source: http://www.google.com/adplanner/static/top1000/#
7. 7
Where we started...
[Architecture diagram]
app.livetalkback.com: ELB, webservers (Django on Ubuntu), MySQL
cdn.livetalkback.com: CloudFront, S3
Control plane: Zabbix
8. 8
Step 1: Testing
Started with a platform with a previous peak of 100 requests/s
No idea where it would break
Tsung! http://tsung.erlang-projects.org/
9. 9
Step 2: ELB
Amazon Elastic Load Balancer
Infinite capacity
BUT very long impulse response and NO controls :(
HAProxy to the rescue
5K requests/s per node
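The per-node figure gives a simple way to size the HAProxy tier against the 50,000 requests/s design goal. The sketch below is only that arithmetic; the function name and the `spare_nodes` parameter are hypothetical and not taken from the talk.

```python
def haproxy_nodes_needed(target_rps: int, per_node_rps: int, spare_nodes: int = 0) -> int:
    """Return how many load-balancer nodes are needed for target_rps.

    Uses integer ceiling division so a partial node rounds up.
    spare_nodes is a hypothetical allowance for failures or headroom.
    """
    if per_node_rps <= 0:
        raise ValueError("per_node_rps must be positive")
    return (target_rps + per_node_rps - 1) // per_node_rps + spare_nodes


nodes = haproxy_nodes_needed(target_rps=50_000, per_node_rps=5_000)
print(nodes)  # 10
```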
10. 10
Step 3: Avoid the DB
MySQL was never going to be able to handle 10,000 writes/s, nor 50,000 reads/s
Hey, Django does memcached. Problem solved
Help, our memcached server I/O is maxed out :(
Two-layer cache: https://gist.github.com/953524
Write-behind data
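The two ideas on this slide can be sketched roughly as follows. This is not the code from the linked gist: it is a minimal illustration that assumes a short-lived per-process cache in front of the shared cache, plus an in-memory counter that is flushed in batches. Plain dicts stand in for memcached and for the durable store, and all class names, key names, and the TTL are hypothetical.

```python
import time
from collections import Counter


class TwoLayerCache:
    """Per-process cache (layer 1) in front of a shared cache (layer 2)."""

    def __init__(self, shared, local_ttl=60.0, clock=time.monotonic):
        self.shared = shared        # stand-in for a memcached client (a dict here)
        self.local_ttl = local_ttl  # seconds a local copy may be served
        self.clock = clock
        self._local = {}            # key -> (expires_at, value)
        self.shared_reads = 0       # counts trips to the shared layer

    def get(self, key):
        now = self.clock()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # served locally, no trip to the shared cache
        self.shared_reads += 1
        value = self.shared.get(key)
        self._local[key] = (now + self.local_ttl, value)
        return value


class WriteBehindCounter:
    """Accumulates increments in memory and writes them out in one batch."""

    def __init__(self, store):
        self.store = store          # stand-in for the durable store (a dict here)
        self._pending = Counter()

    def incr(self, key, amount=1):
        self._pending[key] += amount  # nothing is written on the request path

    def flush(self):
        pending, self._pending = self._pending, Counter()
        for key, amount in pending.items():
            self.store[key] = self.store.get(key, 0) + amount
        return len(pending)         # number of keys written


shared = {"act:7:buzzes": 42}
cache = TwoLayerCache(shared)
values = [cache.get("act:7:buzzes") for _ in range(1000)]

store = {}
counter = WriteBehindCounter(store)
for _ in range(1000):
    counter.incr("act:7:buzzes")
flushed_keys = counter.flush()
```

In this sketch 1,000 reads cost one trip to the shared cache and 1,000 buzzes cost one write to the store. The trade-off is that local reads can be stale for up to `local_ttl` seconds and unflushed increments are lost if the process dies.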
11. 11
But we want analytics!
Now 10K things to write to disk every second
Logging? Database?
This is starting to look like BIG DATA
13. 13
Step 5: Cassandra
Deployed Cassandra cluster on EC2 to handle buzz records
Tested to > 10K writes/s
All good!
So how many users did we have last night?
14. 14
Where we ended...
[Architecture diagram]
app.livetalkback.com: HAProxy (10 nodes), webservers (Django on Ubuntu, 100+ nodes), Memcached, Cassandra, RDS Master
cdn.livetalkback.com: CloudFront, S3
Control plane: Chef, Zabbix
15. 15
Scaling up - and down
Configuring 100+ servers by hand each week would have been a pain
Used Chef to automate
Also builds the test swarm
http://wiki.opscode.com/display/chef/Home
16. 16
Now what?
Still challenges with analytics & ad-hoc queries
Looking at Brisk and Hadoop
We're sucking the Twitter firehose for Tellybug
MySQL is coping so far, but only just
17. 17
Questions?
boxm@livetalkback.com
@malcolmbox