LEARNING TO BUILD
DISTRIBUTED SYSTEMS
THE HARD WAY
@iconara
LEARNING TO BUILD
DISTRIBUTED SYSTEMS
THE HARD WAY
BIG DATA
@iconara
speakerdeck.com/u/iconara
(real time!)
Theo / @iconara
chief architect at BURT
let's make online advertising a great experience
Learning to build distributed systems the hard way
MAKING THIS
INTO THIS
HOW HARD CAN IT BE?
30K REQUESTS
PER SECOND
more than a billion requests per day,
over 1 TB of raw data
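A quick back-of-the-envelope check, assuming (hypothetically) that 30K requests per second were sustained around the clock. The real daily traffic curve isn't given here, so this is an upper bound rather than a measurement:

```latex
30\,000~\tfrac{\text{requests}}{\text{s}} \times 86\,400~\tfrac{\text{s}}{\text{day}} = 2.592 \times 10^{9}~\tfrac{\text{requests}}{\text{day}}
```

So "more than a billion requests per day" fits comfortably even if 30K per second is the peak rather than the average.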
ONE VISIT CAN
CHANGE UP TO
100K COUNTERS
hundreds of millions of individual counters per day,
plus counting uniques and visitor histories
IN REAL TIME
or near real time, if you want to be pedantic
HOW HARD CAN IT BE?
START WITH TWO
OF EVERYTHING
going from one to two is the hardest,
solve the scaling problem up front
START WITH TWO
OF EVERYTHING
you'll solve the scaling problem,
and need less overcapacity
THREE
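One way to see why three needs less overcapacity than two, assuming the goal is to survive the loss of any one node with its load spread evenly over the survivors: with n nodes each normally carrying 1/n of the load, each of the n - 1 survivors must carry 1/(n - 1), so every node needs this much headroom:

```latex
\text{headroom}(n) = \frac{1/(n-1)}{1/n} - 1 = \frac{1}{n-1},
\qquad \text{headroom}(2) = 100\%,\quad \text{headroom}(3) = 50\%,\quad \text{headroom}(4) \approx 33\%
```

Going from two to three nodes halves the spare capacity you pay for.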
GIVE A LOT OF
THOUGHT TO
KEYS AND IDS
and think about your queries first
MEIHO0 JME57Z
(a timestamp, then something random)
monotonically increasing,
sorts nicely
JME57Z MEIHO0
(something random, then a timestamp)
uniformly distributed,
works nicely with sharding
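A minimal Ruby sketch of the two key layouts, reusing the slide's base-36 timestamp encoding. The helper names are hypothetical, and the zero-padding is an addition so that string order always matches numeric order:

```ruby
require "securerandom"

# Hypothetical helper: a Unix timestamp as six base-36 characters,
# zero-padded so lexicographic order equals numeric order.
def b36_timestamp(time)
  time.to_i.to_s(36).rjust(6, "0").upcase
end

# Hypothetical helper: six random base-36 characters.
def b36_random
  SecureRandom.random_number(36**6).to_s(36).rjust(6, "0").upcase
end

# Timestamp first: keys sort chronologically (cheap time-range scans),
# but all new writes land at the end of the key space, a hotspot
# if the store shards by key range.
def time_first_key(time)
  b36_timestamp(time) + b36_random
end

# Random first: keys spread uniformly over the key space, which suits
# sharding, but chronological order is lost.
def random_first_key(time)
  b36_random + b36_timestamp(time)
end

times = [Time.at(1_354_633_200), Time.at(1_354_633_260), Time.at(1_354_633_320)]
keys = times.map { |t| time_first_key(t) }
puts keys.sort == keys # prints true: string order follows time order
```

Which layout is right depends on the queries: time-first if you mostly read recent ranges, random-first if you mostly need writes spread evenly across shards.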
CONSISTENCY IS
OVERRATED
don't fear R + W < N
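Assuming the usual Dynamo-style notation (N replicas, a write acknowledged by W of them, a read consulting R of them), R + W > N guarantees that every read quorum overlaps every write quorum, so a read always reaches at least one replica holding the latest acknowledged write. Going below that trades the guarantee for latency and availability, and reads may be stale. A brute-force Ruby sketch with hypothetical helper names that confirms the overlap rule for small clusters:

```ruby
# True if every possible read quorum of size r intersects every possible
# write quorum of size w among n replicas (brute force, small n only).
def quorums_always_overlap?(n, r, w)
  replicas = (0...n).to_a
  replicas.combination(w).all? do |write_set|
    replicas.combination(r).all? { |read_set| !(write_set & read_set).empty? }
  end
end

# The closed-form rule: overlap is guaranteed exactly when r + w > n.
def strict_quorum?(n, r, w)
  r + w > n
end

[[3, 2, 2], [3, 1, 1], [3, 1, 3], [5, 2, 2]].each do |n, r, w|
  puts "N=#{n} R=#{r} W=#{w}: " \
       "overlap=#{quorums_always_overlap?(n, r, w)} strict=#{strict_quorum?(n, r, w)}"
end
```

With N=3, R=W=1 there is no overlap guarantee, but each request only waits for one replica.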
PRECOMPUTE
ALL THE THINGS
your users most likely don't know what they want,
so why let them do ad hoc queries?
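A minimal Ruby sketch of precomputing at write time instead of answering ad hoc queries later. The dimension names and the in-memory Hash are hypothetical stand-ins for a real counter store. Each event increments one counter for every combination of its dimensions, which also illustrates how a single visit can touch many counters:

```ruby
# Hypothetical in-memory counter store: the key is a list of
# [dimension, value] pairs sorted by dimension, the value is a count.
COUNTERS = Hash.new(0)

# Increment one counter for every subset of the event's dimensions:
# 2**k counters for k dimensions, including the empty "all visits" subset.
def record(event)
  pairs = event.sort_by { |dim, _| dim.to_s }
  (0..pairs.size).each do |k|
    pairs.combination(k).each { |subset| COUNTERS[subset] += 1 }
  end
end

record({ site: "news", country: "SE", device: "mobile" })
record({ site: "news", country: "NO", device: "mobile" })

# Queries become plain lookups of precomputed keys.
puts COUNTERS[[]]                                      # all visits
puts COUNTERS[[[:site, "news"]]]                       # visits to "news"
puts COUNTERS[[[:country, "SE"], [:device, "mobile"]]] # SE mobile visits
```

The cost is paid on every write and grows exponentially with the number of dimensions, so in practice you would only precompute the combinations you actually expect to be asked for.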
SEPARATE
PROCESSING
FROM STORAGE
that way you can scale each independently
PLAN HOW TO GET
RID OF YOUR DATA
deleting stuff is harder than you might think
NoDB
keep things streaming
DIVIDE THE LOAD
big data systems are all about
routing and partitioning
RANDOM
when you have no interdependencies
between things it's easy to scale out
CONSISTENT
when there are interdependencies you need
to route using some property of the objects,
but make sure you get a uniform distribution
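A Ruby sketch contrasting the two strategies. The hash-modulo scheme, the use of MD5 from Ruby's standard `digest` library, the partition count and the "visitor-…" keys are all illustrative assumptions, not necessarily what any production system uses:

```ruby
require "digest"

PARTITIONS = 12

# Random routing: fine when items have no interdependencies,
# since any worker can handle any item.
def route_randomly(_item, partitions = PARTITIONS)
  rand(partitions)
end

# Deterministic routing: derive the partition from a property of the item
# (here a visitor id), so related items always end up in the same place.
# A well-mixed hash keeps the distribution close to uniform.
def route_by_key(key, partitions = PARTITIONS)
  Digest::MD5.hexdigest(key.to_s).to_i(16) % partitions
end

# Check that keyed routing still spreads load evenly over the partitions.
counts = Array.new(PARTITIONS, 0)
100_000.times { |i| counts[route_by_key("visitor-#{i}")] += 1 }
puts counts.inspect
puts route_by_key("visitor-42") == route_by_key("visitor-42") # same key, same partition
```

Routing on a property with a skewed distribution (say, country) would break the uniformity even with a good hash, which is the warning in the last line of the slide.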
NUMEROLOGY
12
2 | 12
3 | 12
4 | 12
6 | 12
8 | 24
5 | 60
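Each row reads as "n divides m": 12 partitions split evenly over 2, 3, 4 or 6 nodes, 8 nodes need 24, and 5 nodes need 60. The right-hand number is the least common multiple of 12 and the node count, which Ruby's `Integer#lcm` computes directly (the `partitions_for` name is hypothetical):

```ruby
# Smallest partition count that is a multiple of `base` and also
# divides evenly among `nodes` nodes.
def partitions_for(nodes, base = 12)
  base.lcm(nodes)
end

[2, 3, 4, 6, 8, 5].each do |n|
  puts "#{n} | #{partitions_for(n)}"
end
```

This prints the same six rows as the slide.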
A DIVERSION ABOUT
COUNTING TO 60
the reason there are 60 seconds in a minute
and 360 degrees in a circle
3 SEGMENTS
ON EACH OF FOUR FINGERS
= 12
FIVE FINGERS
ON OTHER HAND
= 60
12, 60, 120, 360
superior highly composite numbers
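These numbers have unusually many divisors for their size. The Ruby sketch below checks the simpler highly composite property (more divisors than every smaller positive integer) by brute force. Superior highly composite is a stricter condition, and this code does not test it:

```ruby
# Number of positive divisors of n, by trial division.
def divisor_count(n)
  (1..n).count { |d| (n % d).zero? }
end

# n is highly composite if every smaller positive integer has fewer divisors.
def highly_composite?(n)
  d = divisor_count(n)
  (1...n).all? { |m| divisor_count(m) < d }
end

[12, 60, 120, 360].each do |n|
  puts "#{n}: #{divisor_count(n)} divisors, highly composite: #{highly_composite?(n)}"
end
```

Many divisors means many cluster sizes that divide the number evenly, which is the point of the next slide.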
use multiples of 12 to scale
without always having to double
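A Ruby sketch of the idea, assuming a fixed number of partitions that never changes while only their assignment to nodes does. The round-robin assignment and the `partitions_per_node` name are hypothetical. With 12 partitions you can grow from two nodes to three, four or six and stay balanced, instead of doubling each time:

```ruby
# Assign a fixed set of partitions to nodes round-robin and
# report how many partitions each node ends up with.
def partitions_per_node(partitions, nodes)
  assignment = Hash.new { |h, k| h[k] = [] }
  partitions.times { |p| assignment[p % nodes] << p }
  assignment.values.map(&:size)
end

[2, 3, 4, 5, 6].each do |nodes|
  sizes = partitions_per_node(12, nodes)
  even = sizes.uniq.size == 1
  puts "#{nodes} nodes: #{sizes.inspect} #{even ? 'even' : 'uneven'}"
end
```

Five nodes come out uneven with 12 partitions. Starting from 60 would cover five as well, at the cost of more partitions to manage.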
BLAH BLAH BLAH
$\log_2(36^6) \approx 31$
(36 also happens to be the ASCII code for the dollar sign)
six characters 0-9, A-Z can represent 31 bits,
which is kind of almost very close to four bytes
MEIHO0
a timestamp
Time.now.to_i.to_s(36).upcase
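The arithmetic behind the six-character timestamp, as a Ruby sketch: 36**6 is slightly more than 2**31, so Unix timestamps in seconds fit in six base-36 characters until 36**6 seconds after the epoch, which falls in late 2038. Decoding uses `String#to_i(36)`, the inverse of `Integer#to_s(36)`:

```ruby
# Six characters from 0-9 and A-Z give 36**6 distinct values.
capacity = 36**6
puts capacity               # 2176782336
puts Math.log2(capacity)    # a little over 31
puts capacity > 2**31       # true: six characters cover 31 bits

# Encode the current time the way the slide does, then decode it again.
encoded = Time.now.to_i.to_s(36).upcase
decoded = encoded.to_i(36)  # to_i(36) accepts upper- and lowercase digits
puts encoded, Time.at(decoded)

# The example key from the slides decodes to a Unix timestamp.
puts "MEIHO0".to_i(36)      # 1354633200
```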
YOU CAN'T SCALE
TO REAL TIME
and don't trust code that doesn't run continuously
DO YOU REALLY
NEED A BACKUP?
if you have 3x replication over multiple
availability zones, is that backup really worth it?
PRODUCTION IS THE
ONLY REAL TEST
ENVIRONMENT
when thousands of things happen every second,
new, weird and unforeseen things happen all the time;
your tests can only cover the foreseeable
GÖTEBORG,
DISTRIBUTED
@gbgdistr
KTHXBAI
@iconara
github.com/iconara
architecturalatrocities.com
burtcorp.com
