The document discusses Netflix's API architecture and how it achieves fault tolerance and high performance. It describes how the API is composed of dozens of dependencies, each of which can fail independently, and outlines Netflix's approaches to preventing any single dependency from taking down the entire application: fallbacks, failing silent, failing fast, shedding load, aggressive timeouts, tryable semaphores, separate threads, and circuit breakers. It notes that the API executes over 10 billion dependency commands per day while serving over 1 billion incoming requests.
Performance and Fault Tolerance for the Netflix API
1. Performance and Fault Tolerance
for the Netflix API
Ben Christensen
Software Engineer - API Platform at Netflix
@benjchristensen
http://www.linkedin.com/in/benjchristensen
http://techblog.netflix.com/
2. Netflix API
[Diagram: the Netflix API fanning out to backend dependencies A through R]
3. Netflix API
[Diagram: the same fan-out of dependencies A through R]
4. Dozens of dependencies.
One going bad takes everything down.
99.99%^30 = 99.7% uptime
0.3% of 1 billion = 3,000,000 failures
2+ hours downtime/month
even if all dependencies have excellent uptime.
Reality is generally worse.
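Spelled out: if each of 30 dependencies is up 99.99% of the time, the chance that all of them are up at once is 0.9999^30 ≈ 0.997, i.e. roughly 0.3% downtime. 0.3% of a 30-day month (about 43,200 minutes) is on the order of 130 minutes, which is where the "2+ hours downtime/month" figure comes from, and 0.3% of 1 billion requests is 3,000,000 failed requests.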
8. No single dependency should
take down the entire app.
Fallback.
Fail silent.
Fail fast.
Shed load.
11. Tryable semaphores for "trusted" clients and fallbacks
Separate threads for "untrusted" clients
Aggressive timeouts on threads and network calls
to "give up and move on"
Circuit breakers as the "release valve"
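A minimal sketch of how these pieces can fit together, assuming hypothetical names throughout (DependencyCommand, the 200ms timeout, the trip threshold, and the remote call are illustrative, not Netflix's actual implementation): each dependency call runs on its own bounded thread pool with an aggressive timeout, every failure path degrades to a fallback, a full pool sheds load, and a crude failure counter stands in for a real circuit breaker. For "trusted" in-process work, a java.util.concurrent.Semaphore with tryAcquire() could gate concurrency instead of a separate thread.

    import java.util.concurrent.*;
    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative sketch: isolate each dependency call, time it out aggressively,
    // fall back on any failure, and fail fast once the circuit has tripped.
    public class DependencyCommand {
        private static final ExecutorService POOL =
                new ThreadPoolExecutor(10, 10, 1, TimeUnit.MINUTES,
                        new LinkedBlockingQueue<>(5));      // small bounded queue
        private static final AtomicInteger CONSECUTIVE_FAILURES = new AtomicInteger();
        private static final int TRIP_THRESHOLD = 20;       // illustrative value

        public String execute() {
            if (CONSECUTIVE_FAILURES.get() >= TRIP_THRESHOLD) {
                return fallback();                           // circuit open: fail fast
            }
            Future<String> future;
            try {
                future = POOL.submit(this::callDependency);  // separate thread
            } catch (RejectedExecutionException full) {      // pool and queue full: shed load
                CONSECUTIVE_FAILURES.incrementAndGet();
                return fallback();
            }
            try {
                String result = future.get(200, TimeUnit.MILLISECONDS); // give up and move on
                CONSECUTIVE_FAILURES.set(0);                 // success closes the circuit
                return result;
            } catch (TimeoutException | ExecutionException | InterruptedException e) {
                future.cancel(true);
                CONSECUTIVE_FAILURES.incrementAndGet();
                return fallback();                           // fail silent with a default
            }
        }

        private String callDependency() {
            return "live response";                          // placeholder for the network call
        }

        private String fallback() {
            return "default response";                       // static / cached / degraded data
        }
    }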
16. 30 rps x 0.2 seconds = 6 + breathing room = 10 threads
Thread-pool Queue size: 5-10 (0 doesn't work but get close to it)
Thread-pool Size + Queue Size
Queuing is Not Free
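Applying the sizing rule above (30 requests/second x 0.2 seconds per request ≈ 6 concurrent calls, rounded up to 10 for breathing room, plus a small bounded queue), a pool could be configured roughly like this; the numbers come from the slide, the rest is an illustrative sketch:

    import java.util.concurrent.*;

    class DependencyPoolConfig {
        // 30 rps * 0.2 s latency ≈ 6 in flight; 10 threads gives breathing room.
        // Keep the queue small (5-10): a deep queue just hides latency under load.
        static final ExecutorService DEPENDENCY_POOL = new ThreadPoolExecutor(
                10, 10,                              // fixed pool of 10 threads
                1, TimeUnit.MINUTES,                 // keep-alive (irrelevant for a fixed pool)
                new LinkedBlockingQueue<>(10),       // bounded queue
                new ThreadPoolExecutor.AbortPolicy() // reject (shed load) rather than pile up
        );
    }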
17. Cost of Thread @ 75rps
[Chart legend: median - 90th - 99th percentile (time in ms); time for thread to execute vs. time user thread waited]
28. Netflix API
[Diagram: the Netflix API and its backend dependencies A through R]
29. Single Network Request from Clients
(use LAN instead of WAN)
Send Only The Bytes That Matter
(optimize responses for each client)
Leverage Concurrency
(but abstract away its complexity)
30. Single Network Request from Clients
(use LAN instead of WAN)
Device
Server
Netflix API
landing page requires
~dozen API requests
31. Single Network Request from Clients
(use LAN instead of WAN)
some clients are limited in the number of
concurrent network connections
32. Single Network Request from Clients
(use LAN instead of WAN)
network latency makes this even worse
(mobile, home, wifi, geographic distance, etc.)
33. Single Network Request from Clients
(use LAN instead of WAN)
Device
Server
Netflix API
push call pattern to server ...
34. Single Network Request from Clients
(use LAN instead of WAN)
Device
Server
Netflix API
... and eliminate redundant calls
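A rough illustration of pushing the call pattern to the server, with hypothetical type and method names throughout (GranularApi, HomePageEndpoint, etc. are not Netflix's actual code): the device makes one WAN request, the server fans out to the granular calls concurrently over the LAN, and a single composed response comes back. This plain-Futures version is only for clarity; the later slides show how Netflix hides such concurrency behind an asynchronous, functional API instead.

    import java.util.List;
    import java.util.concurrent.*;

    // Hypothetical server-side endpoint: one request from the device,
    // many granular calls fanned out in parallel on the server.
    public class HomePageEndpoint {

        record User(String id, String name) {}
        record Video(int id, String title) {}
        record HomePageResponse(User user, List<Video> recommendations,
                                List<Video> continueWatching) {}

        interface GranularApi {
            User getUser(String userId);
            List<Video> getRecommendations(String userId);
            List<Video> getContinueWatching(String userId);
        }

        private final ExecutorService pool = Executors.newFixedThreadPool(10);
        private final GranularApi api;

        public HomePageEndpoint(GranularApi api) { this.api = api; }

        public HomePageResponse handle(String userId) throws Exception {
            // The ~dozen calls the device used to make over the WAN now run here, in parallel.
            Future<User> user        = pool.submit(() -> api.getUser(userId));
            Future<List<Video>> recs = pool.submit(() -> api.getRecommendations(userId));
            Future<List<Video>> cw   = pool.submit(() -> api.getContinueWatching(userId));
            // One optimized response goes back over the WAN instead of many round trips.
            return new HomePageResponse(user.get(), recs.get(), cw.get());
        }
    }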
35. Send Only The Bytes That Matter
(optimize responses for each client)
Netflix API
Device
Server
Client Client
part of client now on server
36. Send Only The Bytes That Matter
(optimize responses for each client)
Netflix API
Device
Server
Client Client
client retrieves and delivers exactly what their
device needs in its optimal format
37. Send Only The Bytes That Matter
(optimize responses for each client)
Device
Server
Netflix API
Service Layer
Client Client
interface is now a Java API that client
interacts with at a granular level
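A compressed illustration of the same idea (again with hypothetical names, assuming a Java 16+ style): the device-specific client code running on the server calls the granular service-layer API directly and returns only the fields that device class actually renders.

    import java.util.List;

    // Hypothetical device-specific adapter running inside the API server.
    public class TvClientAdapter {

        // Full record exposed by the service layer.
        record Video(int id, String title, String boxartUrlHd, String boxartUrlSd, String synopsis) {}
        // Trimmed shape a 10-foot TV UI needs; everything else never leaves the server.
        record TvRowItem(int id, String title, String boxartUrlHd) {}

        interface ServiceLayer {
            List<Video> topVideosForUser(String userId, int count);
        }

        private final ServiceLayer services;

        public TvClientAdapter(ServiceLayer services) { this.services = services; }

        public List<TvRowItem> renderRow(String userId) {
            return services.topVideosForUser(userId, 10).stream()
                    .map(v -> new TvRowItem(v.id(), v.title(), v.boxartUrlHd()))
                    .toList();
        }
    }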
38. Leverage Concurrency
(but abstract away its complexity)
Device
Server
Netflix API
Service Layer
Client Client
39. Leverage Concurrency
(but abstract away its complexity)
Device
Server
Netflix API
Service Layer
Client Client
no synchronized, volatile, locks, Futures or
Atomic*/Concurrent* classes in client-server code
40. Leverage Concurrency
(but abstract away its complexity)
Service calls are all asynchronous.
Functional programming with higher-order functions.

    def video1Call = api.getVideos(api.getUser(), 123456, 7891234);
    def video2Call = api.getVideos(api.getUser(), 6789543);

    // higher-order functions used to compose asynchronous calls together
    wx.merge(video1Call, video2Call).toList().subscribe([
        onNext: { listOfVideos ->
            for (video in listOfVideos) {
                response.getWriter().println("video: " + video.id + " " + video.title);
            }
        },
        onError: { exception ->
            response.setStatus(500);
            response.getWriter().println("Error: " + exception.getMessage());
        }
    ])

Fully asynchronous API - clients can't block.
41. Device
Server
Netflix API
Optimize for each device. Leverage the server.
42. Netflix is Hiring
http://jobs.netflix.com
Fault Tolerance in a High Volume, Distributed System
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
Making the Netflix API More Resilient
http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html
Why REST Keeps Me Up At Night
http://blog.programmableweb.com/2012/05/15/why-rest-keeps-me-up-at-night/
Ben Christensen
@benjchristensen
http://www.linkedin.com/in/benjchristensen
Editor's Notes
The Netflix API serves all streaming devices and acts as the broker between backend Netflix systems and the user interfaces running on the 800+ devices that support Netflix streaming.

More than 1 billion incoming calls per day are received, which in turn fan out to several billion outgoing calls (a ratio averaging about 1:7) to dozens of underlying subsystems, with peaks of over 200,000 dependency requests per second.
The first half of the presentation discusses the resilience engineering implemented to handle failure and latency at the integration points with the various dependencies.
Even when all dependencies are performing well, the aggregate impact of just 0.01% downtime on each of dozens of services equates to potentially hours of downtime a month if the system is not engineered for resilience.
High-volume, high-availability applications must build fault and latency tolerance into their architecture rather than expect the infrastructure to solve it for them.
Sample of one dependency circuit over 12 hours from a production cluster, at a rate of 75rps on a single server.

Each execution occurs in a separate thread, with median, 90th, and 99th percentile latencies shown in the first 3 legend values.

The calling thread's median, 90th, and 99th percentiles are the last 3 legend values.

Thus, the median cost of the thread is 1.62ms - 1.57ms = 0.05ms; at the 90th percentile it is 4.57ms - 2.05ms = 2.52ms.
The second half of the presentation discusses architectural changes that enable optimizing the API for each Netflix device, as opposed to a generic one-size-fits-all API that treats all devices the same.
Netflix supports over 800 unique devices that fall into several dozen classes, each with unique user experiences, different calling patterns, and different capabilities and needs from the data, and thus from the API.
The one-size-fits-all API results in chatty clients, some requiring a dozen or so requests to render a page.
The client should make a single request and push the 'chatty' part to the server, where low-latency networks and multi-core servers can perform the work far more efficiently.
The client now extends over the network barrier and runs a portion of itself on the server. The client sends requests over HTTP to its other half running in the server, which can then access a Java API at a very granular level to fetch exactly what it needs and return an optimized response suited to the device's exact requirements and user experience.
Concurrency is abstracted away behind an asynchronous API, and data is retrieved, transformed, and composed using higher-order functions (such as map, mapMany, merge, zip, take, and toList). Groovy is used for its closure support, which lends itself well to the functional programming style.
The Netflix API is becoming a platform that empowers user-interface teams to build their own API endpoints, optimized for their client applications and devices.