Engineering Velocity: Shifting the Curve at Netflix 
Dianne Marsh (@dmarsh) 
QConNY 2014
en-gi-neer-ing + ve-loc-i-ty 
applying science and technology to designing and building 
speed into a system
Availability vs. Rate of Change
[Figure: Availability (in 9's, 0 to 6) plotted against Rate of Change (axis ticks 0, 10, 100, 1000)]
Shift the Curve
[Figure: Availability (in 9's, 0 to 6) plotted against Rate of Change (axis ticks 0, 10, 100, 1000, 10000)]
Free the People. Optimize the Tools.
Culture 
Freedom and Responsibility 
http://www.slideshare.net/reed2001/culture-1798664
With Freedom comes Responsibility
Managers' Role
Context, not Control 
Loosely coupled, Tightly aligned 
Attract and retain great talent!
Get out of the Way 
Freedom to Innovate
Support Experimentation 
How We Built a Predictive 
Autoscaling Engine 
http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html
Support Independent Paths of Exploration
Build a Blameless Culture
Developers deploy their own code
Rapid Innovation, Detection, Response
Optimize your Tools
Netflix Build Language 
• Based on Gradle
• Internal and Open Source
• Gradle Summit talk:
http://www.slideshare.net/quidryan/gradle-summit-2014-nebula 
https://github.com/nebula-plugins
Jenkins Job DSL 
Configuration as Code 
Groovy Script 
Scripts go in Version Control 
http://www.slideshare.net/quidryan/configuration-as-code
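The Job DSL itself is Groovy; as a language-neutral illustration of configuration as code, the sketch below generates job definitions from a script that would live in version control. It is a minimal sketch under assumptions: `JobSpec`, `jobs_for_service`, `render`, the repository URL, and the build command are hypothetical and are not part of the Jenkins Job DSL API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, minimal description of one CI job."""
    name: str
    repo: str
    branch: str
    command: str


def jobs_for_service(service: str, branches: list[str]) -> list[JobSpec]:
    # One template expands into one job per branch, so every job for a
    # service is configured the same way instead of being hand-edited.
    return [
        JobSpec(
            name=f"{service}-{branch}-build",
            repo=f"git@example.com:{service}.git",  # hypothetical repo URL
            branch=branch,
            command="./gradlew build",               # hypothetical build step
        )
        for branch in branches
    ]


def render(jobs: list[JobSpec]) -> str:
    # Deterministic output (sorted keys, fixed indent) keeps diffs small
    # when the generated configuration is reviewed in version control.
    return json.dumps([asdict(j) for j in jobs], indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(jobs_for_service("api", ["main", "release"])))
```

Adding a branch or changing the build command then becomes a reviewed, versioned edit to one script rather than a manual change in the Jenkins UI.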
Aminator 
Create AMI from Base AMI 
Image contains service and everything needed to run it 
Builds Unit of Deployment for Test and Prod 
Abstracts Cloud Details 
http://techblog.netflix.com/2013/03/ami-creation-with-aminator.html
Asgard 
Deploys Netflix to the Cloud 
Red/Black push 
Developed to address delays in rollback 
http://www.infoq.com/presentations/asgard
Red/Black Push 
• Scale up new instances while running the old version
• Cloud Native
• Turn on traffic to new ASG
• Canary Analysis
• Turn off traffic to old ASG
• Wait … Analyze … Roll Back?
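The red/black sequence can be modeled in a few lines. This is a minimal in-memory sketch, not Asgard's implementation: `ServerGroup`, `Cluster`, `red_black_push`, and `roll_back` are hypothetical names, and real ASGs would be created and resized through a cloud API. It shows why rollback is fast: the old ASG keeps its instances after traffic moves, so rolling back only flips traffic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ServerGroup:
    """Hypothetical in-memory stand-in for an auto scaling group (ASG)."""
    version: str
    instances: int = 0
    in_traffic: bool = False


@dataclass
class Cluster:
    groups: list[ServerGroup] = field(default_factory=list)

    def live_versions(self) -> list[str]:
        return [g.version for g in self.groups if g.in_traffic]


def red_black_push(cluster: Cluster, version: str, size: int,
                   canary_ok: Callable[[ServerGroup], bool]) -> ServerGroup:
    old = [g for g in cluster.groups if g.in_traffic]
    # 1. Scale up the new ASG while the old version keeps serving.
    new = ServerGroup(version, instances=size)
    cluster.groups.append(new)
    # 2. Turn on traffic to the new ASG (both versions now serve).
    new.in_traffic = True
    # 3. Canary analysis decides whether to continue.
    if not canary_ok(new):
        new.in_traffic = False  # the old ASG never stopped serving
        return new
    # 4. Turn off traffic to the old ASGs but leave their instances running.
    for g in old:
        g.in_traffic = False
    return new


def roll_back(bad: ServerGroup, good: ServerGroup) -> None:
    # Rollback is a traffic switch, not a redeploy: the old instances are
    # still up, so re-enable them and disable the new group.
    good.in_traffic = True
    bad.in_traffic = False
```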
Canary Analysis 
• Production Deployment Pattern
• Compare Metrics vs. Baseline Version
• “Canary Analyze All The Things: How we learned to Keep Calm and Release Often”, Roy Rapoport
www.slideshare.net/royrapoport/20140612-q-con-canary-analysis
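A canary comparison can be reduced to scoring how closely the canary's metrics track the baseline's. The sketch below is a deliberately simplified stand-in for the approach in the talk referenced above: the symmetric 10% tolerance, the 0.95 promotion threshold, and the metric names are assumptions, and a production system would more likely apply statistical tests over time series than compare single values.

```python
def canary_score(baseline: dict[str, float], canary: dict[str, float],
                 tolerance: float = 0.10) -> float:
    """Fraction of shared metrics whose canary value is within a relative
    tolerance of the baseline value."""
    shared = sorted(baseline.keys() & canary.keys())
    if not shared:
        raise ValueError("baseline and canary share no metrics")
    passed = 0
    for name in shared:
        base, value = baseline[name], canary[name]
        if base == 0:
            ok = value == 0
        else:
            ok = abs(value - base) / abs(base) <= tolerance
        passed += int(ok)
    return passed / len(shared)


def canary_verdict(score: float, threshold: float = 0.95) -> str:
    # Promote only if (nearly) every metric looks like the baseline.
    return "promote" if score >= threshold else "roll back"


if __name__ == "__main__":
    baseline = {"errors_per_s": 2.0, "p99_ms": 180.0, "cpu_pct": 40.0}
    canary = {"errors_per_s": 2.1, "p99_ms": 250.0, "cpu_pct": 41.0}
    score = canary_score(baseline, canary)
    print(round(score, 2), canary_verdict(score))  # 0.67 roll back
```

Because the check is symmetric, a canary that is much faster than baseline also fails that metric; treating any unexpected change as suspicious is a design choice that may or may not suit a given service.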
Continuous Delivery Workflow 
Support the Journey 
Judges between Stages 
Represent Best Practices 
http://techblog.netflix.com/2013/09/glisten-groovy-way-to-use-amazons.html
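One way to read "judges between stages" is a pipeline where each stage is followed by a check that must pass before work moves on. The sketch below assumes that reading; `run_pipeline`, the stage names, and the judge conditions are hypothetical and are not the API of the Glisten workflow library linked above.

```python
from typing import Callable

# A stage transforms a shared context; a judge decides whether to continue.
Stage = Callable[[dict], dict]
Judge = Callable[[dict], bool]


def run_pipeline(steps: list[tuple[str, Stage, Judge]],
                 ctx: dict) -> tuple[list[str], dict]:
    """Run stages in order; after each one, its judge must approve before
    the next stage starts. Returns the stages that ran and the final context."""
    ran: list[str] = []
    for name, stage, judge in steps:
        ctx = stage(ctx)
        ran.append(name)
        if not judge(ctx):
            break  # judge rejected: later stages never run
    return ran, ctx


if __name__ == "__main__":
    steps = [
        ("build", lambda c: {**c, "artifact": "app-1.0"}, lambda c: "artifact" in c),
        ("test", lambda c: {**c, "failures": 0}, lambda c: c["failures"] == 0),
        ("canary", lambda c: {**c, "score": 0.5}, lambda c: c["score"] >= 0.95),
        ("prod", lambda c: {**c, "deployed": True}, lambda c: True),
    ]
    print(run_pipeline(steps, {})[0])  # ['build', 'test', 'canary']
```

Encoding best practices as judges means a team can adopt the workflow gradually: start with a permissive judge and tighten it as the service matures.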
One Click Deployment?
Regional Isolation 
Limit Impact of Human Error 
• Stagger Deployments?
• Canary Testing per Region?
Know your Service!
Multi-Region Consistency 
Build Tooling to: 
• Schedule Deployments
• Prefer Off-Peak
• Choose Next Available Region
• Provide Visibility by Region
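The scheduling rules on these two slides (one region at a time, off-peak first, next available region, per-region status) can be sketched as below. The region list and peak windows in the example are made-up values for illustration, and `Region`, `next_region`, and `status_by_region` are hypothetical names; real tooling would presumably also wait for each region's canary before moving on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    name: str
    peak_start_utc: int  # hour 0-23, inclusive
    peak_end_utc: int    # hour 0-23, exclusive; may wrap past midnight

    def is_peak(self, hour_utc: int) -> bool:
        if self.peak_start_utc <= self.peak_end_utc:
            return self.peak_start_utc <= hour_utc < self.peak_end_utc
        # Window wraps midnight, e.g. 23 -> 4.
        return hour_utc >= self.peak_start_utc or hour_utc < self.peak_end_utc


def next_region(regions: list[Region], done: set[str],
                hour_utc: int) -> Optional[Region]:
    """Next not-yet-deployed region that is off-peak now, or None if every
    remaining region is at peak (wait and retry later). Returning a single
    region keeps deployments staggered."""
    for region in regions:
        if region.name not in done and not region.is_peak(hour_utc):
            return region
    return None


def status_by_region(regions: list[Region], done: set[str]) -> dict[str, str]:
    # Per-region visibility: which regions already run the new version.
    return {r.name: "deployed" if r.name in done else "pending" for r in regions}


if __name__ == "__main__":
    # Hypothetical peak windows, for illustration only.
    regions = [Region("us-east-1", 23, 4), Region("us-west-2", 2, 7),
               Region("eu-west-1", 18, 23)]
    print(next_region(regions, set(), hour_utc=2).name)  # eu-west-1
```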
http://www.infoq.com/presentations/netflix-resiliency-failure-cloud
Chaos Monkey 
Kills Running Instances 
• Simulates failures inherent to running in the cloud
• In Production
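The core of the idea fits in one function: for each group, with some probability, terminate one of its instances. This is a minimal sketch, not Chaos Monkey's code; the `terminate` callback stands in for a cloud API call, and the probability and the seeded random generator are assumptions that make the behavior testable.

```python
import random
from typing import Callable


def chaos_monkey(groups: dict[str, list[str]], probability: float,
                 terminate: Callable[[str], None],
                 rng: random.Random) -> list[str]:
    """For each group, with the given probability, terminate one randomly
    chosen instance. Returns the terminated instance IDs."""
    killed: list[str] = []
    for name in sorted(groups):  # sorted so a seeded run is reproducible
        instances = groups[name]
        if instances and rng.random() < probability:
            victim = rng.choice(instances)
            terminate(victim)  # hypothetical hook for the real cloud call
            killed.append(victim)
    return killed
```

A service that survives this routinely in production has shown, not assumed, that losing an instance is a non-event.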
Latency Monkey 
Introduces Latency between 
services
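Latency injection can be sketched as a wrapper around a client call that sometimes delays it. This is an illustration under assumptions, not how Latency Monkey itself injects delay; `inject_latency` and `get_ratings` are hypothetical, and the delay and probability are arbitrary example values.

```python
import functools
import random
import time
from typing import Callable, Optional


def inject_latency(delay_s: float, probability: float = 1.0,
                   rng: Optional[random.Random] = None) -> Callable:
    """Decorator that, with the given probability, sleeps delay_s seconds
    before forwarding a call, simulating a slow downstream service."""
    rng = rng or random.Random()

    def decorator(call: Callable) -> Callable:
        @functools.wraps(call)
        def wrapper(*args, **kwargs):
            if rng.random() < probability:
                time.sleep(delay_s)
            return call(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(0.2, probability=0.5)
def get_ratings(title_id: str) -> dict:
    # Hypothetical client call to another service.
    return {"title": title_id, "stars": 4}
```

Callers of `get_ratings` now see roughly half their requests take an extra 200 ms, which exposes missing timeouts and fallbacks without taking the dependency down.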
Conformity Monkey 
Have Deployments Diverged? 
• Balance Regional Consistency with Regional Isolation
• Build Best Practices into Tooling and Reporting
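A conformity check has two parts: noticing that regions have diverged, and checking each cluster against encoded best practices. In the sketch below, the data shapes and the two example rules are assumptions chosen for illustration, not Conformity Monkey's actual rule set. It reports rather than fixes, since divergence can be intentional while a staggered, region-by-region deployment is in progress.

```python
from typing import Callable, Optional

# A rule inspects a cluster description and returns a problem message or None.
Rule = Callable[[dict], Optional[str]]

# Hypothetical example rules.
RULES: list[Rule] = [
    lambda c: None if c.get("health_check_url") else "no health check URL",
    lambda c: (None if len(c.get("zones", [])) >= 2
               else "instances in a single availability zone"),
]


def diverged_apps(deployments: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """deployments maps app -> {region: version}; return the apps whose
    regions are not all running the same version."""
    return {app: by_region for app, by_region in sorted(deployments.items())
            if len(set(by_region.values())) > 1}


def check_rules(cluster: dict, rules: list[Rule]) -> list[str]:
    # Report every violated best practice rather than stopping at the first.
    return [msg for rule in rules if (msg := rule(cluster)) is not None]
```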
Janitor Monkey 
Reduce Cognitive Load and Cost 
• Remove unused instances
• Uniform way to clean up
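A uniform cleanup rule can be as simple as "not attached to any service group and older than some age." The sketch assumes that rule and a three-day threshold; `Instance` and `cleanup_candidates` are hypothetical, and a real janitor would likely notify owners before deleting anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Instance:
    instance_id: str
    launched: datetime
    in_service_group: bool  # attached to an ASG / serving traffic?


def cleanup_candidates(instances: list[Instance], now: datetime,
                       min_age: timedelta = timedelta(days=3)) -> list[str]:
    """IDs of instances outside any service group and older than min_age.
    The same rule is applied to every team's resources."""
    return sorted(
        i.instance_id
        for i in instances
        if not i.in_service_group and now - i.launched >= min_age
    )
```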
Shifting the Curve with Tools at Netflix 
• Value Self-Service
• Test Everywhere
• Awareness of Multiple Regions
• Best Practices Represented in Tooling
• Recover Quickly and Easily
• Be Cloud Native
• Respect the Journey
Shifting the Curve with Culture at Netflix
• Free the People!
• Context not Control
• Freedom to Experiment
• Blameless Culture
“As the number of applications and the scale of the campaign's AWS infrastructure use climbed, the DevOps team shifted to using Asgard, an open-source tool developed by Netflix to manage cloud deployments.”
Ars Technica, November 2012
Thanks! 
Dianne Marsh (@dmarsh) 
dmarsh@netflix.com
