The document discusses how to handle cloud failures when using Amazon AWS. It recommends:
1) Designing infrastructure to be redundant across multiple availability zones and regions to prevent outages from single failures.
2) Decoupling applications so that they are stateless and can degrade gracefully or auto-recover from failures.
3) Automating infrastructure, monitoring, scaling and recovery processes to minimize downtime from failures.
4) Continuously testing infrastructure resilience through failure simulation tools like Chaos Monkey.
5) Weighing costs and risks of different redundancy and automation strategies for an organization's specific needs.
How to handle cloud failure.
1. How to handle
cloud failure
Grzegorz Kochan | grzegorz@adtaily.com 1
2. How to deal with
failure in the cloud
3. AdTaily
on Amazon AWS
1.5 billion widget pageviews monthly
100 000 ad clicks daily
35 thousand registered publishers
15 thousand advertisers
over 1500 requests per second
over 150 Mbit of data per second
4. Startup in the cloud
why?
Availability
Pricing
API
Scalability
Simplicity
5. But things
can go wrong
TechCrunch
"Amazon EC2 goes down, taking with it Reddit,
Foursquare and Quora" - April 2011
"Down Goes The Internet… Again. Amazon EC2
Outage Takes Down Foursquare, Instagram,
Quora, Reddit, Etc" - August 2011
7. Amazon AWS
Geographical map
US - N. California
Availability Zones: us-west-1a, us-west-1b, us-west-1c
US - Oregon
Availability Zones: us-west-2a, us-west-2b
US - Virginia
Availability Zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d
EU - Ireland
Availability Zones: eu-west-1a, eu-west-1b, eu-west-1c
Asia Pacific - Tokyo
Availability Zones: ap-northeast-1a, ap-northeast-1b
Asia Pacific - Singapore
Availability Zones: ap-southeast-1a, ap-southeast-1b
8. Replicate,
duplicate and balance
Single server setup
Multi server setup
Availability Zone: us-east-1a
9. Distribute
multi A-Zone architecture
Availability Zone: us-east-1a
Availability Zone: us-east-1b
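The multi-zone idea on this slide can be illustrated with a minimal sketch. It is not AdTaily's actual setup: the instance names (`web-1` and so on) and both helper functions are hypothetical, and no AWS API is called. It only shows that spreading servers round-robin across zones leaves part of the fleet running when one zone fails.

```python
from itertools import cycle

def assign_zones(instance_ids, zones):
    """Assign instances to zones round-robin so the fleet is spread evenly."""
    zone_cycle = cycle(zones)
    return {instance_id: next(zone_cycle) for instance_id in instance_ids}

def surviving_instances(placement, failed_zone):
    """Return the instances that are still up when one whole zone goes down."""
    return sorted(i for i, zone in placement.items() if zone != failed_zone)

# Hypothetical fleet of four web servers spread over the two zones on the slide.
placement = assign_zones(
    ["web-1", "web-2", "web-3", "web-4"],
    ["us-east-1a", "us-east-1b"],
)
print(surviving_instances(placement, "us-east-1a"))
```

With a single-zone setup the same failure would remove every server at once, so the balancer in front would have nothing left to route to.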
10. Distribute more
multi Region architecture
US East Region
US West Region
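A multi-region setup also needs a rule for choosing where to send traffic. The sketch below is one possible rule, assuming a priority list of regions and a health map that some external monitor fills in. The function name `pick_region` and the region identifiers are illustrative assumptions, not part of the original slides.

```python
def pick_region(preferred_regions, health):
    """Return the first region in priority order that is reported healthy."""
    for region in preferred_regions:
        if health.get(region, False):  # unknown regions count as unhealthy
            return region
    raise RuntimeError("no healthy region available")

# Hypothetical health report: the primary region is down, the secondary is up.
health = {"us-east-1": False, "us-west-1": True}
print(pick_region(["us-east-1", "us-west-1"], health))
```

In practice this decision usually lives in DNS or a global load balancer rather than in application code, but the logic is the same.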
11. Design for failure
application decoupling
Stateless services
Graceful degradation
Die fast and alone
Auto recover
Backup and scale independently
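Three of the points above (graceful degradation, die fast, auto recover) can be combined in a small circuit breaker. This is a sketch under stated assumptions, not the author's implementation: the class name `FailFast`, its parameters, and the "default ad" fallback are all hypothetical.

```python
import time

class FailFast:
    """Tiny circuit breaker: stop calling a failing dependency, retry later."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive errors before opening
        self.cooldown = cooldown          # seconds to wait before retrying
        self.clock = clock                # injectable clock, handy for tests
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()         # degrade without touching the dependency
            self.opened_at = None         # cooldown over: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # die fast from now on
            return fallback()
        self.failures = 0                 # success: the dependency has recovered
        return result

# Hypothetical usage: serve a default ad while the ad service is unreachable.
breaker = FailFast(max_failures=2, cooldown=10.0)

def fetch_ad():
    raise ConnectionError("ad service unreachable")

print(breaker.call(fetch_ad, lambda: "default ad"))
```

The caller always gets an answer quickly. While the breaker is open the failing service is not called at all, which also gives it room to recover.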