How to handle incidents, downtime & outages
Devopsdays, Amsterdam 2015
David Mytton, Founder, Server Density
Handling incidents
Cost of uptime?
 $2.9bn (Q1 2015)
 $870m (Q1 2015)
 $4.1bn (Q1 2015)
How much are you spending?
Expect downtime
 Prepare
 Respond
 Postmortem
Prepare
 On call
  Primary/secondary
  Reachability
 Off call
 Docs
  Searchable
  Independent
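
The primary/secondary on-call arrangement above can be sketched as a simple paging order. This is a minimal Python illustration, not something shown in the talk: the `Responder` type, the `page_in_order` function, and the names in the rota are all hypothetical, and the `acknowledges` callback stands in for whatever paging tool is actually in use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Responder:
    name: str  # hypothetical team member
    role: str  # "primary" or "secondary"


def page_in_order(
    rota: Sequence[Responder],
    acknowledges: Callable[[Responder], bool],
) -> Optional[Responder]:
    """Page each responder in rota order and return the first one who
    acknowledges, or None if nobody could be reached."""
    for responder in rota:
        if acknowledges(responder):
            return responder
    return None


# Assumed rota: the primary is paged first, the secondary is the fallback.
rota = [Responder("alice", "primary"), Responder("bob", "secondary")]

# Simulate an unreachable primary: only the secondary acknowledges the page.
reached = page_in_order(rota, lambda r: r.role == "secondary")
```

A `None` result means nobody on the rota was reachable, which is the case the "Reachability" point is meant to prevent.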
Prepare
 Key info
  Team contacts
  Vendor contacts
  Key credentials
 Unexpected situations
  Communication
  Internet access
  Support access
Respond
 First responder
1. Load incident response checklist
2. Log into Ops War Room
3. Log incident in JIRA
4. Begin investigation
 Key response principles
  Log everything
  Frequent public updates
  Gather the team
  Escalate!
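
The first-responder checklist and the "log everything" principle can be combined in a small sketch. This is an illustrative assumption rather than the tooling used in the talk: the `IncidentLog` class and its `record` and `pending` methods are hypothetical names, and a real setup would write to the war room or ticket system instead of an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# The four first-responder steps, in the order given on the slide.
CHECKLIST = (
    "Load incident response checklist",
    "Log into Ops War Room",
    "Log incident in JIRA",
    "Begin investigation",
)


@dataclass
class IncidentLog:
    """Hypothetical in-memory log: every action gets a UTC timestamp."""

    entries: List[Tuple[datetime, str]] = field(default_factory=list)

    def record(self, message: str, now: Optional[datetime] = None) -> None:
        # Use the supplied time if given, otherwise the current UTC time.
        stamp = now or datetime.now(timezone.utc)
        self.entries.append((stamp, message))

    def pending(self) -> List[str]:
        # Checklist steps that have not been logged yet, in checklist order.
        done = {message for _, message in self.entries}
        return [step for step in CHECKLIST if step not in done]


log = IncidentLog()
log.record(CHECKLIST[0])
log.record(CHECKLIST[1])
```

Because each entry is timestamped as it is recorded, the same log can later provide the timeline for the postmortem.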
Postmortem
 Within a few days
 Tell the story
 Appropriate technical detail
 What failed, why?
 How it's going to be fixed
david@serverdensity.com
@davidmytton
