This document provides tips for communicating campus status updates during a system outage or incident. It recommends establishing a communication schedule and sticking to it, keeping messages brief but informative, stating specifically what is impacted and when, and using simple language. The tips suggest automatically subscribing key groups to the status system, engaging customers on social media, and searching for mentions of impacted services so you can respond to them. It also outlines roles, a timeline, and the flow of information during an outage, including establishing an Incident Commander and an Incident Communication Liaison and coordinating the response and follow-up.
Status updates
1. Keeping the campus community informed
Status Updates
Shawn Plummer
Laurie Fox
2. State University of New York
Geneseo
Located in the historic village of Geneseo in
the upstate Finger Lakes region, the State
University of New York at Geneseo is a
premier public liberal arts college with a rich
tradition of academic excellence. We are
dedicated to developing socially responsible
citizens with skills and values for a productive
life.
15. Communication Timeline
• Establish when you will communicate again
• Stick to the schedule, even if you have nothing to report
• If you don't, your customers will wonder.
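The scheduling rule can be sketched as a small helper that, given when the last update went out and a fixed cadence, reports when the next update is due and whether it is due now, regardless of whether there is news. The 30-minute interval and all function names are hypothetical choices for illustration, not something the talk prescribes.

```python
from datetime import datetime, timedelta

# Hypothetical cadence: use whatever interval you promise in each update.
UPDATE_INTERVAL = timedelta(minutes=30)


def next_update_due(last_posted: datetime, interval: timedelta = UPDATE_INTERVAL) -> datetime:
    """Return the time the next status update was promised for."""
    return last_posted + interval


def update_is_due(last_posted: datetime, now: datetime,
                  interval: timedelta = UPDATE_INTERVAL) -> bool:
    """True once the promised time has arrived, whether or not there is news."""
    return now >= next_update_due(last_posted, interval)


def closing_line(last_posted: datetime, interval: timedelta = UPDATE_INTERVAL) -> str:
    """Sentence to end each post with, committing to the next update time."""
    return f"Next update by {next_update_due(last_posted, interval):%H:%M}."


if __name__ == "__main__":
    last = datetime(2014, 5, 9, 14, 0)
    print(closing_line(last))                                 # Next update by 14:30.
    print(update_is_due(last, datetime(2014, 5, 9, 14, 29)))  # False
    print(update_is_due(last, datetime(2014, 5, 9, 14, 30)))  # True
```

Ending every post with the promised time is what makes "stick to the schedule" checkable by both staff and customers.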
16. Rules for posting a status message
• Be specific about what is impacted
• Be specific about when things are impacted
• Put it in simple language
• Put the most important information first
• Respect your users and front-line staff
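One way to hold every post to these rules is a fixed template that always leads with the impacted service and the user-visible impact, then gives the time window, then the next update time. The field names and wording below are hypothetical; this is a sketch, not the format of any particular status system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StatusPost:
    service: str               # what is impacted, named the way users know it
    impact: str                # what users will notice, in plain language
    start: datetime            # when the impact began or will begin
    end: Optional[datetime]    # expected end (assumed same day), or None if unknown
    next_update: datetime      # when the next post will go out

    def render(self) -> str:
        # Most important information first: what is affected and how.
        parts = [f"{self.service}: {self.impact}."]
        # Then be specific about when.
        if self.end is None:
            parts.append(f"Started {self.start:%Y-%m-%d %H:%M}; end time not yet known.")
        else:
            parts.append(f"From {self.start:%Y-%m-%d %H:%M} to {self.end:%H:%M}.")
        # Finally, commit to the next communication time.
        parts.append(f"Next update by {self.next_update:%H:%M}.")
        return " ".join(parts)
```

For example, `StatusPost("Email", "users cannot send or receive mail", datetime(2014, 5, 9, 14, 0), None, datetime(2014, 5, 9, 14, 30)).render()` produces a single plain-language sentence group that starts with "Email:".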
18. Tips to better communicate
• Automatically subscribe select users to the status system, including all technical staff and student employees of the department.
• Encourage all department chairs and secretaries to subscribe to the status system to receive updates during outages.
• During the emergency, search for mentions of the impacted services so you can respond to stray mentions on social media.
• Engage customers via Twitter and email when there are problems, and link to status posts. This increases awareness of your status system and Twitter feed.
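The first and third tips lend themselves to a little automation. A minimal sketch, assuming the department roster can be exported as (email, role) pairs and that social media posts have already been fetched as plain strings (fetching them is out of scope here): one function picks the auto-subscribed roles, the other flags posts that name an impacted service. The role labels and function names are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical roster labels for people who are subscribed automatically.
AUTO_SUBSCRIBE_ROLES = {"technical staff", "student employee"}


def auto_subscribers(roster: Iterable[tuple[str, str]]) -> set[str]:
    """Return addresses from (email, role) pairs whose role is auto-subscribed."""
    return {email for email, role in roster if role.strip().lower() in AUTO_SUBSCRIBE_ROLES}


def find_mentions(posts: Iterable[str], impacted_services: Iterable[str]) -> list[str]:
    """Return posts naming any impacted service as a whole word, ignoring case."""
    services = [s for s in impacted_services if s]
    if not services:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in services) + r")\b",
        re.IGNORECASE,
    )
    return [post for post in posts if pattern.search(post)]
```

Whole-word matching keeps "emailing you later" from being flagged during an email outage, while "EMAIL is slow again" still is.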
20. This is not a drill
What if your electronic communication systems are not working?
24. Roles
• People fixing the problem
• Incident Commander
• Incident Communication Liaison
http://www.fema.gov/national-incident-management-system
https://blog.heroku.com/archives/2014/5/9/incident-response-at-heroku
http://en.wikipedia.org/wiki/Incident_Command_System
25. Timeline
• Move to a shared chat room
• Establish the IC/ICL
• Post an Initial Status about the issue
• Determine Scope, Impact, & Duration if Possible
• Coordinate the Response
• Mitigate the Problem
• Manage On-Going Responses
• Post-incident Cleanup
• Post-incident Follow-up
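The timeline can double as a checklist in the shared chat room. This sketch assumes the phases are tracked in slide order, although in a real incident coordinating, mitigating, and managing responses overlap; `Phase`, `next_phase`, and `checklist` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Incident phases, in the order the Timeline slide lists them."""
    SHARED_CHAT = "Move to a shared chat room"
    ROLES = "Establish the IC/ICL"
    INITIAL_STATUS = "Post an Initial Status about the issue"
    SCOPE = "Determine Scope, Impact, & Duration if Possible"
    COORDINATE = "Coordinate the Response"
    MITIGATE = "Mitigate the Problem"
    ONGOING = "Manage On-Going Responses"
    CLEANUP = "Post-incident Cleanup"
    FOLLOW_UP = "Post-incident Follow-up"


def next_phase(done: set[Phase]) -> Optional[Phase]:
    """First phase, in slide order, not yet marked done; None when all are done."""
    for phase in Phase:  # Enum iteration follows definition order
        if phase not in done:
            return phase
    return None


def checklist(done: set[Phase]) -> str:
    """Render the timeline as a text checklist for the shared chat room."""
    return "\n".join(f"[{'x' if p in done else ' '}] {p.value}" for p in Phase)
```

Because `next_phase` returns the earliest unfinished step, a skipped step (for example, no initial status posted yet) surfaces even after later work has started.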
26. Information Flow
• Get the burden of communication off the people fixing it
• Open a ticket in the ticket system with all parties subscribed
• Importance of an internal communication channel (HipChat/Slack)
• Ideally, your communication medium can also serve as your documentation medium
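The information-flow points can be sketched as a ticket that every party subscribes to: whoever has news writes it once, the ticket fans it out to the other watchers, and the same log later serves as documentation. This is a hypothetical in-memory model; in practice the ticket system or a HipChat/Slack integration would supply the `notify` callback.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class IncidentTicket:
    title: str
    watchers: set[str] = field(default_factory=set)
    log: list[tuple[datetime, str, str]] = field(default_factory=list)

    def subscribe(self, who: str) -> None:
        """Add a party (fixer, IC/ICL, help desk, ...) to the ticket."""
        self.watchers.add(who)

    def post(self, author: str, text: str, when: datetime,
             notify: Callable[[str, str], None]) -> None:
        """Record one update and fan it out to every other watcher."""
        self.log.append((when, author, text))  # the log doubles as documentation
        message = f"[{self.title}] {author}: {text}"
        for watcher in sorted(self.watchers):  # sorted for a predictable order
            if watcher != author:
                notify(watcher, message)

    def timeline(self) -> str:
        """Chronological record, e.g. for the post-incident follow-up."""
        return "\n".join(f"{when:%H:%M} {who}: {text}" for when, who, text in sorted(self.log))
```

The person fixing the problem posts once and never has to decide who needs to hear about it.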
Talk about how we still have all those detailed service checks, but they are of limited use to most customers.
Custom Field Template
Get Custom Field Values
HipChat
iFrame
Subscribe2
WP to Twitter
A communication timeline is extremely important so that your users know the problem is still being worked on. No one should have to wonder.
Have a plan for when your electronic communication systems are not working. Options include phone trees, external email addresses, and sneakernet.
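A phone tree only works if it has been checked before it is needed. A minimal sketch, assuming the tree is kept as a mapping from each person to the people they call (all names here are made up): it lists the calls in breadth-first order and flags anyone on the contact list whom the tree never reaches.

```python
from collections import deque
from typing import Iterable


def call_order(tree: dict[str, list[str]], root: str) -> list[tuple[str, str]]:
    """(caller, callee) pairs in breadth-first order from the person who starts the tree."""
    pairs: list[tuple[str, str]] = []
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in seen:  # nobody gets called twice
                seen.add(callee)
                pairs.append((caller, callee))
                queue.append(callee)
    return pairs


def unreachable(tree: dict[str, list[str]], root: str, everyone: Iterable[str]) -> set[str]:
    """People on the contact list whom the phone tree never reaches."""
    reached = {root} | {callee for _, callee in call_order(tree, root)}
    return set(everyone) - reached
```

Running `unreachable` whenever staff change is a cheap way to find gaps before an outage does.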
This structure probably does not fully apply to a short outage that requires minimal fixing, but it can be useful for any outage.
The IC and ICL can be the same person or separate people.
Their job is to:
Track what is being done and stay on top of it. Note things that may need to be undone or revisited in the post-mortem. Handle getting more resources for a problem.
Communicate the scope of the outage, and when the next communication will occur, in simple, plain language. It can also help to communicate specific steps that are being taken so customers see that progress is being made.
Monitor the next communication time and be ready to post an update.
Answer questions about the outage from the community.
Get feedback from the community about new developments and share it with the people working on the outage.
Brief newcomers.
By default, the IC is also the ICL and is the person who first starts working on the outage. For small or short-lived outages, this may not change. For outages that are not a quick fix, designate an IC. If you want someone to handle communication on behalf of the IC, the IC can appoint an ICL. The key is that, for long outages, the IC/ICL is not the person fixing the problem.
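The default-role rules can be written down as a small function so they are applied the same way every time. This is a sketch of the rules as stated in these notes, with hypothetical names; whether an outage counts as a "quick fix" is left as a judgment passed in by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Roles:
    commander: str     # Incident Commander (IC)
    liaison: str       # Incident Communication Liaison (ICL)
    fixers: list[str]  # people fixing the problem


def assign_roles(first_responder: str, quick_fix: bool,
                 commander: Optional[str] = None,
                 liaison: Optional[str] = None) -> Roles:
    """Encode the default IC/ICL rules from the speaker notes (names are hypothetical)."""
    if quick_fix:
        # Small or short-lived outage: one person can be IC, ICL and fixer.
        return Roles(first_responder, first_responder, [first_responder])
    if commander is None or commander == first_responder:
        raise ValueError("designate an IC who is not fixing the problem")
    liaison = liaison or commander  # the IC communicates unless an ICL is appointed
    if liaison == first_responder:
        raise ValueError("the ICL should not be the person fixing the problem")
    return Roles(commander, liaison, [first_responder])
```

Raising an error, rather than silently defaulting, forces someone to name an IC before a long outage goes any further.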
Coordinate response. In coordinating the response, the IC focuses on bringing in the right people to solve the problem and making sure that they have the information they need. The IC can use a HipChat bot to page in additional teams as needed (the page will route to the on-call person for that team), or page individuals directly.
The IC may also create a shared Google Doc for the team to collect notes together in real time, or start a high-bandwidth video call for more quickly working through issues than is possible with text chat.
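The notes mention a HipChat bot that routes a team page to that team's on-call person. The bot itself is not shown, so the following is only a sketch of the routing idea, assuming a simple round-robin rotation with a fixed shift length; the team names, people, and function names are hypothetical and are not HipChat's or Heroku's API.

```python
from datetime import date


def on_call(rotation: list[str], start: date, today: date, days_per_shift: int = 7) -> str:
    """Who holds the pager today, for a round-robin rotation that began on `start`."""
    if today < start:
        raise ValueError("rotation has not started yet")
    shift = (today - start).days // days_per_shift
    return rotation[shift % len(rotation)]


def route_page(target: str, rotations: dict[str, tuple[list[str], date]], today: date) -> str:
    """Send a team page to that team's on-call person; page individuals directly."""
    if target in rotations:
        people, start = rotations[target]
        return on_call(people, start, today)
    return target
```

With `rotations = {"database": (["ana", "ben", "cy"], date(2014, 5, 5))}`, a page to `"database"` on 2014-05-12 goes to `"ben"`, while a page to `"dee"` goes straight to that person.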
Mitigate problem. Once the response team has some sense of the problem, it will try to mitigate customer-facing effects if possible. For example, we may put the Platform API in maintenance mode to reduce load on infrastructure systems, or boot additional instances in our fleet to temporarily compensate for capacity issues. A successful mitigation will reduce the impact of the incident on customer apps and actions, or at least prevent the customer-facing issues from getting worse.
The method for the team to communicate can be as detailed and