Baking-In Transparency

Saturday, October 8, 11
About Me
 Matt Simmons
 11+ year System Administrator
http://www.standalone-sysadmin.com
@standaloneSA
standalone.sysadmin@gmail.com

The Situation

Devs make things
 Small discrete programs
 Large complex programs
 Immense interconnected software suites

Ops makes things go
 Script using small discrete programs
 Administer large complex programs
 Cluster immense interconnected software suites

There is a direct relationship between the software that developers write and the software that gets implemented by operations.

The Problems

Software needs to be monitored
"When performance is measured, performance
improves. When performance is measured and
reported back, the rate of improvement accelerates."
--Pearson's Law

Why?
"You can't manage what you can't measure"
--Robert Kaplan

Software needs to be managed
"Management by objective works - if you
know the objective. 90% of the time, you don't."
--Peter Drucker

Clearly we need to measure...
But what do we measure?
And what metrics do we use?
How do we obtain the measurements?

What do we measure?

Software Engineers measure...
 Programmer Productivity
 code size/efficiency
 Defect Density
 Bugs / module size
 Requirement Stability
 feature creep

What do we measure?

Operations measures...
 Resource Utilization
 Disk space, Bandwidth, etc.
 Infrastructure Stability
 Service Uptime, MTBF, etc.
 Performance
 CPU / Memory efficiency, etc.

What metrics do we use?

It depends.
Duh.

The metrics that Ops needs to
monitor are not always easy to obtain...

...even though they're
really important

 Reliability
 Repeatability
 Root Cause Identification

...so not only is monitoring important...

Monitoring is hard.

Monitoring correctly is hard.

Why is monitoring hard?
 Monitoring Software Suites are complex
 Infrastructures are complex
 Processes and applications are opaque to
our futile requests to determine and track
internal state

Processes and applications
are opaque to our futile
requests to determine and
track internal state

The Solution(s)

Dev/Ops working together gives

 Team Interrelationships
 Knowledge Sharing
 Cross Training
 Tool Sharing

But more specifically...
Methods of monitoring software can be
BUILT INTO THE SOFTWARE

How things are designed now
Question: A well-designed program encounters
an error. What happens?
Answer: It handles the error, and continues
processing requests

How things are designed now
Question: A poorly-designed program
encounters an error. What happens?
Answer: It crashes and burns

Question:
Which of those is easier to monitor?

Obviously, dying to alert the
monitoring system is overkill.
(pun firmly intended)
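One way to get the visibility of a crash without the crash is to have the program count the errors it handles and keep them somewhere a monitor can read. The following is a minimal Python sketch of that idea (the deck's own snippets are Perl). The class name HealthRecord and its fields are hypothetical, chosen only for illustration.

```python
import json
import time


class HealthRecord:
    """Counts handled errors so a monitor can see them (hypothetical helper)."""

    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.last_error = None
        self.last_error_time = None

    def run(self, handler, request):
        """Call handler(request); on failure, record the error and keep going."""
        self.requests += 1
        try:
            return handler(request)
        except Exception as exc:  # handled, but no longer invisible
            self.errors += 1
            self.last_error = repr(exc)
            self.last_error_time = time.time()
            return None

    def snapshot(self):
        """Return the counters as a JSON string for whatever channel exposes them."""
        return json.dumps({
            "requests": self.requests,
            "errors": self.errors,
            "last_error": self.last_error,
            "last_error_time": self.last_error_time,
        })
```

The program still handles the error and continues processing requests, but the error rate is now a number that can be handed to the monitoring system instead of being swallowed.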

How do we make our statuses available
to the monitoring system, then?

It depends on the kind of software

Remember these?

 Small discrete programs
 Large complex programs
 Immense interconnected software suites

Small Discrete Programs

 Possibly a utility
 Usually scripted or run manually
 Typically short-term run time

Small Discrete Programs:
Monitoring

 Screen output
 Return codes
 Catch signals
 Great example: ping & SIGQUIT
 SIGUSR1 & SIGUSR2

Signal Handling in Perl

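sub USR1_handler {
    # drop_state_file() is a placeholder: write current state where a monitor can read it
    drop_state_file();
}
$SIG{USR1} = \&USR1_handler;   # install the handler as a code reference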
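For comparison, here is a rough Python sketch of the same pattern with the state-dumping part filled in. The STATE dictionary, the STATE_FILE location, and the function names are assumptions made for illustration; the slide only names drop_state_file().

```python
import json
import os
import signal
import tempfile

# Hypothetical in-memory state that a long-running script might track.
STATE = {"items_processed": 0, "last_item": None}

# Assumed location for the state file; use whatever your monitoring reads.
STATE_FILE = os.path.join(tempfile.gettempdir(), "myscript.state.json")


def drop_state_file(signum=None, frame=None):
    """Write STATE to STATE_FILE atomically (temp file, then rename)."""
    tmp_path = STATE_FILE + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(STATE, fh)
    os.replace(tmp_path, STATE_FILE)


def install_state_dump_handler():
    """Same idea as $SIG{USR1} in Perl: run the handler when SIGUSR1 arrives."""
    signal.signal(signal.SIGUSR1, drop_state_file)
```

After install_state_dump_handler() has been called in the main thread, running kill -USR1 against the process ID should leave a fresh JSON snapshot in STATE_FILE without interrupting the work in progress.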

Large Complex Programs

 Probably a daemon or interactive program
 Long running, needs to be stable
 Subject to resource change over time
 May need to retain state across restarts
 May have a web component

Large Complex Programs:
Reporting

 No screen output (except debugging)
 Logging
 SNMP Agent/Traps
 (seriously, read man snmpd.conf)
 Named Pipes (FIFO)
 State Output to DB (if appropriate)

Net-SNMP Embedded Perl
# snmpd.conf: each "perl" directive must stay on a single line
perl use Data::Dumper;
perl sub myroutine { print "got called: ",Dumper(@_),"\n"; }
perl $agent->register('mylink', '.1.3.6.1.8765', \&myroutine);
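SNMP is one of the reporting channels listed for large programs; a named pipe is another. The sketch below, in Python, shows one way a daemon might offer its status on a FIFO without ever blocking on it. The function names and the JSON-line format are assumptions for illustration, not something the deck specifies.

```python
import errno
import json
import os


def ensure_fifo(path):
    """Create the named pipe if it does not exist yet."""
    if not os.path.exists(path):
        os.mkfifo(path, 0o644)


def offer_status(path, status):
    """Write one JSON line to the FIFO if a reader is attached.

    Returns True if the status was written, False if nobody was listening.
    Opening with O_NONBLOCK keeps the daemon from stalling on the pipe.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:  # no process has the FIFO open for reading
            return False
        raise
    try:
        os.write(fd, (json.dumps(status) + "\n").encode("utf-8"))
    finally:
        os.close(fd)
    return True
```

A daemon could call offer_status() from its main loop; a monitoring check that opens the pipe for reading then receives the current status as a single line of JSON.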

Immense Interconnected
Software Suites
(or Large Suites)

Large Suites

 Definitely retain state across restarts
 Probably requires centralized controller
 May use sockets to communicate
 Probably has a web component

Large Suites:
Reporting
Everything under Large Programs, plus...

 Monitoring coordinated by the central
node or program

 Aggregation of state
 Provide layer of abstraction from any in-suite monitoring or reporting

 Provide XML/CSV in addition to human-parsable HTML pages
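As a rough illustration of the aggregation and XML/CSV points, the Python sketch below has a central node collapse per-component states into one suite-level state and render both machine-readable formats. The state labels ("ok", "warning", "critical"), the element names, and the function names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical severity ordering: a higher number means a worse state.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def aggregate(component_states):
    """Return the worst state among the components ("ok" if there are none)."""
    return max(component_states.values(), key=SEVERITY.__getitem__, default="ok")


def to_csv(component_states):
    """Render one row per component, sorted by name, with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["component", "state"])
    for name in sorted(component_states):
        writer.writerow([name, component_states[name]])
    return buf.getvalue()


def to_xml(component_states):
    """Render a <suite> element carrying the aggregate state and its components."""
    root = ET.Element("suite", state=aggregate(component_states))
    for name in sorted(component_states):
        ET.SubElement(root, "component", name=name, state=component_states[name])
    return ET.tostring(root, encoding="unicode")
```

The monitoring system then only needs to talk to the central node and parse one document, rather than knowing how each component inside the suite reports its own state.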

What we're really doing is IPC
So what other methods exist? Lots.

Unix IPC
 Sockets
 RPC
 Message Queues
 FIFO
 Shared Memory
 And Many More...

They shouldn't all be used...

What is important is that you use SOMETHING
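As one example of "something", the Python sketch below exposes a program's status over a Unix domain socket, one of the IPC methods listed earlier. The socket path, the JSON payload, and the function names are assumptions for illustration only.

```python
import json
import os
import socket


def open_status_socket(path):
    """Bind and listen on a Unix domain socket at path (an assumed location)."""
    if os.path.exists(path):
        os.unlink(path)  # clear a stale socket left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server


def answer_one(server, status):
    """Accept one connection, send the status as a JSON line, and hang up."""
    conn, _ = server.accept()
    with conn:
        conn.sendall((json.dumps(status) + "\n").encode("utf-8"))


def query_status(path):
    """What a monitoring check would do: connect, read until EOF, parse."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        chunks = []
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

In a real daemon answer_one() would probably run in a loop on its own thread; the point is only that the monitor asks a question and the application answers it directly.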

What is best?
To crush your enemies, see them
driven before you, and to hear the
lamentation of their women?

What is best?
 An application that is easily and openly
monitored

 A developer who considers monitoring in
all phases of design and development

 A developer who writes their own
monitoring checks
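A developer-written check can be very small. The Python sketch below assumes the 0/1/2/3 exit-code convention used by Nagios-style check plugins; the deck does not name a monitoring system, so treat that convention, the metric name, and the thresholds as illustrative assumptions.

```python
import sys

# Exit codes in the convention used by Nagios-style check plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(name, value, warn, crit):
    """Return (exit_code, message) for a metric where higher is worse."""
    if value is None:
        return UNKNOWN, f"UNKNOWN - {name} not reported"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name}={value}"
    if value >= warn:
        return WARNING, f"WARNING - {name}={value}"
    return OK, f"OK - {name}={value}"


def run_check(name, value, warn, crit, out=sys.stdout):
    """Print the one-line result and return the exit code for sys.exit()."""
    code, message = check_threshold(name, value, warn, crit)
    print(message, file=out)
    return code
```

A check script would fetch the value from the application, for example from a state file or status socket, and finish with sys.exit(run_check("queue_depth", value, 50, 100)).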

Do us all a favor...
When you develop software, be it scripts, utilities,
programs, or suites, please please please...

Consider how we Ops folks
will manage and monitor it.

Baking-In Transparency
Thank you for your time.

Matt Simmons
standaloneSA on Twitter
standalone.sysadmin@gmail.com
http://www.standalone-sysadmin.com