This is from an invited talk I gave at the Pittsburgh Perl Workshop a few years back. It's not often that I get a chance to talk to developers, so I thought I'd take advantage of it and yell at them a bit ;-)
2. About Me
Matt Simmons
11+ year System Administrator
http://www.standalone-sysadmin.com
@standaloneSA
standalone.sysadmin@gmail.com
Saturday, October 8, 11
5. Devs make things
Small discrete programs
Large complex programs
Immense interconnected software suites
6. Ops makes things go
Script using small discrete programs
Administer large complex programs
Cluster immense interconnected software suites
7. There is a
direct relationship
between the software that
developers write and the
software that gets
implemented by operations.
9. Software needs to be monitored
"When performance is measured, performance
improves. When performance is measured and
reported back, the rate of improvement accelerates."
--Pearson's Law
10. Why?
You can't manage what you can't measure
--Robert Kaplan
11. Software needs to be
managed
Management by objective works - if you
know the objective. 90% of the time, you don't.
--Peter Drucker
12. Clearly we need to measure...
But what do we measure?
And what metrics do we use?
How do we obtain the measurements?
13. What do we measure?
Software Engineers measure...
Programmer Productivity
code size/efficiency
Defect Density
Bugs / module size
Requirement Stability
feature creep
14. What do we measure?
Operations measures...
Resource Utilization
Diskspace, Bandwidth, etc
Infrastructure Stability
Service Uptime, MTBF, etc
Performance
CPU / Memory efficiency, etc
15. What metrics do we use?
It depends.
Duh.
16. The metrics that Ops needs to
monitor are not always easy to obtain...
17. ...even though they're
really important
Reliability
Repeatability
Root Cause Identification
18. ...so not only is monitoring important...
21. Why is monitoring hard?
Monitoring Software Suites are complex
Infrastructures are complex
Processes and applications are opaque to
our futile requests to determine and track
internal state
22. Processes and applications
are opaque to our futile
requests to determine and
track internal state
26. How things are designed now
Question: A well-designed program encounters
an error. What happens?
Answer: It handles the error, and continues
processing requests
27. How things are designed now
Question: A poorly-designed program
encounters an error. What happens?
Answer: It crashes and burns
29. Obviously, dying to alert the
monitoring system is overkill.
(pun firmly intended)
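One way to read the three slides above: a program that quietly handles its errors is as opaque to monitoring as one that dies, unless it also records what happened. The following is a minimal sketch of that middle ground. The `%stats` counters and the names `process_request` and `stats_snapshot` are hypothetical, not something from the talk.

```perl
use strict;
use warnings;

# Hypothetical in-process counters that a status channel could expose.
my %stats = (handled => 0, failed => 0, last_error => '');

# Run one request through $handler. Errors are caught, counted, and
# remembered, so the process keeps running but the failure stays visible.
sub process_request {
    my ($handler, $request) = @_;
    my $ok = eval { $handler->($request); 1 };
    if ($ok) {
        $stats{handled}++;
        return 1;
    }
    my $error = $@ || 'unknown error';
    chomp $error;
    $stats{failed}++;
    $stats{last_error} = $error;
    return 0;
}

# Return a copy of the counters for whatever reporting method is in use.
sub stats_snapshot {
    return +{ %stats };
}
```

The snapshot could then be written to a log, a state file, or any of the reporting channels discussed later in the deck.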
30. How do we make our statuses available
to the monitoring system, then?
It depends on the kind of software
31. Remember these?
Small discrete programs
Large complex programs
Immense interconnected software suites
32. Small Discrete Programs
Possibly a utility
Usually scripted or run manually
Typically short-term run time
33. Small Discrete Programs:
Monitoring
Screen output
Return codes
Catch signals
Great example: ping & SIGQUIT
SIGUSR1 & SIGUSR2
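For the "Return codes" bullet, here is a small sketch of a utility that reports its result through distinct exit codes as well as screen output. The 0/1/2 mapping is an assumed convention (it happens to match what many monitoring plugins use), and `classify_usage` and its thresholds are made up for illustration.

```perl
use strict;
use warnings;

# Assumed convention: 0 = OK, 1 = WARNING, 2 = CRITICAL.
use constant {
    EXIT_OK       => 0,
    EXIT_WARNING  => 1,
    EXIT_CRITICAL => 2,
};

# Map a disk usage percentage to an exit code and a one-line message.
sub classify_usage {
    my ($used_pct, $warn_pct, $crit_pct) = @_;
    return (EXIT_CRITICAL, "CRITICAL: disk ${used_pct}% used")
        if $used_pct >= $crit_pct;
    return (EXIT_WARNING, "WARNING: disk ${used_pct}% used")
        if $used_pct >= $warn_pct;
    return (EXIT_OK, "OK: disk ${used_pct}% used");
}

# Typical use at the end of a script (values are illustrative):
#   my ($code, $message) = classify_usage(91, 80, 90);
#   print "$message\n";
#   exit $code;
```

A calling script or monitoring check can then branch on `$?` instead of parsing the output.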
34. Signal Handling in Perl
sub USR1_handler {
    # Write the current state somewhere the monitoring system can read it
    drop_state_file();
}
$SIG{USR1} = \&USR1_handler;
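The slide leaves `drop_state_file()` undefined, so here is one possible shape for it, offered as a sketch rather than the talk's implementation. It assumes the state is a flat hash, that the handler only sets a flag, and that the main loop calls the hypothetical `dump_state_if_requested()` once per iteration.

```perl
use strict;
use warnings;

# Hypothetical program state; a real daemon would track its own fields.
my %state = (requests => 0, errors => 0);
my $dump_requested = 0;

# Keep the handler tiny: set a flag and let the main loop do the work.
sub USR1_handler { $dump_requested = 1 }
$SIG{USR1} = \&USR1_handler;

# Write key=value lines to a temp file, then rename it into place so a
# reader never sees a half-written state file.
sub drop_state_file {
    my ($path) = @_;
    my $tmp = $path . '.tmp.' . $$;
    open my $fh, '>', $tmp or die "cannot open $tmp: $!";
    print {$fh} "$_=$state{$_}\n" for sort keys %state;
    close $fh or die "cannot close $tmp: $!";
    rename $tmp, $path or die "cannot rename $tmp to $path: $!";
    return $path;
}

# Call once per main-loop iteration; returns 1 if a dump was written.
sub dump_state_if_requested {
    my ($path) = @_;
    return 0 unless $dump_requested;
    $dump_requested = 0;
    drop_state_file($path);
    return 1;
}
```

With this in place, `kill -USR1 <pid>` from a shell or a monitoring check would produce a fresh state file on the next loop iteration.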
35. Large Complex Programs
Probably a daemon or interactive program
Long running, needs to be stable
Subject to resource change over time
May need to retain state across restarts
May have a web component
36. Large Complex Programs:
Reporting
No screen output (except debugging)
Logging
SNMP Agent/Traps
(seriously, read man snmpd.conf)
Named Pipes (FIFO)
State Output to DB (if appropriate)
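As an illustration of the "Named Pipes (FIFO)" bullet, the sketch below has the program offer a status line on a FIFO without ever blocking on it. It assumes a POSIX system where a non-blocking open for writing fails with ENXIO when no reader is attached. The function names are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use Fcntl qw(O_RDONLY O_WRONLY O_NONBLOCK);

# Create the FIFO if it does not exist yet.
sub ensure_fifo {
    my ($path) = @_;
    return 1 if -p $path;
    mkfifo($path, 0644) or die "mkfifo $path failed: $!";
    return 1;
}

# Monitoring side: open the read end without waiting for a writer.
sub open_status_reader {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDONLY | O_NONBLOCK)
        or die "cannot open $path for reading: $!";
    return $fh;
}

# Program side: write one status line if somebody is listening.
# Returns 1 if the line was written, 0 if there was no reader.
sub offer_status {
    my ($path, $line) = @_;
    my $fh;
    unless (sysopen($fh, $path, O_WRONLY | O_NONBLOCK)) {
        return 0 if $!{ENXIO};    # no reader attached right now
        die "cannot open $path for writing: $!";
    }
    defined syswrite($fh, "$line\n") or die "write to $path failed: $!";
    close $fh;
    return 1;
}
```

The non-blocking open is the design choice that matters here: a daemon should not stall because nobody happens to be reading its status pipe.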
37. Net-SNMP Embedded Perl
perl use Data::Dumper;
perl sub myroutine { print "got called: ",Dumper(@_),"\n"; }
perl $agent->register('mylink', '.1.3.6.1.8765', \&myroutine);
39. Large Suites
Definitely retain state across restarts
Probably requires centralized controller
May use sockets to communicate
Probably has a web component
40. Large Suites:
Reporting
Everything under Large Programs, plus...
Monitoring coordinated by the central
node or program
Aggregation of state
Provide a layer of abstraction from any in-suite monitoring or reporting
Provide XML/CSV in addition to human-parsable HTML pages
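To make the aggregation and CSV bullets concrete, here is a rough sketch of what a central node might do with per-node state. The node names, the `state` and `queue_depth` fields, and the three severity levels are all assumptions. It also assumes field values contain no commas, so no CSV quoting is done.

```perl
use strict;
use warnings;

# Assumed severity ordering for the hypothetical 'state' field.
my %SEVERITY = (ok => 0, warning => 1, critical => 2);

# Aggregate: the suite is only as healthy as its worst node.
sub overall_state {
    my ($nodes) = @_;
    my $worst = 'ok';
    for my $node (values %$nodes) {
        $worst = $node->{state}
            if $SEVERITY{ $node->{state} } > $SEVERITY{$worst};
    }
    return $worst;
}

# Machine-parsable view of the same data an HTML status page would show.
sub status_csv {
    my ($nodes) = @_;
    my @rows = ('node,state,queue_depth');
    for my $name (sort keys %$nodes) {
        push @rows, join ',', $name, @{ $nodes->{$name} }{qw(state queue_depth)};
    }
    return join("\n", @rows) . "\n";
}
```

A monitoring check can then fetch one URL or file from the controller instead of interrogating every component in the suite.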
41. What we're really doing is IPC
So what other methods exist? Lots.
42. Unix IPC
Sockets
RPC
Message Queues
FIFO
Shared Memory
And Many More...
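Taking "Sockets" from the list above as an example, a program could answer status queries on a Unix domain socket. This is a minimal sketch using the core `IO::Socket::UNIX` module. The socket path, the key=value reply format, and the function names are assumptions for illustration.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;

# Create a listening Unix domain socket at $path.
sub open_status_socket {
    my ($path) = @_;
    unlink $path;    # remove a stale socket left by a previous run
    my $server = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM(),
        Local  => $path,
        Listen => 5,
    ) or die "cannot listen on $path: $!";
    return $server;
}

# Accept one client, send it the status as key=value lines, and hang up.
# Note that accept() blocks until a client connects.
sub answer_one_client {
    my ($server, $status) = @_;
    my $client = $server->accept or return 0;
    print {$client} "$_=$status->{$_}\n" for sort keys %$status;
    close $client;
    return 1;
}
```

A real daemon would more likely poll the listening socket from its event loop rather than block in `accept`, but the exchange itself stays this simple.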
45. What is best?
To crush your enemies, see them
driven before you, and to hear the
lamentation of their women?
46. What is best?
An application that is easily and openly
monitored
A developer that considers monitoring in
all phases of design and development
A developer who writes their own
monitoring checks
48. Do us all a favor...
When you develop software, be it scripts, utilities,
programs, or suites, please please please...
Consider how we Ops folks
will manage and monitor it.
49. Baking-In Transparency
Thank you for your time.
Matt Simmons
standaloneSA on Twitter
standalone.sysadmin@gmail.com
http://www.standalone-sysadmin.com