
Baking-In Transparency

Saturday, October 8, 2011

About Me
• Matt Simmons
• System Administrator for 11+ years
• http://www.standalone-sysadmin.com
• @standaloneSA
• standalone.sysadmin@gmail.com


The Situation


Devs make things
• Small discrete programs
• Large complex programs
• Immense interconnected software suites


Ops makes things go
• Script using small discrete programs
• Administer large complex programs
• Cluster immense interconnected software suites


There is a direct relationship between the software that developers write and the software that gets implemented by operations.

The Problems


Software needs to be monitored
"When performance is measured, performance
improves. When performance is measured and
reported back, the rate of improvement accelerates."
--Pearson’s Law


Why?
“You can’t manage what you can’t measure”
--Robert Kaplan


Software needs to be managed
“Management by objective works - if you
know the objective. 90% of the time, you don’t.”
--Peter Drucker


Clearly we need to measure...
But what do we measure?
And what metrics do we use?
How do we obtain the measurements?


What do we measure?

Software Engineers measure...
• Programmer Productivity
  • code size/efficiency
• Defect Density
  • bugs / module size
• Requirement Stability
  • “feature creep”

What do we measure?

Operations measures...
• Resource Utilization
  • Disk space, Bandwidth, etc.
• Infrastructure Stability
  • Service Uptime, MTBF, etc.
• Performance
  • CPU / Memory efficiency, etc.
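For the Ops-side resource measurements, a minimal Python sketch (standard library only) of sampling disk space and load average might look like the following. The function `sample_resources` and its metric names are hypothetical, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import shutil


def sample_resources(path="/"):
    """Return a dict of basic resource-utilization metrics for this host.

    Hypothetical helper: the keys are illustrative, not a standard schema.
    """
    usage = shutil.disk_usage(path)          # named tuple: total, used, free (bytes)
    load1, load5, load15 = os.getloadavg()   # Unix-only; raises OSError if unavailable
    return {
        "disk_path": path,
        "disk_used_pct": round(100.0 * usage.used / usage.total, 1),
        "disk_free_bytes": usage.free,
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
    }


if __name__ == "__main__":
    # Emit simple key=value lines that most collectors can parse.
    for key, value in sample_resources().items():
        print(f"{key}={value}")
```

A monitoring agent would typically run something like this on a schedule and ship the lines to whatever stores the metrics.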

What metrics do we use?

It depends.
Duh.


The metrics that Ops needs to
monitor are not always easy to obtain...


...even though they’re
really important

• Reliability
• Repeatability
• Root Cause Identification


...so not only is monitoring important...


Monitoring is hard.


Monitoring correctly is hard.


Why is monitoring hard?
• Monitoring Software Suites are complex
• Infrastructures are complex
• Processes and applications are opaque to our futile requests to determine and track internal state


Processes and applications are opaque to our futile requests to determine and track internal state


The Solution(s)


Dev/Ops working together gives

• Team Interrelationships
• Knowledge Sharing
• Cross Training
• Tool Sharing

But more specifically...
Methods of monitoring software can be
BUILT INTO THE SOFTWARE


How things are designed now
Question: A well-designed program encounters
an error. What happens?
Answer: It handles the error and continues
processing requests


How things are designed now
Question: A poorly-designed program
encounters an error. What happens?
Answer: It crashes and burns


Question:
Which of those is easier to monitor?


Obviously, dying to alert the
monitoring system is overkill.
(pun firmly intended)


How do we make our statuses available
to the monitoring system, then?

It depends on the kind of software


Remember these?

• Small discrete programs
• Large complex programs
• Immense interconnected software suites


Small Discrete Programs

• Possibly a utility
• Usually scripted or run manually
• Typically short-term run time


Small Discrete Programs:
Monitoring

• Screen output
• Return codes
• Catch signals
  • Great example: ping & SIGQUIT (Linux iputils ping prints current statistics without exiting)
  • SIGUSR1 & SIGUSR2
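Return codes are easiest to consume when they follow a convention the monitoring system already understands. A common one is the Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus a one-line status message on stdout). The Python sketch below applies it to a hypothetical heartbeat file that the monitored program is assumed to touch periodically; the function names and thresholds are placeholders.

```python
import os
import time

# Nagios-style plugin exit codes (a widely used convention, not a POSIX standard).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_file_age(path, warn_secs, crit_secs, now=None):
    """Return (exit_code, message) based on how stale `path` is.

    `path` is a hypothetical heartbeat file the monitored program touches.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc.strerror}"
    if age >= crit_secs:
        return CRITICAL, f"CRITICAL - {path} is {age:.0f}s old"
    if age >= warn_secs:
        return WARNING, f"WARNING - {path} is {age:.0f}s old"
    return OK, f"OK - {path} is {age:.0f}s old"


def run_check(path, warn_secs=300, crit_secs=900):
    """Print the one-line status and return the code to hand to sys.exit()."""
    code, message = check_file_age(path, warn_secs, crit_secs)
    print(message)   # the first line of output is what the monitoring system shows
    return code
```

A wrapper script would end with `sys.exit(run_check(sys.argv[1]))`, so the exit status and the message reach the monitoring system together.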

Signal Handling in Perl

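use strict; use warnings;
# Dump internal state whenever the process receives SIGUSR1 (kill -USR1 <pid>).
# drop_state_file() is assumed to be defined elsewhere in the program.
sub USR1_handler {
    drop_state_file();
}
$SIG{USR1} = \&USR1_handler;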
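For comparison, a rough Python equivalent of the Perl handler, extended to use SIGUSR2 as well, is sketched below. The state dictionary, the file location, and the choice of SIGUSR2 to toggle debug mode are all assumptions for illustration, not a fixed convention. It only works on Unix-like systems, and Python only lets the main thread install signal handlers.

```python
import json
import os
import signal
import tempfile
import time

# Hypothetical internal state a long-running worker might track.
STATE = {"started": time.time(), "requests_served": 0, "debug": False}
# Assumed location; use whatever path your Ops team agrees on.
STATE_FILE = os.path.join(tempfile.gettempdir(), "worker.state.json")


def drop_state_file(signum=None, frame=None):
    """SIGUSR1 handler: write the current state to STATE_FILE and keep running."""
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(STATE, fh)
    os.replace(tmp, STATE_FILE)   # rename is atomic, so readers never see half a file


def toggle_debug(signum=None, frame=None):
    """SIGUSR2 handler: flip debug mode on or off without a restart."""
    STATE["debug"] = not STATE["debug"]


# Unix-only; Python requires these calls to run in the main thread.
signal.signal(signal.SIGUSR1, drop_state_file)
signal.signal(signal.SIGUSR2, toggle_debug)
```

With the handlers installed, `kill -USR1 <pid>` asks the running process to write its state file and `kill -USR2 <pid>` flips debug mode, neither of which stops the program.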


Large Complex Programs

• Probably a daemon or interactive program
• Long running, needs to be stable
• Subject to resource change over time
• May need to retain state across restarts
• May have a web component
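Retaining state across restarts usually means writing it somewhere durable and reading it back on startup. A minimal Python sketch, assuming a JSON file at a hypothetical path, uses write-to-temp-then-rename so a crash mid-write never leaves a truncated file behind; the same file can double as something a monitoring check reads. The field names are illustrative only.

```python
import json
import os
import tempfile

# Hypothetical defaults for a fresh start.
DEFAULTS = {"jobs_completed": 0, "last_checkpoint": None}


def save_state(path, state):
    """Persist state atomically: write a temp file, fsync it, then rename over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def load_state(path):
    """Restore state on startup; fall back to defaults if there is no usable file."""
    try:
        with open(path) as fh:
            return {**DEFAULTS, **json.load(fh)}
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "daemon.state.json")  # hypothetical location
    state = load_state(path)
    state["jobs_completed"] += 1
    save_state(path, state)
    print(state)
```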

Large Complex Programs:
Reporting

• No screen output (except debugging)
• Logging
• SNMP Agent/Traps
  • (seriously, read ‘man snmpd.conf’)
• Named Pipes (FIFO)
• State Output to DB (if appropriate)
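For "State Output to DB", a sketch using Python's built-in sqlite3 module is below. The `status` table layout and the helper names are hypothetical; the idea is that the program upserts its latest metrics and a check only needs a single SELECT. A real deployment would point at a file path or a shared database rather than the in-memory default used here.

```python
import sqlite3
import time


def open_status_db(path=":memory:"):
    """Create (if needed) a one-row-per-metric status table. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        " metric TEXT PRIMARY KEY, value REAL NOT NULL, updated REAL NOT NULL)"
    )
    return conn


def publish(conn, metric, value):
    """Upsert the latest value so a monitoring check only has to SELECT it."""
    with conn:   # commits the transaction on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO status (metric, value, updated) VALUES (?, ?, ?)",
            (metric, value, time.time()),
        )


def read_metric(conn, metric):
    """What a monitoring check would run; returns None if the metric was never published."""
    row = conn.execute("SELECT value FROM status WHERE metric = ?", (metric,)).fetchone()
    return None if row is None else row[0]
```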

Net-SNMP Embedded Perl
# snmpd.conf: keep each "perl" directive on one line
perl use Data::Dumper;
perl sub myroutine { print "got called: ", Dumper(@_), "\n"; }
perl $agent->register('mylink', '.1.3.6.1.8765', \&myroutine);
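The same snmpd.conf man page also documents `pass_persist`, which hands an OID subtree to an external program speaking a simple line protocol on stdin/stdout: answer `PING` with `PONG`, and answer `get` or `getnext` (followed by an OID line) with three lines (OID, type, value) or `NONE`. A minimal Python sketch follows; the OID subtree and the metrics are placeholders, and it rejects every `set` as `not-writable`. It would be wired in with a line such as `pass_persist .1.3.6.1.4.1.8072.9999.9999.1 /usr/local/bin/status_pass.py` (script path hypothetical), with the script's entry point calling `main()`.

```python
import sys

# Placeholder subtree; substitute an OID under your own enterprise number.
BASE = ".1.3.6.1.4.1.8072.9999.9999.1"


def current_values():
    """Hypothetical application metrics as {OID: (TYPE, VALUE)}."""
    return {
        BASE + ".1": ("integer", 42),        # e.g. worker threads alive
        BASE + ".2": ("string", "healthy"),  # e.g. overall status text
    }


def oid_key(oid):
    """Turn '.1.3.6...' into a tuple of ints so OIDs compare numerically."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def answer(command, oid):
    """Build the reply lines for one get/getnext request."""
    values = current_values()
    wanted = oid_key(oid)
    if command == "get":
        candidates = [o for o in values if oid_key(o) == wanted]
    else:  # getnext: the first OID strictly after the requested one
        candidates = [o for o in values if oid_key(o) > wanted]
    if not candidates:
        return ["NONE"]
    match = min(candidates, key=oid_key)
    vtype, value = values[match]
    return [match, vtype, str(value)]


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Serve requests from snmpd until it closes the pipe."""
    while True:
        line = stdin.readline()
        if not line:                 # EOF: snmpd went away, exit quietly
            return
        command = line.strip()
        if command == "PING":
            reply = ["PONG"]
        elif command in ("get", "getnext"):
            reply = answer(command, stdin.readline().strip())
        elif command == "set":
            stdin.readline()         # OID being set
            stdin.readline()         # "TYPE VALUE"
            reply = ["not-writable"]
        else:
            continue                 # ignore blank or unknown lines
        stdout.write("\n".join(reply) + "\n")
        stdout.flush()               # snmpd waits on each reply, so never leave it buffered
```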


Immense Interconnected Software Suites
(or Large Suites)

Large Suites

• Definitely retain state across restarts
• Probably requires centralized controller
• May use sockets to communicate
• Probably has a web component

Large Suites:
Reporting
Everything under “Large Complex Programs”, plus...

• Monitoring coordinated by the “central” node or program
• Aggregation of state
• Provide a layer of abstraction over any in-suite monitoring or reporting
• Provide XML/CSV in addition to human-readable HTML pages
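Aggregation and machine-readable output for a large suite could look roughly like this Python sketch: the central controller collapses per-node reports into one suite status (worst node wins, which is only one possible policy) and publishes the same data as CSV and XML. The node names, fields, and severity ordering are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical per-node reports collected by the suite's central controller.
NODE_REPORTS = [
    {"node": "web01", "status": "OK", "queue": 3},
    {"node": "web02", "status": "CRITICAL", "queue": 250},
    {"node": "db01", "status": "WARNING", "queue": 40},
]

SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def aggregate(reports):
    """Collapse node states into one suite-level status: the worst node wins."""
    worst = max(reports, key=lambda r: SEVERITY[r["status"]])
    return worst["status"]


def to_csv(reports):
    """Machine-readable CSV, one row per node."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["node", "status", "queue"])
    writer.writeheader()
    writer.writerows(reports)
    return buf.getvalue()


def to_xml(reports):
    """The same data as XML, with the aggregated status on the root element."""
    root = ET.Element("suite", status=aggregate(reports))
    for r in reports:
        ET.SubElement(root, "node", name=r["node"], status=r["status"], queue=str(r["queue"]))
    return ET.tostring(root, encoding="unicode")
```

The HTML status page can keep serving people, while `to_csv()` or `to_xml()` behind a separate URL serves the monitoring system.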


What we’re really doing is IPC (inter-process communication)
So what other methods exist? Lots.


Unix IPC
• Sockets
• RPC
• Message Queues
• FIFO
• Shared Memory
• And Many More...

They shouldn’t all be used...
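As one example of "using SOMETHING", here is a Python sketch of a status endpoint on a Unix domain socket: the program answers every connection with one line of JSON describing its state, and a monitoring check connects and parses it. The socket location and the state fields are assumptions; a real daemon would also restrict permissions on the socket file.

```python
import json
import os
import socket
import tempfile
import threading

STATE = {"status": "OK", "connections": 17}   # hypothetical internal state


def start_status_listener(path):
    """Listen on a Unix domain socket; each client gets one JSON status line."""
    if os.path.exists(path):
        os.unlink(path)                       # remove a stale socket from a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen()

    def serve():
        while True:
            conn, _ = server.accept()
            with conn:
                conn.sendall((json.dumps(STATE) + "\n").encode())

    threading.Thread(target=serve, daemon=True).start()
    return server


def query_status(path):
    """What a monitoring check would do: connect, read one line, parse it."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        data = b""
        while not data.endswith(b"\n"):
            chunk = client.recv(4096)
            if not chunk:
                break
            data += chunk
    return json.loads(data)


if __name__ == "__main__":
    sock_path = os.path.join(tempfile.mkdtemp(), "status.sock")   # hypothetical location
    start_status_listener(sock_path)
    print(query_status(sock_path))
```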


What is important is that you use SOMETHING


What is best?
To crush your enemies, see them
driven before you, and to hear the
lamentation of their women?


What is best?
• An application that is easily and openly monitored
• A developer who considers monitoring in all phases of design and development
• A developer who writes their own monitoring checks



Do us all a favor...
When you develop software, be it scripts, utilities,
programs, or suites, please please please...

Consider how we Ops folks
will manage and monitor it.

Baking-In Transparency
Thank you for your time.

Matt Simmons
standaloneSA on Twitter
standalone.sysadmin@gmail.com
http://www.standalone-sysadmin.com
