A SEARCH FOR THE UNKNOWNS
Michele Chubirka, aka Mrs. Y, is a senior network security
engineer and blogger. Hosts Healthy Paranoia, a security
podcast. Researches and speaks on topics such as affective
neuroscience and the psychology of decision making.
Disclaimer: No swans were harmed in the making of this presentation.
"The entire security industry is wired so that
the oldest and least effective methods will
profit most."
Josh Corman, Director of Security Intelligence at
Akamai, the content delivery network.
In Verizon's 2012 Data
Breach Investigations
Report, it was found that
across organizations, an
external party discovers
92% of breaches.
Something's Broken
Imperva conducted a study and released a report in 2012 on the
effectiveness of antivirus software.
• Out of approximately 80 pieces of malware, the initial detection rate
for new malware was less than 5%.
• For some AV vendors, it may take up to four weeks to detect a new
virus from the time of the initial scan.
• Software cost wasn't a factor. Some free programs performed better.
• In 2011, Gartner reported that consumers spent $4.5 billion on
antivirus and enterprises spent $2.9 billion. The total of $7.4 billion is
more than a third of the total of $17.7 billion spent on security
software.
Anti-Virus Ineffective
• We believe we can solve the issue of the unknowns,
intrusions, with more data.
• The more information we have, the less we know.
• This makes us no better than security archeologists.
From Compromise To Discovery
• An unknown unknown.
• Can't be predicted by
probability theories.
• Rationalized after the fact.
• How often do we try to
predict the Black Swan
Event in security and fail?
The Black Swan Event
Military drone operators amass untold amounts of data
that never is fully analyzed because it is simply too much.
Michael W. Isherwood, defense analyst and former Air
Force fighter pilot.
Information Gluttony?
• From the beginning of recorded time to 2003: five exabytes
of information.
• 2011: that much created every two days.
• 2012: prediction is every 10 minutes.
Digital Kudzu
• SIEMs: never get fully implemented.
• Predictions using Logistic Regression/Bayesian
Probability.
• Huge amounts of data, not enough time.
• Open world problem using closed world assumptions.
• More staff, more money.
Current Solutions
the ability of our unconscious to find patterns in
situations and behavior based on very narrow slices of
experience.
Malcolm Gladwell, Blink
Alternative Model: Thin Slicing
• Cook County Hospital struggled with identifying patients
in danger of an imminent heart attack.
• Coronary care unit was overwhelmed.
• Public hospital, limited resources.
• ICU is dangerous.
Case Study: A Hospital in Trouble
• Lee Goldman, a cardiologist, created a protocol based
upon an algorithm developed in partnership with
mathematicians.
• After two years of using a decision tree, hospital staff
were 70% more effective at recognizing patients at risk.
• Less information led to greater success.
• Technique used by first-responders every day.
Applied Thin-Slicing
Violations of logical reasoning [are] interpreted as
cognitive fallacies, yet what appears to be a fallacy can
often also be seen as adaptive behavior, if one is willing to
rethink the norm.
Gerd Gigerenzer, Rationality for Mortals
Bounded Rationality
Excerpt from Luan, Schooler, and Gigerenzer (2011), p. 320:

Green and Mehr (1997) found that, compared with a logistic regression model that
uses eight cues simultaneously to make a decision, this FFT had a
higher overall predictive accuracy [...]

Tree models of categorization and decision making have been
studied in a variety of disciplines, such as medicine, applied
statistics, computer science, and psychology [...]

Figure 4. Two examples of fast-and-frugal trees (FFTs) applied to large world problems. The left tree (a) is
designed to help emergency room doctors decide whether to send a patient with severe chest pain to the Coronary
Care Unit (CCU) or a regular nursing bed (Green & Mehr, 1997). The right tree (b) is a model of how British
judges decide whether to make a punitive bail decision (Dhami, 2003).

Tree (a):
ST segment change?
  Yes: Coronary Care Unit
  No: Chief complaint of chest pain?
    No: Regular Nursing Bed
    Yes: Any other factor? (NTG, MI, ST?, ST, T)
      No: Regular Nursing Bed
      Yes: Coronary Care Unit

Tree (b):
Did prosecution request conditional bail or oppose bail?
  Yes: Punitive
  No or N.A.: Did previous court impose conditions or remand in custody?
    Yes: Punitive
    No or N.A.: Did police impose conditions or remand in custody?
      Yes: Punitive
      No or N.A.: Nonpunitive
Fast and Frugal Trees
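Each tree in Figure 4 can be read as an ordered list of yes/no cues, where every cue has one exit and the last cue has two. The Python sketch below encodes both trees that way. It assumes each case arrives as a dict of boolean cue values; the function name, the cue keys, and the handling of a missing cue as "No or N.A." are illustrative choices, not taken from the cited papers.

```python
from typing import Mapping, Sequence, Tuple

# One level of a fast-and-frugal tree:
# (cue name, cue value that triggers the exit, decision taken at that exit).
Level = Tuple[str, bool, str]


def fft_classify(levels: Sequence[Level], default: str, case: Mapping[str, bool]) -> str:
    """Check cues in order and stop at the first one that triggers its exit."""
    for cue, exit_on, decision in levels:
        # A missing cue is treated as "No or N.A." (an assumption of this sketch).
        if case.get(cue, False) == exit_on:
            return decision
    return default  # second exit of the final cue


# Tree (a): coronary care decision (cue keys are hypothetical names).
CCU_TREE = [
    ("st_change", True, "Coronary Care Unit"),
    ("chest_pain", False, "Regular Nursing Bed"),
    ("other_factor", True, "Coronary Care Unit"),
]
CCU_DEFAULT = "Regular Nursing Bed"

# Tree (b): bail decision (cue keys are hypothetical names).
BAIL_TREE = [
    ("prosecution_opposed", True, "Punitive"),
    ("previous_court_conditions", True, "Punitive"),
    ("police_conditions", True, "Punitive"),
]
BAIL_DEFAULT = "Nonpunitive"

patient = {"st_change": False, "chest_pain": True, "other_factor": True}
print(fft_classify(CCU_TREE, CCU_DEFAULT, patient))
```

The classifier never looks past the first cue that fires, which is the "less information" property the case study relies on.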
• Semantic Web technology.
• Queries based on relationships or mental associations.
• Graphs treat each packet from a capture file as a discrete
event with properties.
• TCP header info in a metadata model.
• Model replicates human cognitive economy.
Method: Resource Description
Framework (RDF)
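To make the packet-as-event model concrete, the sketch below flattens one packet's header fields into (subject, predicate, object) triples in plain Python. The src, dst, date, and time predicate URIs are the ones used in this deck's queries; the event URI scheme, the helper name packet_to_triples, and the sample time value are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Predicate namespace used by the queries in this deck.
DEMO = "http://www.rrecktek.com/demo/"


def packet_to_triples(event_id: int, fields: Dict[str, str]) -> List[Triple]:
    """Turn one packet's fields into triples that share a single event subject."""
    subject = f"{DEMO}event/{event_id}"  # hypothetical event URI scheme
    return [(subject, DEMO + name, value) for name, value in fields.items()]


packet = {
    "src": "135.8.60.182",
    "dst": "172.16.113.50",
    "date": "1998-06-04",
    "time": "08:00:00",  # made-up sample value
}
triples = packet_to_triples(1, packet)
for triple in triples:
    print(triple)
```

Every property of the packet hangs off the same subject, so a query can ask about one property without loading the others.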
• SPARQL query language uses a concise approach for
quickly traversing large data sets while capturing
similarities between packets as generalizations.
• An RDF statement contains a subject, a predicate, and an
object.
• Subject defines the event.
• Predicate defines a characteristic or property.
• Object contains the value for the predicate.
Thin-Slicing with SPARQL
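A SPARQL triple pattern fixes some positions of a statement and leaves the others as variables. The sketch below imitates that idea over an in-memory list of triples, with None standing in for a variable. It is a toy matcher written for illustration, not how a SPARQL engine is implemented; the store contents and the event names e1 and e2 are made up.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

SRC = "http://www.rrecktek.com/demo/src"
DST = "http://www.rrecktek.com/demo/dst"

# Tiny hand-made store; e1 and e2 are placeholder event names.
STORE: List[Triple] = [
    ("e1", SRC, "135.8.60.182"),
    ("e1", DST, "172.16.113.50"),
    ("e2", SRC, "172.16.113.50"),
    ("e2", DST, "135.8.60.182"),
]


def match(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that agree with every position that is not None."""
    pattern = (subject, predicate, obj)
    for triple in store:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# All three positions left open: every statement matches.
all_statements = list(match(STORE))
# Predicate fixed to src: the object of each match is a source IP.
sources = [o for _, _, o in match(STORE, predicate=SRC)]
print(sources)
```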
sparql select * { ?s ?p ?o . };
sparql select * { ?e1 <http://www.rrecktek.com/demo/src> ?ip1 . };
Example: Building A Query
• All source IPs and their destination IPs.
• For each source, count how many times it went to a
destination.
• Report source, destination, and count.
sparql SELECT ?src ?dst (COUNT(?dst) AS ?count) {
?e1 <http://www.rrecktek.com/demo/src> ?src .
?e1 <http://www.rrecktek.com/demo/dst> ?dst .
} GROUP BY ?src ?dst ORDER BY DESC(?count);
Example
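The three steps on this slide (pair each source with its destination, count the pairs, report them in descending order) can be traced in plain Python. This is a sketch of the same aggregation outside a triple store. It assumes every event carries at most one src and one dst value, and the IP addresses and event names in the sample data are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

SRC = "http://www.rrecktek.com/demo/src"
DST = "http://www.rrecktek.com/demo/dst"

# Invented sample data: three events, two of them between the same pair.
triples: List[Triple] = [
    ("e1", SRC, "10.0.0.1"), ("e1", DST, "10.0.0.9"),
    ("e2", SRC, "10.0.0.1"), ("e2", DST, "10.0.0.9"),
    ("e3", SRC, "10.0.0.2"), ("e3", DST, "10.0.0.9"),
]


def count_pairs(store: List[Triple]) -> List[Tuple[Tuple[str, str], int]]:
    """Join src and dst on the shared event, then count each (src, dst) pair."""
    events: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, predicate, obj in store:
        events[subject][predicate] = obj
    pairs = Counter(
        (props[SRC], props[DST])
        for props in events.values()
        if SRC in props and DST in props
    )
    return pairs.most_common()  # highest count first


ranked = count_pairs(triples)
for (src, dst), count in ranked:
    print(src, dst, count)
```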
Which machines were the destination of the most traffic?
sparql select * {
?event <http://www.rrecktek.com/demo/dst> ?dst.
} limit 10;
sparql SELECT ?dst (COUNT(?src) AS ?count) {
?event <http://www.rrecktek.com/demo/dst> ?dst .
?event <http://www.rrecktek.com/demo/src> ?src .
} GROUP BY ?dst ORDER BY DESC(?count) LIMIT 10;
Example 2
What times did the machines talk to each other?
sparql select * {
?e <http://www.rrecktek.com/demo/src> "135.8.60.182" .
?e <http://www.rrecktek.com/demo/dst> "172.16.113.50" .
?e <http://www.rrecktek.com/demo/date> ?date .
?e <http://www.rrecktek.com/demo/time> ?time .
FILTER regex(?date, "1998-06-04")
};
Example 3
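The query on this slide fixes three properties of an event (source, destination, date) and asks for a fourth (time). A rough Python equivalent over in-memory triples is sketched below. The two IP addresses and the date come from the slide; the time values, the event names, and the helper name conversation_times are invented for illustration, and the date test is a plain substring check standing in for the regex filter.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

DEMO = "http://www.rrecktek.com/demo/"


def event(name: str, src: str, dst: str, date: str, time: str) -> List[Triple]:
    """Helper that builds the four triples describing one sample event."""
    fields = {"src": src, "dst": dst, "date": date, "time": time}
    return [(name, DEMO + key, value) for key, value in fields.items()]


# Sample store; the time values are made up.
triples: List[Triple] = (
    event("e1", "135.8.60.182", "172.16.113.50", "1998-06-04", "08:00:01")
    + event("e2", "135.8.60.182", "172.16.113.50", "1998-06-05", "09:30:00")
    + event("e3", "172.16.113.50", "135.8.60.182", "1998-06-04", "08:00:02")
)


def conversation_times(store: List[Triple], src: str, dst: str, date: str) -> List[str]:
    """Return the sorted times of events from src to dst whose date contains `date`."""
    events: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, predicate, obj in store:
        events[subject][predicate.rsplit("/", 1)[-1]] = obj
    return sorted(
        props["time"]
        for props in events.values()
        if props.get("src") == src
        and props.get("dst") == dst
        and date in props.get("date", "")
        and "time" in props
    )


times = conversation_times(triples, "135.8.60.182", "172.16.113.50", "1998-06-04")
print(times)
```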
SPARQL web
interface
What we can do:
• Build strong infrastructures and secure applications, minimizing
technical debt.
• Create data classification schemes based upon the business
and technical service catalogs to create better
segmentation.
• Add the equivalent of air bags to the architecture for when
intrusions occur.
• Recognize signature limitations.
• Investigate the creation of real-time fast and frugal trees.
Our patient is dying on the table. It's up to us to change the
outcome.
We Can't Fight All Unknowns
• Michele Chubirka
www.healthyparanoia.com
Twitter @MrsYisWhy
networksecurityprincess@gmail.com
• RDF/SPARQL contribution courtesy of Ronald P. Reck
rreck@rrecktek.com
Thanks!
"Eclectic Tech." Semantic Web Introduction. N.p., n.d. Web. 20 Dec. 2012.
Erwin, Sandra I. "Too Much Information, Not Enough Intelligence." National Defense Magazine. N.p., May 2012. Web. <http://www.nationaldefense.org>.
Gigerenzer, Gerd. Gut Feelings: The Intelligence of the Unconscious. New York: Viking, 2007. Print.
Gigerenzer, Gerd. Rationality for Mortals: How People Cope with Uncertainty. Oxford: Oxford UP, 2008. Print.
Gladwell, Malcolm. Blink: The Power of Thinking without Thinking. New York: Little, Brown and Company, 2005. Print.
Hacker Intelligence Initiative, Monthly Trend Report #14. Rep. Imperva, Dec. 2012. Web. Dec. 2012.
Luan, Shenghua, Lael J. Schooler, and Gerd Gigerenzer. "A Signal-detection Analysis of Fast-and-frugal Trees." Psychological Review 118.2 (2011): 316-38. Print.
Marewski, Julian N., PhD, and Gerd Gigerenzer, PhD. "Heuristic Decision Making in Medicine." Dialogues in Clinical Neuroscience 14.1 (2012): 77-89. Print.
Messmer, Ellen. "SANS Warns IT Groups Fail to Focus on Logs for Security Clues." TechWorld. IDG, May 2012. Web.
"RDF." Semantic Web Standards. W3C, n.d. Web. 02 Jan. 2013.
"Resource Description Framework (RDF) Model and Syntax." RDF Model and Syntax. W3C, n.d. Web. 02 Jan. 2013.
Rieland, Randy. "Big Data or Too Much Information?" Innovations. Smithsonian, 7 May 2012. Web.
Sandoval, Greg. "Foreign Hackers Steal More Than a Terabyte of Data per Day in Ongoing Cyberwar." The Verge. N.p., 27 Feb. 2013. Web. 27 Feb. 2013.
"Semantic Web Standards." W3C. W3C, n.d. Web. 02 Jan. 2013.
Taleb, Nassim. The Black Swan: The Impact of the Highly Improbable. New York: Random House, 2007. Print.
Turek, Dave. "The Case Against Digital Sprawl." The Management Blog. Bloomberg Businessweek, 2 May 2012. Web.
Verizon 2012 Data Breach Investigations Report. Rep. N.p.: Verizon, n.d. Print.
References

Thin Slicing a Black Swan: A Search for the Unknowns