PREDICTING
EXPLOITABILITY
@MROYTMAN
Prediction is very difficult, especially
about the future.
-Niels Bohr
3 Types of Data-Driven
- Retrospective: analysis & reporting
- Here-and-now: real-time processing & dashboards
- Predictions: to enable smart applications
Too many vulnerabilities.
How do we derive risk from vulnerability in a data-driven manner?
Problem
Exploitability
1. RETROSPECTIVE
2. REAL-TIME
3. PREDICTIVE
1. RETROSPECTIVE
2. REAL-TIME
3. PREDICTIVE
Exploitability
Retrospective - Model: CVSS Temporal Score Estimation
Analyst input: vulnerability researchers, vulnerability management programs augmenting data
1. RETROSPECTIVE
2. REAL-TIME
3. PREDICTIVE
Exploitability
Real-Time - The Data
Vulnerability Scans (Qualys, Rapid7, Nessus, etc.):
- 7,000,000 assets (desktops, servers, URLs, IPs, MAC addresses)
- 1,400,000,000 vulnerabilities (unique asset/CVE pairs)
Exploit Intelligence - Successful Exploitations:
- ReversingLabs backend metadata
  - Hashes for each CVE
  - Number of pieces of malware found corresponding to each hash
- AlienVault backdoor data
  - Attempted exploits correlated with open vulnerabilities
ATTACKERS
ARE FAST
Cumulative Probability of Exploitation
[Chart: breach probability (%) for vulnerabilities with CVSS 10, EDB, MSP, and EDB+MSP exploits, on a 0-40% scale]
Positive Predictive Value of remediating a
vulnerability with property X:
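As a rough sketch of the metric behind this comparison: positive predictive value is the share of vulnerabilities flagged by property X (e.g. "has a Metasploit module") that actually turned out to be exploited. The counts below are hypothetical placeholders, not figures from the talk.

```python
# PPV = TP / (TP + FP): of the vulnerabilities flagged by property X,
# what fraction were actually associated with successful exploitation?
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    return true_positives / (true_positives + false_positives)

# Hypothetical counts for one property (e.g. "has a Metasploit module").
print(positive_predictive_value(true_positives=120, false_positives=380))  # 0.24
```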
Q: Of my current vulnerabilities, which ones should I
remediate?
A: Old ones with stable, weaponized exploits
Data of Future Past
Q: A new vulnerability was just released.
Do we scramble?
A:
Future of Data Past
1. RETROSPECTIVE
2. REAL-TIME
3. PREDICTIVE
Exploitability
Classifier = Will a vulnerability have an exploit
written and published?
[YES OR NO]
Enter:
AWS ML
N = 81,303. All CVEs, described by:
1. National Vulnerability Database
2. Common Platform Enumeration
3. Occurrences in Kenna Scan Data
Labelled as Exploited/Not Exploited:
1. Exploit DB
2. Metasploit
3. D2 Elliot
4. Canvas
5. Blackhat Exploit Kits
Predictive - The Data
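A minimal sketch of how a labelled dataset like this could be assembled: join the CVE universe against each exploit source and mark any CVE that appears in at least one of them as exploited. File names and column names here are assumptions for illustration, not the actual Kenna pipeline.

```python
import pandas as pd

# Hypothetical input files - the real pipeline draws on NVD, CPE, and
# Kenna scan data; these names and columns are placeholders.
cves = pd.read_csv("nvd_cves.csv")          # one row per CVE with NVD fields
exploit_sources = [
    "exploitdb_cves.csv",    # Exploit DB
    "metasploit_cves.csv",   # Metasploit modules
    "d2_elliot_cves.csv",    # D2 Elliot
    "canvas_cves.csv",       # Canvas
    "exploit_kit_cves.csv",  # blackhat exploit kits
]

# A CVE is labelled "exploited" if it appears in at least one source.
exploited_ids = set()
for path in exploit_sources:
    exploited_ids |= set(pd.read_csv(path)["cve_id"])

cves["exploited"] = cves["cve_id"].isin(exploited_ids).astype(int)
print(cves["exploited"].value_counts(normalize=True))  # roughly 77% unexploited
```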
70% training / 30% evaluation split, N = 81,303
L2 regularizer
1 GB
100 passes over the data
Receiver operating characteristic (ROC) curves for comparisons
All Models
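AWS ML's binary classifier is logistic regression trained with stochastic gradient descent, so a rough local stand-in (not the production setup) can be sketched with scikit-learn using the same split, L2 penalty, pass count, and ROC comparison. The data below is synthetic.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the real feature matrix (CVSS fields, etc.);
# with random features the AUC will hover around 0.5, which is expected here.
rng = np.random.default_rng(0)
X = rng.normal(size=(81_303, 10))             # ~81k CVEs, 10 numeric features
y = (rng.random(81_303) < 0.23).astype(int)   # ~23% exploited, matching the class skew

X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# SGD-trained logistic regression: L2 penalty, 100 passes over the data,
# roughly mirroring the settings above (use loss="log" on older scikit-learn).
model = SGDClassifier(loss="log_loss", penalty="l2", max_iter=100, random_state=0)
model.fit(X_train, y_train)

print("ROC AUC:", roc_auc_score(y_eval, model.decision_function(X_eval)))
```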
Distribution is not uniform: 77% of the dataset is not exploited
1. An accuracy of 77% would therefore be bad - always predicting "not exploited" already achieves it
Precision matters more than recall
1. No one would use this model absent actual exploit-availability data.
2. False negatives matter less than false positives, which mean wasted remediation effort
Predictive - The Expectations
We are not modeling WHEN something will be exploited, just IF
1. Could be tomorrow or in six months. Re-run the model every day.
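A small illustration of why raw accuracy is the wrong yardstick here: with a 77/23 class split, a model that never predicts "exploited" already scores 77% accuracy while delivering zero precision and recall on the class that matters. The labels below are simulated, not the talk's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Simulated labels with the ~77% / 23% split described above.
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.23).astype(int)    # 1 = exploited
y_never_exploited = np.zeros_like(y_true)           # trivial "never exploited" model

print(accuracy_score(y_true, y_never_exploited))                    # ~0.77
print(precision_score(y_true, y_never_exploited, zero_division=0))  # 0.0
print(recall_score(y_true, y_never_exploited))                      # 0.0
```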
Model 1: Baseline
-CVSS Base
-CVSS Temporal
-Remote Code Execution
-Availability
-Integrity
-Confidentiality
-Authentication
-Access Complexity
-Access Vector
-Publication Date
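One plausible way to encode this baseline feature set (an assumption for illustration, not necessarily how AWS ML ingests it): pass the numeric CVSS scores through unchanged and one-hot encode the categorical CVSS vector fields. Column names are hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names for the Model 1 baseline features.
numeric_cols = ["cvss_base", "cvss_temporal", "publication_year"]
categorical_cols = ["access_vector", "access_complexity", "authentication",
                    "confidentiality", "integrity", "availability",
                    "remote_code_execution"]

# Numeric CVSS scores pass through unchanged; categorical CVSS vector
# fields are one-hot encoded so the linear model can weight each level.
baseline_features = ColumnTransformer([
    ("numeric", "passthrough", numeric_cols),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
```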
LMGTFY:
Moar Simple?
[Charts: Sample Bad Chart vs. Sample Good Chart]
Model 2: Patches
-CVSS Base
-CVSS Temporal
-Remote Code Execution
-Availability
-Integrity
-Confidentiality
-Authentication
-Access Complexity
-Access Vector
-Publication Date
-Patch Exists
Model 3: Affected Software
-CVSS Base
-CVSS Temporal
-Remote Code Execution
-Availability
-Integrity
-Confidentiality
-Authentication
-Access Complexity
-Access Vector
-Publication Date
-Patch Exists
-Vendors
-Products
Model 4: Words!
-CVSS Base
-CVSS Temporal
-Remote Code Execution
-Availability
-Integrity
-Confidentiality
-Authentication
-Access Complexity
-Access Vector
-Publication Date
-Patch Exists
-Vendors
-Products
-Description, Ngrams 1-5
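A minimal sketch of the Model 4 addition: turning NVD description text into 1- to 5-gram token features. The vectorizer settings and example strings are illustrative assumptions, not the exact configuration used.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy NVD-style descriptions (not real CVE text).
descriptions = [
    "Buffer overflow in Flash Player allows remote attackers to execute arbitrary code",
    "Cross-site scripting in the admin console allows injection of arbitrary web script",
]

# 1- to 5-gram binary token features, as added in Model 4.
vectorizer = CountVectorizer(ngram_range=(1, 5), binary=True)
ngram_features = vectorizer.fit_transform(descriptions)
print(ngram_features.shape)   # (2, number of distinct 1-5 grams)
```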
Model 5: Vulnerability Prevalence
-CVSS Base
-CVSS Temporal
-Remote Code Execution
-Availability
-Integrity
-Confidentiality
-Authentication
-Access Complexity
-Access Vector
-Publication Date
-Patch Exists
-Vendors
-Products
-Description, Ngrams 1-5
-Vulnerability Prevalence
-Number of References
Model 6: Overfitting Fixes
-2015 onward
-Exclude:
in the wild
as seen
exploited
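One way to implement the exclusions above (an assumption about the mechanics, not a quote from the talk): strip the label-leaking phrases out of descriptions before building n-grams, so the model cannot simply read "exploited in the wild" off post-hoc wording.

```python
import re

# Phrases excluded in Model 6 because they leak the label.
LEAKY_PHRASES = ["in the wild", "as seen", "exploited"]
LEAK_PATTERN = re.compile("|".join(re.escape(p) for p in LEAKY_PHRASES), re.IGNORECASE)

def scrub_description(text: str) -> str:
    """Drop label-leaking phrases, then collapse leftover whitespace."""
    return re.sub(r"\s+", " ", LEAK_PATTERN.sub(" ", text)).strip()

print(scrub_description("Actively exploited in the wild, as seen in exploit kits."))
# -> "Actively , in exploit kits."
```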
Exploitability
-Track Predictions vs. Real Exploits
-Integrate 20+ BlackHat Exploit Kits - FP reduction?
-Find better vulnerability descriptions - mine advisories for content? FN reduction?
-Predict Breaches, not Exploits
-Attempt Models by Vendor
Future Work
Too many vulnerabilities.
How do we derive risk from vulnerability in a data-driven manner?
Problem
1. Gather data about known successful attack paths
2. Issue forecasts where data is lacking, in order to predict new exploits
3. Gather MORE data about known successful attack paths
Solution
These will have exploits in 2017
CVE-2016-0959
Released: 6/28/2017
No exploit yet in ExploitDB, Metasploit, or Canvas
Flash buffer overflow; the model predicts an exploit will be released
Thanks!
@MROYTMAN
