Security is all about reacting. It's time to make some predictions. Michael Roytman explains how Kenna Security used the AWS Machine Learning platform to train a binary classifier for vulnerabilities, allowing the company to predict whether or not a vulnerability will become exploitable.
Michael offers an overview of the process. Kenna enriches the data with more specific, nondefinitional-level data: 500 million live vulnerabilities and their associated close rates supply the epidemiological data, alongside in-the-wild threat data from AlienVault's OTX, SecureWorks' CTU, ReversingLabs, and SANS ISC. The company uses 70% of the National Vulnerability Database as its training dataset and generates over 20,000 predictions on the remaining vulnerabilities. It then measures sensitivity and specificity, positive predictive value, and false positive and false negative rates before arriving at an optimal decision cutoff for the problem.
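The evaluation loop described above (sensitivity, specificity, positive predictive value, and an optimal decision cutoff) can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not Kenna's actual AWS pipeline, and the cutoff-selection rule used here (Youden's J statistic) is one common choice assumed for the sketch.

```python
import numpy as np

def confusion_counts(y_true, y_prob, cutoff):
    """Count TP/FP/TN/FN at a given probability cutoff."""
    y_pred = (y_prob >= cutoff).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp, fp, tn, fn

def metrics_at(y_true, y_prob, cutoff):
    """Sensitivity (TPR), specificity (TNR), and positive predictive value."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, cutoff)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity, ppv

def best_cutoff(y_true, y_prob, cutoffs):
    """Pick the cutoff maximizing Youden's J (sensitivity + specificity - 1).
    The talk's actual selection criterion may differ; this is one common choice."""
    return max(cutoffs, key=lambda c: sum(metrics_at(y_true, y_prob, c)[:2]) - 1)
```

Sweeping `best_cutoff` over a grid of candidate cutoffs is exactly the kind of trade-off exploration the ROC curves later in the deck visualize.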
O'Reilly Security New York - Predicting Exploitability Final
9. Real-Time - The Data
Vulnerability Scans (Qualys, Rapid7, Nessus, etc):
7,000,000 Assets (desktops, servers, URLs, IPs, MAC addresses)
1,400,000,000 Vulnerabilities (unique asset/CVE pairs)
Exploit Intelligence - Successful Exploitations
ReversingLabs backend metadata
Hashes for each CVE
Number of found pieces of malware corresponding to each hash
AlienVault backdoor data
Attempted exploits correlated with open vulnerabilities
19. N = 81,303. All CVEs, described by:
1. National Vulnerability Database
2. Common Platform Enumeration
3. Occurrences in Kenna Scan Data
Labelled as Exploited/Not Exploited:
1. Exploit DB
2. Metasploit
3. D2 Elliot
4. Canvas
5. Blackhat Exploit Kits
Predictive - The Data
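The labeling step above reduces to a membership test: a CVE is marked "exploited" if any of the listed exploit sources ships an exploit for it. The data structures and sample CVE IDs below are illustrative, not Kenna's actual feeds.

```python
def label_exploited(cve_id, exploit_sources):
    """exploit_sources: dict mapping source name -> set of CVE IDs with a known exploit."""
    return any(cve_id in cves for cves in exploit_sources.values())

# Illustrative stand-in for the sources listed on the slide
sources = {
    "exploit_db": {"CVE-2015-1701", "CVE-2014-0160"},
    "metasploit": {"CVE-2014-0160"},
    "d2_elliot": set(),
    "canvas": set(),
    "blackhat_kits": set(),
}
```

A CVE appearing in even one source gets the positive label; everything else in the 81,303-CVE universe is labeled "not exploited."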
20. 70% Training, 30% Evaluation Split (N = 81,303)
L2 regularizer
1 GB
100 passes over the data
Receiver operating characteristic (ROC) curves to compare all models
21. Class distribution is not uniform: 77% of the dataset is not exploited
1. An accuracy of 77% would be bad (a model that always predicts "not exploited" achieves it)
Precision matters more than Recall
1. No one would use this model in the absence of actual exploit-availability data.
2. False negatives matter less than false positives, which translate to wasted remediation effort
Predictive - The Expectations
We are not modeling when something will be exploited, just IF
1. Could be tomorrow or in 6 months. Re-run the model every day
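The class-imbalance point above is easy to demonstrate: on a dataset that is 77% "not exploited," a degenerate model that never predicts "exploited" scores 77% accuracy yet has zero precision, which is why precision on the exploited class is the metric to watch. The 100-row sample below is synthetic, matching only the 77/23 ratio from the slide.

```python
import numpy as np

y_true = np.array([1] * 23 + [0] * 77)   # 23% exploited, 77% not, as on the slide
y_pred = np.zeros(100, dtype=int)        # degenerate model: never predicts "exploited"

accuracy = float(np.mean(y_pred == y_true))          # 0.77 -- looks respectable
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
predicted_pos = int(np.sum(y_pred == 1))
precision = tp / predicted_pos if predicted_pos else 0.0  # 0.0 -- finds nothing
```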
22. Model 1: Baseline
-CVSS Base
-CVSS Temporal
-Remote Code Execution
-Availability
-Integrity
-Confidentiality
-Authentication
-Access Complexity
-Access Vector
-Publication Date
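A hypothetical encoding of a few of these baseline features: the CVSS scores stay numeric, categorical CVSS components are one-hot encoded, and the publication date becomes an age in days. Field names, the reference date, and the subset of features shown are assumptions for illustration; the remaining CVSS components would be encoded the same way.

```python
from datetime import date

# CVSS v2 access-vector categories, one-hot encoded
ACCESS_VECTOR = ["LOCAL", "ADJACENT_NETWORK", "NETWORK"]

def encode_baseline(cve, today=date(2016, 11, 1)):
    """Turn a CVE record (a dict with assumed field names) into a feature vector."""
    one_hot = [1.0 if cve["access_vector"] == v else 0.0 for v in ACCESS_VECTOR]
    return [
        cve["cvss_base"],
        cve["cvss_temporal"],
        1.0 if cve["remote_code_execution"] else 0.0,
        *one_hot,
        (today - cve["published"]).days,   # publication date -> age in days
    ]
```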
33. 1. Gather data about known successful attack paths
2. Issue forecasts where data is lacking in order to predict new exploits
3. Gather MORE data about known successful attack paths
Solution
34. These will have exploits in 2017
CVE-2016-0959
Released: 6/28/2017
No Exploit in ExploitDB, Metasploit, or Canvas
Flash buffer overflow; the model predicted that an exploit will be released