SEIZE THE DATA. 2015
Predicting Cyber Security Industry
HP Security Briefing 21
John Park / August 11th, 2015
It all starts with a simple question.
“What is happening with
computer security?”
First solution:
“READ EVERYTHING.”
HPBigData2015Predicting Cyber Security Industry-JohnPark
After 5000 hours of reading…
Lesson #1: There is more info than one
can read in a lifetime.
Lesson #2: The more one reads, the
more it sounds the same.
Second solution:
“USE MACHINES.”
Natural Language Processing (NLP)
Word-smithing
Mass Media 101: Journalism
The more important it is,
the more it’s talked about.
Mass Media 102: Advertising
The more it’s talked about,
the more important it becomes.
Mid-way: questions?
Game Plan:
Count the number of words
and sequences of words (n-grams).
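
The game plan above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the briefing: the helper names `tokenize` and `ngram_counts` and the sample sentence are assumptions made up for this example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=1 gives plain word counts)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical sample text, for illustration only.
sample = "Exploit kits deliver malware. Malware targets the operating system."
tokens = tokenize(sample)

word_counts = ngram_counts(tokens, 1)   # 1-grams: the ingredients
pair_counts = ngram_counts(tokens, 2)   # 2-grams: intersections

print(word_counts.most_common(3))
print(pair_counts.most_common(3))
```

Note that this sketch ignores sentence boundaries, so a 2-gram can span the end of one sentence and the start of the next. Splitting on sentences first would avoid that.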
Data tip #1:
Start with a high-quality dataset.
HPSR Cyber Risk Report 2015
NLP tip #1: Word (1-gram) frequency
shows the ingredients.
NLP tip #2: Word pairs (2-grams) are
intersections. More niche.
NLP tip #3: 3+ grams are super-niche.
May be too unique.
NLP tip #4: Stemming
lowercase + substr(word,0,7)
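
The stemming shortcut above translates directly to Python. The function name `crude_stem` is an assumption for this sketch; the rule itself (lowercase, keep the first 7 characters) is the one on the slide.

```python
def crude_stem(word, length=7):
    """Lowercase the word and keep only its first `length` characters."""
    return word.lower()[:length]

# Longer variants collapse onto the same stem...
print(crude_stem("Exploits"))       # exploit
print(crude_stem("exploitation"))   # exploit

# ...but words of 7 letters or fewer keep their endings.
print(crude_stem("Attacks"))        # attacks
print(crude_stem("attack"))         # attack
```

The trade-off is speed over accuracy: truncation needs no dictionary, but short plurals such as "attacks" and "attack" stay separate, which is one reason the final human check matters.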
NLP tip #5: Stop-words (is, the, ...)
Normalize against general text.
(go over the top 300 manually)
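
One way to read "normalize against general text" is to score each word by how much more often it appears in the security corpus than in everyday text. The sketch below assumes that reading. The function names, the `floor` smoothing value, and the toy corpora are all made up for illustration.

```python
from collections import Counter

def relative_freq(tokens):
    """Map each word to its share of all tokens in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def distinctive_words(domain_tokens, general_tokens, top=300, floor=1e-6):
    """Rank domain words by domain frequency divided by general frequency.

    `floor` (an assumed value) stands in for words missing from the
    general corpus so the division never hits zero.
    """
    domain = relative_freq(domain_tokens)
    general = relative_freq(general_tokens)
    scores = {w: f / max(general.get(w, 0.0), floor) for w, f in domain.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Toy corpora, for illustration only.
domain = "the malware is the exploit".split()
general = "the cat is on the mat".split()

ranked = distinctive_words(domain, general, top=2)
print(ranked)
```

Stop-words such as "the" and "is" occur at similar rates in both corpora, so they score near 1 and drop out of the top of the list. The top 300 would then be reviewed by hand, as the slide suggests.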
NLP tip #6:
For timeline analysis,
balance the “before” and the “after”.
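
One possible way to balance two periods of unequal size is to compare rates instead of raw counts. The sketch below assumes that interpretation. The function name `per_10k` and the token lists are invented for illustration and do not come from the briefing's data.

```python
from collections import Counter

def per_10k(tokens):
    """Convert raw word counts into occurrences per 10,000 tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: 10_000 * c / total for w, c in counts.items()}

# Invented corpora: the "after" period has three times as much text.
before = ["malware"] * 30 + ["filler"] * 970     # 1,000 tokens
after = ["malware"] * 45 + ["filler"] * 2955     # 3,000 tokens

rate_before = per_10k(before)["malware"]
rate_after = per_10k(after)["malware"]

# Raw counts rise from 30 to 45, yet the rate per 10,000 tokens halves.
print(rate_before, rate_after)
```

Without this step, a larger "after" corpus makes almost every word look like it is trending upward.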
NLP tip #7:
It’s always best to have final
human verification.
Use common sense.
Find similar exploiters.
Without further ado,
let’s see some results.
Result #1: Top 5 words (2013+2014)
1. Malware
2. Security
3. Attack
4. System
5. Exploits
Power Law/Long-Tail
Result #2: Top 5 n-grams (2013+2014)
1. Operating System
2. Targeted Attack
3. Exploit Kits
4. United States
5. Social Engineering
Result #3: Security conferences
Black Hat
1. attack
2. security
3. presentation
4. system
5. talk
Def Con
1. security
2. talk
3. attack
4. network
5. hackers
Virus Bulletin
1. malware
2. system
3. security
4. app
5. detection
Result #4: National mentions
1. United States
2. Russia
3. China
4. Germany
5. Brazil
One more thing…
What we really want:
Predictions
Prediction #1: Word frequency 2015
(extrapolated from 2013 + 2014)
1. Security (2)
2. Attacks (3)
3. Malware (1)
4. System (4)
5. Data (6)
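
The slides do not say how the 2015 figures were extrapolated. A straight-line projection from the two known years is one simple possibility, sketched below. The function name `extrapolate` and all frequency values are made up for illustration and are not the briefing's numbers.

```python
def extrapolate(freq_2013, freq_2014):
    """Project each word's 2015 frequency by continuing its 2013-to-2014 change."""
    words = set(freq_2013) | set(freq_2014)
    predicted = {}
    for w in words:
        a = freq_2013.get(w, 0.0)
        b = freq_2014.get(w, 0.0)
        # Linear step forward, clamped so a frequency never goes negative.
        predicted[w] = max(0.0, b + (b - a))
    return predicted

# Made-up frequencies per 10,000 words, for illustration only.
freq_2013 = {"malware": 120.0, "security": 90.0, "attack": 80.0}
freq_2014 = {"malware": 110.0, "security": 105.0, "attack": 85.0}

predicted = extrapolate(freq_2013, freq_2014)
ranking = sorted(predicted, key=predicted.get, reverse=True)
print(ranking)
```

In this toy data "malware" leads in both known years, but its decline and the rise of "security" swap their ranks in the projection, the same kind of reordering the prediction slide shows. Two data points support only a straight line, so such a forecast should be treated as a rough guide.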
Prediction #2: n-gram frequency 2015
(extrapolated from 2013 + 2014)
1. Operating System (1)
2. Malware Family (9)
3. Exploit Kits (3)
4. Targeted Attacks (2)
5. Cyber Security (19)
And another thing…
What we really want:
Competitive Analysis
What are other companies in the industry
working on?
Microsoft
1. computers
2. detected
3. exploits
4. Microsoft
5. malware
FireEye
1. attack
2. malware
3. targeted
4. FireEye
5. verticals
Verizon
1. data
2. incidents
3. breaches
4. attacks
5. organizations
More data on
HP Security Briefing 21:
Security is about defending the System
against Attacks that start with Exploits
and are controlled by Malware.
