HP Security Briefing 21
John Park / August 11th, 2015
It all starts with a simple question.
What is happening with
computer security?
First solution:
After 5000 hours of reading
Lesson #1: There is more info than one
can read in a lifetime.
Lesson #2: The more one reads, the
more it sounds the same.
Second solution:
Natural Language Processing (NLP)
Mass Media 101: Journalism
The more important it is,
the more it's talked about.
Mass Media 102: Advertising
The more it's talked about,
the more important it becomes.
Mid-way: questions?
Game Plan:
Count the number of words
and sequences of words (n-grams).
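The game plan above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the briefing: the names tokenize and ngram_counts and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=1 gives plain word counts)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical sample text, invented for illustration.
sample = "Exploit kits deliver malware. Malware targets the operating system."
tokens = tokenize(sample)
words = ngram_counts(tokens, 1)   # 1-grams
pairs = ngram_counts(tokens, 2)   # 2-grams

print(words.most_common(3))
print(pairs.most_common(3))
```

Note that this sketch ignores sentence boundaries, so it also counts the pair "malware malware" that spans the full stop. Splitting into sentences first would avoid that.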
Data tip #1:
Start with a high-quality dataset.
HPSR Cyber Risk Report 2015
NLP tip #1: Word (1-gram) frequency
shows the ingredients.
NLP tip #2: Word pairs (2-grams) are
intersections. More niche.
NLP tip #3: 3+ grams are super-niche.
May be too unique.
NLP tip #4: Stemming
lowercase + substr(word,0,7)
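The truncation rule above translates directly to Python string slicing. A minimal sketch, assuming substr(word,0,7) means "the first seven characters"; the name crude_stem and the word list are made up for the example.

```python
from collections import Counter

def crude_stem(word, length=7):
    """Lowercase the word and keep only its first `length` characters."""
    return word.lower()[:length]

# Hypothetical word list, invented for illustration.
words = ["Exploit", "exploits", "Exploited", "Attack", "attacks", "Security"]
stems = Counter(crude_stem(w) for w in words)

print(stems)
```

Truncation merges "Exploit", "exploits" and "Exploited" into one stem, but it leaves "attack" and "attacks" apart because both are seven characters or shorter. A real stemmer would handle that case, at the cost of more machinery.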
NLP tip #5: Stop-words (is, the, ...)
Normalize against general text.
(go over the top 300 manually)
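One way to read "normalize against general text" is to compare how often a word appears in the security corpus with how often it appears in ordinary text, and keep the words that are over-represented. The sketch below assumes that reading; the function names and all counts are hypothetical.

```python
from collections import Counter

def relative_freq(counts):
    """Turn raw counts into shares of the corpus total."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def distinctive_words(domain_counts, general_counts, top=300):
    """Rank domain words by how over-represented they are versus general text."""
    domain = relative_freq(domain_counts)
    general_total = sum(general_counts.values())
    scores = {}
    for word, freq in domain.items():
        # Add 1 to the general count so words missing from the
        # general text do not cause a division by zero.
        general_freq = (general_counts.get(word, 0) + 1) / general_total
        scores[word] = freq / general_freq
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical counts, invented for illustration.
domain_counts = Counter({"the": 50, "is": 30, "malware": 12, "exploit": 8})
general_counts = Counter({"the": 600, "is": 350, "weather": 40, "malware": 1})

ranked = distinctive_words(domain_counts, general_counts)
print(ranked)
```

Stop-words such as "the" and "is" sink to the bottom because they are just as common in general text. The top of the list is what a human would then review by hand.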
NLP tip #6:
For timeline analysis,
balance the before and the after.
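The slide does not say how to balance the two periods. One plausible approach, assumed here, is to compare rates per 10,000 tokens instead of raw counts, so that a period with more text does not look busier just because it is bigger. All names and numbers below are hypothetical.

```python
from collections import Counter

def per_10k(counts):
    """Convert raw counts to occurrences per 10,000 tokens."""
    total = sum(counts.values())
    return {w: 10_000 * c / total for w, c in counts.items()}

# Hypothetical counts: the "after" corpus is three times larger.
before = Counter({"malware": 120, "attack": 80, "other": 9_800})
after = Counter({"malware": 300, "attack": 300, "other": 29_400})

rate_before = per_10k(before)
rate_after = per_10k(after)

# Positive means the word became relatively more common.
change = {w: rate_after[w] - rate_before[w] for w in ("malware", "attack")}
print(change)
```

In this made-up data the raw count of "malware" rises from 120 to 300, yet its rate falls, which is the kind of distortion an unbalanced comparison hides.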
NLP tip #7:
It's always best to have final
human verification.
Use common sense.
Find similar exploiters.
Without further ado,
let's see some results.
Result #1: Top 5 words (2013+2014)
1. Malware
2. Security
3. Attack
4. System
5. Exploits
Power Law/Long-Tail
Result #2: Top 5 n-grams (2013+2014)
1. Operating System
2. Targeted Attack
3. Exploit Kits
4. United States
5. Social Engineering
Result #3: Security conferences
Black Hat
1. attack
2. security
3. presentation
4. system
5. talk
Def Con
1. security
2. talk
3. attack
4. network
5. hackers
Virus Bulletin
1. malware
2. system
3. security
4. app
5. detection
Result #4: National mentions
1. United States
2. Russia
3. China
4. Germany
5. Brazil
One more thing
What we really want:
Prediction #1: Word frequency 2015
(extrapolated from 2013 + 2014; 2013+2014 rank in parentheses)
1. Security (2)
2. Attacks (3)
3. Malware (1)
4. System (4)
5. Data (6)
Prediction #2: n-gram frequency 2015
(extrapolated from 2013 + 2014; 2013+2014 rank in parentheses)
1. Operating System (1)
2. Malware Family (9)
3. Exploit Kits (3)
4. Targeted Attacks (2)
5. Cyber Security (19)
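The slides do not state how the 2015 figures were extrapolated. A minimal sketch, assuming the simplest option of a straight line through the 2013 and 2014 counts; the function name and every count below are hypothetical and are not the briefing's data.

```python
def extrapolate(count_2013, count_2014):
    """Straight-line projection: carry the 2013-to-2014 change forward one year."""
    return max(0, count_2014 + (count_2014 - count_2013))

# Hypothetical (2013, 2014) counts, invented for illustration only.
counts = {
    "malware": (900, 950),
    "security": (700, 900),
    "attack": (650, 800),
}

predicted = {w: extrapolate(a, b) for w, (a, b) in counts.items()}
ranking = sorted(predicted, key=predicted.get, reverse=True)

print(predicted)
print(ranking)
```

With only two data points, a straight line is the only trend that can be fitted, and a fast-growing term can overtake a larger but flatter one, as "security" overtakes "malware" in this made-up data.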
And another thing
What we really want:
Competitive Analysis
What are other companies in the industry
working on?
1. computers
2. detected
3. exploits
4. Microsoft
5. malware
1. attack
2. malware
3. targeted
4. FireEye
5. verticals
1. data
2. incidents
3. breaches
4. attacks
5. organizations
More data on
HP Security Briefing 21:
Security is about defending the System
against Attacks that start with Exploits
and are controlled by Malware.

HP Big Data 2015: Predicting Cyber Security Industry - John Park