VxClass is a malware classification tool that analyzes binary code samples through structural comparison rather than byte-by-byte matching. It automatically unpacks samples using emulation, extracts structural information, compares samples to known malware families, and stores results in a database for correlation and visualization. This allows unknown samples to be categorized even if they were obtained or obfuscated differently, aiding incident response and malware analysis.
2. Introduction
Binary code is often left behind by attackers
Running processes
Dropped executables
Kernel memory snapshots
Network traffic
Crash dumps
3. Introduction
Useful evidence but difficult to analyze
Current methods:
Use AV scanner
Run executable to provoke/observe behavior
Remove packer/obfuscator code
Manual analysis using IDA Pro
4. Current Methods
Error-prone and time-consuming
AV signatures are brittle, out of date
Behavior can be difficult to provoke
Removal of protection code is difficult
Manual analysis
Does not scale
No easy correlation of results
5. VxClass
Structural malware classification tool
Categorizes malware samples into families
Groups malware that shares code
Allows correlation between samples
Regardless of how they were obtained
6. VxClass
Upload of samples through a web server
Generic unpacking through emulation
Extraction of structural information
Comparison with known samples
Storage of the results in a SQL database
Visualization of the results in the browser
7. Uploading
Upload samples
Through a web interface in your browser
Through XML-RPC
User-based access control to samples:
Public: All users can see and download
Limited: All users can see, but not download
Private: Only original uploader can see and download
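The three access levels above can be pictured as two permission checks. This is a minimal sketch, not VxClass code: the `Sample` fields, the function names, and the rule that an uploader always keeps full access to their own samples are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"    # all users can see and download
    LIMITED = "limited"  # all users can see, but not download
    PRIVATE = "private"  # only the original uploader can see and download


@dataclass(frozen=True)
class Sample:
    sample_id: str       # hypothetical identifier field
    uploader: str
    visibility: Visibility


def can_see(user: str, sample: Sample) -> bool:
    # Assumption: the uploader can always see their own sample.
    if user == sample.uploader:
        return True
    return sample.visibility in (Visibility.PUBLIC, Visibility.LIMITED)


def can_download(user: str, sample: Sample) -> bool:
    # Assumption: the uploader can always download their own sample.
    if user == sample.uploader:
        return True
    return sample.visibility is Visibility.PUBLIC
```

With a Limited sample uploaded by `alice`, `can_see("bob", s)` is true while `can_download("bob", s)` is false.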
9. Unpacking
Generic unpacking is difficult
Anti-debugging tricks
Attempts to foil emulators
Creation of and interaction between multiple processes
Code obfuscation
10. Unpacking
Our approach: Full system emulation
Emulated Windows XP SP2 in Bochs
Run the executable until it looks unpacked
Acquire memory of all processes and dirty kernel pages
Use code in acquired memory for classification
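The slides do not say how dirty pages are identified. One simple way to picture the idea is to diff two memory snapshots page by page and keep the pages that changed. The sketch below does exactly that. The snapshot-diff approach and the function names are assumptions; an emulator could just as well track writes directly.

```python
import hashlib

PAGE_SIZE = 4096  # standard x86 page size


def page_hashes(memory: bytes) -> list[str]:
    """Hash each 4 KiB page of a memory image."""
    return [
        hashlib.sha256(memory[off:off + PAGE_SIZE]).hexdigest()
        for off in range(0, len(memory), PAGE_SIZE)
    ]


def dirty_pages(before: bytes, after: bytes) -> list[int]:
    """Return indices of pages whose contents differ between two snapshots."""
    old, new = page_hashes(before), page_hashes(after)
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    # Pages that exist only in the later snapshot count as dirty too.
    changed.extend(range(len(old), len(new)))
    return changed
```

Only the pages returned by `dirty_pages` would then need to be handed to the classification stage.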
11. Unpacking
Solved problems
Anti-debugging tricks
Legacy API calls
Multiple processes
Interprocess communication
Kernel memory analysis
Result: Most packers can be unpacked automatically
12. Comparison
Problem: Meaningful comparison of binary code
Byte-by-byte comparison is useless
Our approach: Structural comparison
Award-winning (German IT-Security Award 2006)
Uses industry-standard BinDiff engine
Uses patent-pending MD-Index (more later)
13. Structural Comparison
Extract call graph and flow graph information from samples
Compare the structure of these graphs instead of byte sequences
Compares code derived from same source
Regardless of compiler settings
Regardless of compiler
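To illustrate why structure survives where bytes do not, the sketch below reduces a function's flow graph to a coarse signature: number of basic blocks, number of edges, and number of calls. This triple is an assumption chosen for illustration and is not a description of the BinDiff engine. The graph encoding (a dict of successor lists) and the caller-supplied `call_count` are likewise hypothetical.

```python
def signature(flow_graph: dict[str, list[str]], call_count: int) -> tuple[int, int, int]:
    """Return (basic blocks, edges, calls) for one function's flow graph."""
    nodes = set(flow_graph)
    for successors in flow_graph.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in flow_graph.values())
    return (len(nodes), edges, call_count)


# The same if/else function from two builds: addresses and labels differ,
# but the shape of the graph does not.
build_a = {
    "0x401000": ["0x401010", "0x401020"],
    "0x401010": ["0x401030"],
    "0x401020": ["0x401030"],
    "0x401030": [],
}
build_b = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
```

Both builds yield the signature `(4, 4, 2)` when each makes two calls, although a byte-by-byte comparison of the two would find little in common.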
17. MD-Index
Patent-pending
Clever hash function for directed graphs
Assigns 80-bit value to a directed graph
Allows keeping a database of flow graphs
Allows efficient queries into the database
Is used within VxClass for several purposes:
Very fast approximate comparison
Code search
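The MD-Index itself is not specified in these slides, so the following is not that algorithm. It is a minimal sketch of the general idea of a label-independent graph hash: each edge is described only by the in- and out-degrees of its endpoints, the sorted descriptions are hashed, and the result is truncated to 80 bits. The degree-based description and the use of SHA-256 are assumptions.

```python
import hashlib
from collections import Counter


def graph_hash_80(edges: list[tuple[str, str]]) -> int:
    """Label-independent 80-bit fingerprint of a directed graph (illustrative)."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    # Describe each edge only by the degrees of its endpoints, so renaming
    # nodes or relocating code does not change the description.
    shapes = sorted(
        (in_deg[s], out_deg[s], in_deg[d], out_deg[d]) for s, d in edges
    )
    digest = hashlib.sha256(repr(shapes).encode()).digest()
    return int.from_bytes(digest[:10], "big")  # keep 10 bytes = 80 bits
```

Two graphs that differ only in node names receive the same value, which is what makes a fixed-size key usable for database lookups. In this simplified version, structurally different graphs with identical degree patterns would collide.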
18. Results
Memory dumps and recovered strings
IDA files (IDB) of the resulting disassemblies
Pairwise similarity scores
Visualization:
Family trees
Top-10-most-similar list
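A top-10 list can be derived directly from pairwise scores. The sketch below assumes the scores are stored as a dict keyed by sample pairs, with each pair stored once and higher values meaning more similar. That storage layout and the function name are assumptions, not the VxClass schema.

```python
import heapq


def most_similar(
    sample: str,
    scores: dict[tuple[str, str], float],
    n: int = 10,
) -> list[tuple[str, float]]:
    """Return up to n (other_sample, score) pairs, best match first."""
    neighbours = []
    for (a, b), score in scores.items():
        # A pair is stored once, so the sample may sit on either side.
        if a == sample:
            neighbours.append((b, score))
        elif b == sample:
            neighbours.append((a, score))
    return heapq.nlargest(n, neighbours, key=lambda pair: pair[1])
```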
23. Case Studies
Noise reduction
Automatically filter uninteresting samples
Knowledge management
Share information between analysts
Attacker Correlation
Is a set of attacks performed with one toolset?
Code searching
Find certain functions in known samples
24. Noise Reduction
Upload new files to the system
How similar are they to interesting samples?
Comparison to database of known samples
Prioritize accordingly
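The triage step above could be sketched as ranking each new sample by its best similarity to any sample already marked interesting. The input layout (per-sample dicts of scores against known samples) and the names used are assumptions made for illustration.

```python
def prioritize(
    new_scores: dict[str, dict[str, float]],
    interesting: set[str],
) -> list[tuple[str, float]]:
    """Rank new samples by their best similarity to any interesting sample."""
    ranked = []
    for sample, scores in new_scores.items():
        best = max(
            (score for known, score in scores.items() if known in interesting),
            default=0.0,  # no overlap with any interesting sample
        )
        ranked.append((sample, best))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Samples that only resemble uninteresting families end up at the bottom of the list and can be looked at last, or not at all.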
25. Knowledge management
Each analyst uploads the samples they know to VxClass
New malware comes in, gets uploaded
VxClass determines which known samples this is similar to
The expert for similar samples can be found
26. Attacker Correlation
A series of incidents is investigated
On a large number of machines, code is found
Classify the code using VxClass to find out:
Is this one group of attackers?
Is this similar to attacks seen in the past?
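One way to approach the "one group of attackers" question is to cluster the recovered samples: any two samples whose similarity exceeds a threshold are placed in the same group, transitively. The sketch below uses a small union-find for this. The threshold of 0.5 and the score layout are assumptions, and the slides do not say how VxClass builds its families.

```python
def cluster(
    samples: list[str],
    scores: dict[tuple[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off
) -> list[set[str]]:
    """Group samples linked by a chain of scores at or above the threshold.

    Every sample named in `scores` must also appear in `samples`.
    """
    parent = {s: s for s in samples}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for s in samples:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())
```

If code from many machines collapses into a single group, that is a hint (not proof) that one toolset was used across the incidents.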
27. Code Searching
A particularly strange piece of code (just one function) is identified
Perhaps a strange encryption function
Does this particular piece of code appear in other samples in the database?
Search is not byte-based, but flow-graph-based (MD-Index)
The answer is one click away
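A search keyed on a graph fingerprint amounts to an inverted index from fingerprint to the functions that carry it. The sketch below shows that data structure only. The class, its methods, and the treatment of the fingerprint as an opaque integer are assumptions, not the VxClass database design.

```python
from collections import defaultdict


class FunctionIndex:
    """Maps a flow-graph fingerprint to every (sample, function) that has it."""

    def __init__(self) -> None:
        self._by_fingerprint: dict[int, set[tuple[str, str]]] = defaultdict(set)

    def add(self, sample: str, function: str, fingerprint: int) -> None:
        self._by_fingerprint[fingerprint].add((sample, function))

    def search(self, fingerprint: int) -> set[tuple[str, str]]:
        # .get avoids creating empty entries for fingerprints never indexed.
        return set(self._by_fingerprint.get(fingerprint, ()))
```

Because the lookup is a single key access, its cost does not grow with the number of bytes stored, only with the number of matches returned.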
28. Performance
One VxClass machine
800-1600 samples per day
Performance depends on
Obfuscation complexity
Size of the malware
Size of the database
Can be fully parallelized
The only bottleneck is the central database
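As a rough sizing aid, the per-machine figure above can be turned into a machine count. The sketch assumes throughput scales linearly with the number of machines and ignores the central database bottleneck, so it gives a lower bound rather than a guarantee.

```python
import math


def machines_needed(samples_per_day: int, per_machine_per_day: int = 800) -> int:
    """Machines required, assuming throughput scales linearly with machine count."""
    return math.ceil(samples_per_day / per_machine_per_day)
```

For 10,000 samples per day this gives 13 machines at the pessimistic rate of 800 per machine and 7 at the optimistic rate of 1600.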
29. Behavioral Analysis
VxClass is not a behavioral-analysis tool
VxClass is complementary to such tools
We recommend combining VxClass with behavior-monitoring tools such as
CWSandbox (http://www.cwsandbox.org)
Anubis (Free) (http://anubis.iseclabs.org)
30. VxClass Options
VxClass on a single machine
Run it inside your organisation
VxClass distributed
Scale it to your needs
VxClass as service
We host a machine for you
VxClass as shared service
We host a machine for you
Multiple clients use a shared database
31. Existing Customers
The German BSI
Agency for security in information systems
Vodafone Germany
Pre-filters Symbian/ARM executables
Other government entities and private companies
Mostly used for attacker correlation and noise filtering
32. Limitations
Heavy obfuscation of control flow
Virtualizing packers
Unpacking only works on 32-bit Windows
No Linux / OSX / Mobile unpacking
64-bit support is in the works
Upload of IDBs allows heavy manual intervention beforehand
33. FAQ
What OS does it run on?
It runs on a 64-bit Debian Lenny install
Does it have any network dependencies?
No
How can we extend the system?
All generated data is accessible through XML-RPC
If needed, direct access to the SQL can be used
The SQL schema is available on request
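Since the XML-RPC method names are not listed here, the sketch below only shows how a call would be serialized with Python's standard library, without contacting any server. The method name `vxclass.get_similar` and its parameters are hypothetical placeholders.

```python
import xmlrpc.client


def build_request(method: str, *params) -> str:
    """Serialize an XML-RPC call to its XML request body without sending it."""
    return xmlrpc.client.dumps(params, methodname=method)


# Hypothetical method name and arguments, for illustration only.
body = build_request("vxclass.get_similar", "abc123", 10)
```

To actually send such a call, `xmlrpc.client.ServerProxy` would be pointed at the real endpoint of the installation, using the method names it really exposes.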