
ADAPTIVE HONEYPOT ENGAGEMENT
LINAN HUANG AND QUANYAN ZHU
NEW YORK UNIVERSITY
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
IOC TO THREAT INTELLIGENCE
• Reactive defense uses Indicators of Compromise.
• Proactive defense relies on threat intelligence.
Effectiveness: defenders acquire more threat information.
Stability: attackers must work harder to adapt to the defense mechanism.
Difficulty: threat intelligence is hard to obtain via traditional defense techniques.
[Figure: Indicator of Compromise versus Threat Intelligence. An Indicator of Compromise is evidence left during or after the attack; threat intelligence answers how the attack is launched, who the attackers are, and what they want. Pyramid levels: Organization/Personnel, Events/Goal, TTPs, Tools, Network/Host Artifacts, Domain Names, IP Address, Hash Values.]
INTELLIGENCE VIA HONEYPOTS
• Use a honeynet to emulate a production system.
• Interact with attackers rather than ejecting them directly.
• Quickly attract attackers to target honeypots and engage them for a desired time.
• Grant attackers a proper degree of freedom to avoid the escape risk and the identification risk.
[Figure: network diagram of a honeynet alongside the production systems it emulates. Labeled components: access points, Internet/cloud, firewall, switches, intrusion detection, honeywall, gateway, router, servers, databases, sensor, actuator, computer network, honeypot network, honeypots (192.168.1.10, 192.168.1.45), work station (192.168.1.55), database (192.168.1.90).]
OPTIMAL ENGAGEMENT STRATEGY
• There is an urgent need for cost-effective, time-efficient, and risk-averse engagement strategies that adapt to unknown or evolving attack models.
[Figure: honeynet state diagram with nodes numbered 1 to 13. Labels: clients, server, switch, computer network, emulated sensors, emulated database, normal zone, absorbing state.]
• States represent the attacker's location: a honeypot node, the normal zone, or the absorbing state.
• Actions engage, attract, or eject attackers.
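The state and action sets above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the numbering (honeypot nodes 1 to 11, normal zone 12, absorbing state 13) follows the figure labels, while the rule for which actions are admissible in which state is an assumption made for the example.

```python
from enum import Enum


class Action(Enum):
    ENGAGE = "engage"
    ATTRACT = "attract"
    EJECT = "eject"


# State numbering taken from the poster's figure labels.
HONEYPOT_NODES = range(1, 12)  # states 1..11
NORMAL_ZONE = 12
ABSORBING = 13


def admissible_actions(state):
    """Return the actions available in `state`.

    Assumption (not stated on the poster): the defender engages or ejects
    inside the honeynet, attracts from the normal zone, and has no action
    in the absorbing state.
    """
    if state in HONEYPOT_NODES:
        return [Action.ENGAGE, Action.EJECT]
    if state == NORMAL_ZONE:
        return [Action.ATTRACT]
    if state == ABSORBING:
        return []
    raise ValueError(f"unknown state: {state}")
```

Keeping the admissible set as a function of the state mirrors the notation A(s) used in the optimality equation.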
ATTACKER'S FOOTPRINTS
[Figure: the attacker's footprints, plotted as state (1 to 13) against time (×10^4).]
• Treat the transition kernel and the sojourn distribution as threat intelligence.
• Characterize the escape risk and the identification risk.
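One way to read "transition kernel and sojourn distribution as threat intelligence" is as quantities estimated from a recorded footprint trace. The sketch below is an illustration under assumptions: the trace format (a time-ordered list of `(entry_time, state)` pairs) and the function name `estimate_from_footprints` are hypothetical, and the estimate pools all observations regardless of the defender's action, so it only makes sense under a fixed engagement policy.

```python
from collections import defaultdict


def estimate_from_footprints(trace):
    """Empirical transition kernel and mean sojourn time per state.

    `trace` is a time-ordered list of (entry_time, state) pairs
    (hypothetical format chosen for this example).
    """
    counts = defaultdict(lambda: defaultdict(int))
    sojourn_sum = defaultdict(float)
    visits = defaultdict(int)

    # Each consecutive pair of records is one observed transition.
    for (t0, s0), (t1, s1) in zip(trace, trace[1:]):
        counts[s0][s1] += 1
        sojourn_sum[s0] += t1 - t0
        visits[s0] += 1

    kernel = {
        s: {s_next: c / visits[s] for s_next, c in row.items()}
        for s, row in counts.items()
    }
    mean_sojourn = {s: sojourn_sum[s] / visits[s] for s in visits}
    return kernel, mean_sojourn


# Toy trace: normal zone (12) -> switch (1) -> server (2) -> switch -> normal zone.
trace = [(0.0, 12), (2.0, 1), (3.0, 2), (5.0, 1), (6.0, 12)]
kernel, mean_sojourn = estimate_from_footprints(trace)
```

In this toy trace the switch is left twice, once toward the server and once toward the normal zone, so its estimated escape probability is one half.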
OPTIMAL LONG-TERM POLICY
• The long-term engagement reward $u(s^0, \pi)$ is
$$u(s^0, \pi) = E\left[\sum_{k=0}^{\infty} \int_{T^k}^{T^{k+1}} e^{-\gamma(\tau + T^k)}\, r(S^k, A^k, S^{k+1}, T^k, T^{k+1}, \tau)\, d\tau\right].$$
• The dynamic programming representation shows the contraction-mapping property and results in a unique optimal policy:
$$v(s^0) = \sup_{a^0 \in A(s^0)} E\left[\int_{T^0}^{T^1} e^{-\gamma(\tau + T^0)}\, r(s^0, a^0, S^1, T^0, T^1, \tau)\, d\tau + e^{-\gamma T^1} v(S^1)\right].$$
• A regulation condition rules out infinitely many transitions within a finite time.
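The contraction property suggests solving the optimality equation by value iteration when the model is known. The following is a minimal sketch under simplifying assumptions that are not from the poster: sojourn times are exponential with a rate that depends on the state-action pair, the reward splits into an expected lump sum `r1` at the start of a stage plus a constant rate `r2` during the stage, and the toy model and its numbers are invented for illustration. With an exponential sojourn of rate `rate`, the expected discount is rate / (rate + gamma) and the expected discounted rate reward is r2 / (rate + gamma).

```python
def q_value(spec, v, gamma):
    """One-stage lookahead for a single (state, action) pair."""
    rate, r1, r2, next_probs = spec
    discount = rate / (rate + gamma)  # E[exp(-gamma * tau)], tau ~ Exp(rate)
    continuation = sum(p * v[s_next] for s_next, p in next_probs.items())
    return r1 + r2 / (rate + gamma) + discount * continuation


def smdp_value_iteration(model, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator until successive values differ by < tol."""
    v = {s: 0.0 for s in model}
    for _ in range(max_iter):
        new_v = {
            s: max((q_value(spec, v, gamma) for spec in actions.values()),
                   default=0.0)  # states without actions are absorbing
            for s, actions in model.items()
        }
        gap = max(abs(new_v[s] - v[s]) for s in model)
        v = new_v
        if gap < tol:
            break
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], v, gamma))
        for s, actions in model.items() if actions
    }
    return v, policy


# Hypothetical toy model: (rate, r1, r2, next-state probabilities).
model = {
    "honeypot": {
        "engage": (1.0, 0.0, 2.0, {"normal": 0.5, "absorbing": 0.5}),
        "eject": (10.0, 0.5, 0.0, {"absorbing": 1.0}),
    },
    "normal": {"attract": (2.0, 0.0, -1.0, {"honeypot": 1.0})},
    "absorbing": {},
}
v, policy = smdp_value_iteration(model, gamma=1.0)
```

Because every expected discount is strictly below one, the iteration converges geometrically, which is the practical consequence of the contraction-mapping property.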
[Figure: sample path of the state (1, 2, 3, ..., N+1, N+2) over time, with transitions at T^1, T^2, T^3, and T^4.]
REINFORCEMENT LEARNING: FIND POLICIES THAT ADAPT TO UNKNOWN OR EVOLVING MODELS
• The exact attack model is unknown or evolving.
  – Sample the investigation reward.
  – Sample the attacker's transition probability.
  – Sample the sojourn distribution.
• Defenders learn the engagement policy from actual honeypot interactions by updating
$$Q^{k+1}(s^k, a^k) = \left(1 - \alpha^k(s^k, a^k)\right) Q^k(s^k, a^k) + \alpha^k(s^k, a^k)\left[\bar{r}_1(s^k, a^k, \bar{s}^{k+1}) + \bar{r}_2(s^k, a^k)\, \frac{1 - e^{-\gamma \bar{\tau}^k}}{\gamma} + e^{-\gamma \bar{\tau}^k} \max_{a' \in A(\bar{s}^{k+1})} Q^k(\bar{s}^{k+1}, a')\right].$$
• The learning rate $\alpha^k(s^k, a^k) = \frac{k_c}{k_{\{s^k, a^k\}} - 1 + k_c}$ guarantees asymptotic convergence.
  – $k_c \in (0, \infty)$ is a constant parameter.
  – $k_{\{s^k, a^k\}} \in \{0, 1, \ldots\}$ is the number of visits to the state-action pair $\{s^k, a^k\}$ up to stage $k$.
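The update rule and learning-rate schedule can be sketched as a small tabular learner. This is an illustrative sketch, not the authors' code: the class name `SMDPQLearner` and its method signatures are hypothetical, the continuation term is added with the same sign as in the optimality equation, and the visit counter is assumed to include the current visit so that the first update uses a learning rate of 1.

```python
import math
from collections import defaultdict


class SMDPQLearner:
    """Tabular Q-learning for a discounted semi-Markov decision process."""

    def __init__(self, gamma, k_c):
        self.gamma = gamma              # discount rate, gamma > 0
        self.k_c = k_c                  # learning-rate constant, k_c > 0
        self.q = defaultdict(float)     # Q-values, default 0
        self.visits = defaultdict(int)  # visit counts per (state, action)

    def learning_rate(self, s, a):
        # alpha = k_c / (visits - 1 + k_c); call only after counting a visit.
        return self.k_c / (self.visits[(s, a)] - 1 + self.k_c)

    def update(self, s, a, r1, r2, tau, s_next, next_actions):
        """Apply one update from a sampled transition.

        r1: sampled lump-sum reward, r2: sampled reward rate,
        tau: sampled sojourn time, s_next: sampled next state.
        """
        self.visits[(s, a)] += 1  # assumption: count includes this visit
        alpha = self.learning_rate(s, a)
        discount = math.exp(-self.gamma * tau)
        best_next = max((self.q[(s_next, b)] for b in next_actions),
                        default=0.0)  # absorbing state: no continuation
        target = r1 + r2 * (1 - discount) / self.gamma + discount * best_next
        self.q[(s, a)] = (1 - alpha) * self.q[(s, a)] + alpha * target
        return self.q[(s, a)]


learner = SMDPQLearner(gamma=1.0, k_c=1.0)
first = learner.update("honeypot", "engage", r1=0.0, r2=2.0,
                       tau=math.log(2), s_next="absorbing", next_actions=[])
second = learner.update("honeypot", "engage", r1=0.0, r2=0.0,
                        tau=math.log(2), s_next="absorbing", next_actions=[])
```

With k_c = 1 the learning rate is the reciprocal of the visit count, so each Q-value is the running average of its sampled targets; larger k_c keeps the rate higher for longer.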
[Figure: two convergence plots of value against the number of steps, one over step k from 0 to 7 (×10^4) and one over 0 to 1000. Legend: Variance, Mean, Theoretical Value.]
• Defenders need to choose a proper learning rate for faster convergence and better performance.
• More samples reduce the variance and the error of the mean.
Challenges for Learning in Honeypot Engagement:
• Non-cooperative learning environment:
  – In a classical RL task, the learner may start at any state at any time and repeatedly simulate paths from the target state.
  – The defender can eject attackers but cannot arbitrarily draw them to the target honeypot.
• Risk reduction during the learning period:
  – Defenders must account for system safety and engagement performance during real interactions.
• Asymptotic versus finite-step convergence:
  – Since an attacker can terminate the interaction on his own, the engagement time may be limited.
SECURITY METRICS TO EVALUATE ENGAGEMENT STRATEGIES
[Figure: probability against time (0 to 25), with curves for states 1 (Switch), 2 (Server), 10 (Database), and 12 (Normal Zone).]
[Figure: pie chart over states 1 (Switch), 2 (Server), 3 to 9, 10 (Database), 11 (Sensor), and 12 (Normal Zone). Shares in extracted order: 12%, 10%, 1%, 2%, 1%, 3%, 3%, 11%, 3%, 41%, 4%, 9%.]
• How attractive is the honeynet (or specific honeypot nodes) when the attacker is in the normal zone?
• How likely is an attacker in a honeypot node to visit the normal zone at a given time?
• How does that likelihood evolve?
• Attraction efficiency is the time needed to attract the attacker from the normal zone to the target honeypots.
• Absolutely safe engagement is the engagement time before the attacker's first escape.
• The random variable $T_{iD}$ is the time of the first visit to a region $D \subset S$ from the initial location $i \in S \setminus D$.
• The average value $t^m_{iD} = E[T_{iD}]$ provides a unified measure of the efficiency and safety of the engagement.
• Diffusion: more jumps result in a longer time.
• Asymmetric structure: the attraction time (from the normal zone to the honeypot) is longer than the engagement time (from the honeypot to the normal zone).
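The mean first-passage time can be computed from a transition kernel and mean sojourn times under a fixed engagement policy. The sketch below is an illustration under assumptions: it uses the standard semi-Markov relation t_i = m_i + sum over j outside D of P_ij * t_j, solves it by fixed-point iteration, and the function name, state names, and numbers are hypothetical.

```python
def mean_first_passage_times(kernel, mean_sojourn, target,
                             tol=1e-12, max_iter=100_000):
    """Expected time to first reach `target` from every state outside it.

    kernel[i][j]: probability of jumping from i to j (fixed policy).
    mean_sojourn[i]: expected time spent in i before the next jump.
    """
    states = [s for s in kernel if s not in target]
    t = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_t = {
            s: mean_sojourn[s] + sum(
                p * t[s_next]
                for s_next, p in kernel[s].items()
                if s_next not in target  # reaching the target ends the clock
            )
            for s in states
        }
        gap = max(abs(new_t[s] - t[s]) for s in states)
        t = new_t
        if gap < tol:
            break
    return t


# Hypothetical example: attraction fails half the time (self-loop in the
# normal zone), and the attacker may fall back from an outer honeypot.
kernel = {
    "normal": {"normal": 0.5, "honeypot_a": 0.5},
    "honeypot_a": {"honeypot_b": 0.8, "normal": 0.2},
}
mean_sojourn = {"normal": 2.0, "honeypot_a": 1.0}
t = mean_first_passage_times(kernel, mean_sojourn, target={"honeypot_b"})
```

Setting the target to the honeypot nodes gives an attraction-efficiency figure, while setting it to the normal zone gives the expected time before the first escape.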
[Figure: two columns of three panels each: Stationary Probability of Normal Zone, Utility of Normal Zone, and Expected Utility over Stationary Probability. The horizontal axis of the first column is a parameter value from 0 to 2.5; that of the second column is the Probability of Failed Attraction from 0 to 1.]
• A larger value of the parameter governing persistence means a less persistent attacker: it takes less time to attract the attacker away from the normal zone.
• A smaller p means a less intelligent attacker: the attraction is less likely to fail.
• Performance degrades only in extreme cases.
• Our policy is robust to wide variation in the attacker's persistence and intelligence.


Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes
