The document discusses optimization techniques for online experiments known as multi-armed bandits. It introduces the multi-armed bandit problem of balancing exploration of new options with exploitation of existing knowledge. It then describes several techniques for solving this problem, including AB testing, epsilon greedy, and upper confidence bound approaches. The document provides examples of how these techniques work and notes that multi-armed bandit methods can provide more efficient learning than AB testing alone.
1. Bandit Basics: A Different Take on Online Optimization
Conductrics twitter: @mgershoff
2. Who is this guy?
Matt Gershoff
CEO: Conductrics
Many years in Database Marketing (New York and Paris), and a bit of Web Analytics
www.conductrics.com
Twitter: @mgershoff
Email: matt@conductrics.com
4. What Are We Going to Hear?
Optimization Basics
Multi-Armed Bandit
It's a Problem, Not a Method
Some Methods
AB Testing
Epsilon Greedy
Upper Confidence Bound (UCB)
Some Results
6. OPTIMIZATION
If THIS Then THAT
ITTT brings together:
1. Decision Rules
2. Predictive Analytics
3. Choice Optimization
7. OPTIMIZATION
Find and Apply the Rule with the most Value
(Grid of candidate "If THIS Then THAT" rules.)
8. OPTIMIZATION
THIS = variables whose values are given to you (inputs): Facebook, High Spend, Urban GEO, ..., Home Page, App Use
THAT = variables whose values you control (outputs): Offer A, Offer B, Offer C, ..., Offer Y, Offer Z
A predictive model (features F1, F2, ..., Fm) scores each "If THIS Then THAT" pairing and estimates its value (Σ Value_i).
9. But...
1. We don't have data on THAT.
2. We need to collect a sample of THAT.
3. How do we sample efficiently across the options (Offer A? Offer B? Offer C? ... Offer Y? Offer Z?)?
11. What is a Multi-Armed Bandit?
One-Armed Bandit = Slot Machine
The problem: how do you pick between slot machines (A or B?) so that you walk out of the casino with the most $$$ at the end of the night?
12. Objective
Pick so as to get the most return/profit you can over time.
Technical term: Minimize Regret
13. Sequential Selection
...but how do you pick, A or B?
You need to sample, but do it efficiently.
14. Explore = Collect Data
Data collection is costly; it is an investment.
Be efficient: balance the potential value of collecting new data with exploiting what you currently know.
15. Multi-Armed Bandits
Bandit problems embody in essential
form a conflict evident in all human
action: choosing actions which yield
immediate reward vs. choosing actions
whose benefit will come only later.*
- Peter Whittle
*Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June, 2010.
16. Exploration vs. Exploitation
1) Explore/Learn: try out different actions to learn how they perform over time. This is a data collection task.
2) Exploit/Earn: take advantage of what you have learned to get the highest payoff. This is your current best guess.
17. Not A New Problem
1933: first work on competing options
1940: a WWII problem the Allies attempt to tackle
1953: Bellman formulates it as a Dynamic Programming problem
Source: http://www.lancs.ac.uk/~winterh/GRhist.html
18. Testing
Explore First
All actions have an equal chance of
selection (uniform random).
Use hypothesis testing to select a
Winner.
Then Exploit: keep only the Winner for selection.
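As a rough sketch of this explore-first, then-exploit pattern (an illustration only, not code from the deck; the action names, traffic counts, and conversion rates below are made up), the flow might look like this in Python:

import random

# Hypothetical actions and conversion rates, for illustration only.
actions = ["A", "B"]
true_rates = {"A": 0.10, "B": 0.12}
counts = {a: 0 for a in actions}
rewards = {a: 0.0 for a in actions}

def observe_reward(action):
    # Stand-in for watching a real user convert (1) or not (0).
    return 1.0 if random.random() < true_rates[action] else 0.0

# Explore first: every action has an equal chance of selection.
for _ in range(10000):
    a = random.choice(actions)
    counts[a] += 1
    rewards[a] += observe_reward(a)

# (A real A/B test would run a hypothesis test here before declaring a winner.)
winner = max(actions, key=lambda a: rewards[a] / counts[a])

# Then exploit: keep only the winner for selection.
for _ in range(10000):
    observe_reward(winner)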
20. P-Values: A Digression
A p-value is NOT the probability that the Null is true, P(Null = True | DATA).
It IS P(DATA (or more extreme) | Null = True).
Not a great tool for deciding when to stop sampling.
See:
http://andrewgelman.com/2010/09/noooooooooooooo_1/
http://www.stat.duke.edu/~berger/papers/02-01.pdf
21. A Couple Other Methods
1. Epsilon Greedy
Nice and Simple
2. Upper Confidence Bound (UCB)
Adapts to Uncertainty
23. Greedy
What do you mean by Greedy?
Make whatever choice seems
best at the moment.
24. Epsilon Greedy
What do you mean by Epsilon
Greedy?
Explore: randomly select an action ε percent of the time (say 20%).
Exploit: play greedy (pick the current best) 1 - ε of the time (say 80%).
25. Epsilon Greedy
For each user:
Explore/Learn (20%): select randomly (like A/B testing).
Exploit/Earn (80%): select the current best (be greedy).
26. Epsilon Greedy
20% of the time: select at random. 80% of the time: select the current best.
Action Value
A $5.00
B $4.00
C $3.00
D $2.00
E $1.00
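A minimal epsilon-greedy selection sketch (illustrative Python, not Conductrics' implementation; the estimated values mirror the table above):

import random

epsilon = 0.20  # explore 20% of the time
estimated_value = {"A": 5.00, "B": 4.00, "C": 3.00, "D": 2.00, "E": 1.00}

def choose_action():
    if random.random() < epsilon:
        # Explore: pick uniformly at random.
        return random.choice(list(estimated_value))
    # Exploit: pick the action with the highest current estimate.
    return max(estimated_value, key=estimated_value.get)

With these numbers, A is served about 84% of the time (80% greedy plus a 1-in-5 share of the 20% random traffic), while the other actions keep receiving enough traffic for their estimates to be updated.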
27. Continuous Sampling
(Chart: Explore/Learn and Exploit/Earn continue side by side over time.)
28. Epsilon Greedy
Super simple / low cost to implement
Tends to be surprisingly effective
Less affected by seasonality
Not optimal (hard to pick the best ε)
Doesn't use a measure of variance
Should we decrease exploration over time, and how? (See the sketch below.)
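On that last point, one common answer (an assumption here, not something the deck prescribes) is to anneal epsilon so the policy explores heavily early and only rarely later, for example:

def decayed_epsilon(t, k=100):
    # t = number of visitors seen so far; k is a tuning constant (assumed).
    # Exploration rate starts at 1.0 and shrinks roughly like k / t.
    return min(1.0, k / max(t, 1))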
29. Upper Confidence Bound
Basic Idea:
1) Calculate both the mean and a measure of uncertainty (variance) for each action.
2) Make greedy selections based on mean + uncertainty bonus.
31. Upper Confidence
Score each option using the upper portion of the confidence interval as a bonus:
Score = Mean + Bonus
32. Upper Confidence Bound
1) Use the upper portion of the CI as the Bonus.
2) Make greedy selections.
(Chart: estimated reward intervals for actions A, B, and C on a $0-$10 scale; A has the highest upper bound, so Select A.)
33. Upper Confidence Bound
1) Selecting action A reduces its uncertainty bonus (because there is more data).
2) Action C now has the highest score.
(Chart: A's interval has narrowed; C now has the highest upper bound, so Select C.)
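The deck does not spell out the exact bonus formula, but a standard UCB1-style sketch of the selection step (illustrative Python; the sqrt(2 * ln N / n_a) bonus is the textbook choice, assumed here) looks like:

import math

def ucb_choose(counts, rewards, total_plays):
    # counts[a]  = how many times action a has been selected
    # rewards[a] = total reward observed for action a
    best_action, best_score = None, float("-inf")
    for a in counts:
        if counts[a] == 0:
            return a  # try each action once before scoring
        mean = rewards[a] / counts[a]
        bonus = math.sqrt(2.0 * math.log(total_plays) / counts[a])
        score = mean + bonus  # mean + uncertainty bonus
        if score > best_score:
            best_action, best_score = a, score
    return best_action  # greedy on the optimistic score

Actions that have been played less keep a larger bonus, so they still win occasionally; as an action accumulates data its bonus shrinks, which is the behavior pictured in the two charts above.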
34. Upper Confidence Bound
Like an A/B test: uses a variance measure.
Unlike an A/B test: no hypothesis test.
Automatically balances exploration with exploitation.