Bandit Basics – A Different Take on Online Optimization




Who is this guy?

Matt Gershoff
CEO: Conductrics
Many years in Database Marketing (New York and Paris), and a bit of Web Analytics

www.conductrics.com
Twitter: @mgershoff
Email: matt@conductrics.com
Speak Up




What Are We Going to Hear?
• Optimization Basics
• Multi-Armed Bandit
  • It's a Problem, Not a Method
  • Some Methods
    • A/B Testing
    • Epsilon-Greedy
    • Upper Confidence Bound (UCB)
• Some Results
[Diagram: four themes: Choices, Targeting, Learning, Optimization]
OPTIMIZATION

"If THIS Then THAT"
ITTT brings together:
1. Decision Rules
2. Predictive Analytics
3. Choice Optimization
(a rough sketch follows)
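As a rough illustration of how these three pieces fit together (my sketch, not Conductrics' actual engine; the contexts, offers, and value table are made up):

```python
# Hypothetical sketch: pick the highest-value "If THIS Then THAT" rule.
# The context fields, offers, and value estimates are invented for illustration.

def predict_value(context, offer):
    """Stand-in for a predictive model F(context, offer) -> estimated value."""
    table = {("urban", "Offer A"): 2.0, ("urban", "Offer B"): 3.5,
             ("rural", "Offer A"): 1.5, ("rural", "Offer B"): 0.5}
    return table.get((context["geo"], offer), 0.0)

def best_action(context, offers):
    """Choice optimization: apply the THAT with the highest predicted value."""
    return max(offers, key=lambda offer: predict_value(context, offer))

print(best_action({"geo": "urban"}, ["Offer A", "Offer B"]))  # -> Offer B
```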
OPTIMIZATION
Find and apply the rule with the most value

[Diagram: a grid of candidate "If THIS Then THAT" rules]
OPTIMIZATION

[Diagram: Inputs (THIS), variables whose values are given to you (e.g. Facebook, High Spend, Urban GEO, Home Page, App Use), feed a predictive model with features F1, F2, …, Fm that are combined into a value estimate; the model then selects among Outputs (THAT), variables whose values you control (Offer A, Offer B, Offer C, …, Offer Y, Offer Z).]
But…
1. We don't have data on THAT
2. Need to collect it, i.e. sample THAT
3. How to sample efficiently?

[Diagram: Offer A?, Offer B?, Offer C?, …, Offer Y?, Offer Z?: each offer's value is unknown until sampled]
Where?
Marketing applications:
• Websites
• Mobile
• Social Media Campaigns
• Banner Ads
Pharma: Clinical Trials
What is a Multi-Armed Bandit?
One-Armed Bandit → Slot Machine
The problem:
How do you pick between slot machines so that
you walk out of the casino with the most $$$ at the
end of the night?

[Diagram: two slot machines, A or B?]
Objective

Pick so as to get the most
return/profit you can over time.
Technical term: Minimize Regret
(defined below)
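For reference, the standard textbook definition (my addition; the deck doesn't spell it out): regret after T plays is the gap between the expected earnings of always playing the best arm and what you actually earned,

```latex
\mathrm{Regret}(T) = T\,\mu^{*} - \mathbb{E}\left[\sum_{t=1}^{T} r_t\right]
```

where μ* is the expected reward of the best arm and r_t is the reward received at play t. Minimizing regret is equivalent to maximizing cumulative reward.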
Sequential Selection
…but how to pick?

[Diagram: two slot machines, A or B?]

Need to sample, but do it efficiently
Explore – Collect Data

[Diagram: two slot machines, A or B?]

• Data collection is costly – an investment
• Be efficient – balance the potential value of
  collecting new data with exploiting what
  you currently know.
Multi-Armed Bandits

"Bandit problems embody in essential
form a conflict evident in all human
action: choosing actions which yield
immediate reward vs. choosing actions
whose benefit will come only later."*
– Peter Whittle

*Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June, 2010.
Exploration vs. Exploitation

1) Explore/Learn – try out different actions
to learn how they perform over time. This is
a data collection task.

2) Exploit/Earn – take advantage of what
you have learned to get the highest payoff:
your current best guess.
Not A New Problem
1933 – first work on competing options
1940 – a WWII problem the Allies attempted to tackle
1953 – Bellman formulates it as a Dynamic Programming problem
Source: http://www.lancs.ac.uk/~winterh/GRhist.html
Testing

• Explore first
  • All actions have an equal chance of
    selection (uniform random).
  • Use hypothesis testing to select a
    "winner" (a sketch follows).
• Then exploit – keep only the winner
  for selection.
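As a rough illustration of the "select a winner" step (my sketch, not from the deck; the conversion counts are hypothetical), a two-proportion z-test on data from the uniform-random explore phase:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic for comparing two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical explore-phase data: A converts 100/1000, B converts 150/1000.
z = two_proportion_z(100, 1000, 150, 1000)
print(round(z, 2))  # -3.38; |z| > 1.96 is "significant" at the usual 5% level
```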
Learn First

[Diagram: timeline with a Data Collection/Sample phase (Explore/Learn) followed by an Apply Learning phase (Exploit/Earn)]
P-Values: A Digression

P-values are:
• NOT the probability that the Null is
  true, P(Null = True | DATA)
• P(DATA (or more extreme) | Null = True)
• Not a great tool for deciding when to
  stop sampling
See:
http://andrewgelman.com/2010/09/noooooooooooooo_1/
http://www.stat.duke.edu/~berger/papers/02-01.pdf
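To make the distinction concrete (standard Bayes' rule, my addition): getting from the quantity a p-value approximates to the quantity people actually want requires a prior,

```latex
P(\mathrm{Null} \mid \mathrm{Data}) = \frac{P(\mathrm{Data} \mid \mathrm{Null})\, P(\mathrm{Null})}{P(\mathrm{Data})}
```

so P(Data | Null) alone cannot be read as P(Null | Data).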
A Couple of Other Methods

1. Epsilon-Greedy
   Nice and simple

2. Upper Confidence Bound (UCB)
   Adapts to uncertainty
1) Epsilon-Greedy




Greedy

What do you mean by "Greedy"?

Make whatever choice seems
best at the moment.
Epsilon-Greedy

What do you mean by "Epsilon-Greedy"?
• Explore – randomly select an action
  ε percent of the time (say 20%)

• Exploit – play greedy (pick the
  current best) 1 − ε of the time (say 80%)
(see the sketch below)
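A minimal sketch of the selection rule (my illustration; the arm names and value estimates are hypothetical):

```python
import random

def epsilon_greedy(estimates, epsilon=0.2):
    """Return an arm: explore uniformly with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(list(estimates))   # explore: uniform random over arms
    return max(estimates, key=estimates.get)    # exploit: current best estimate

def update(estimates, counts, arm, reward):
    """Incrementally update the running mean reward for the chosen arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

# Hypothetical usage: five offers with running value estimates.
estimates = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
counts = {arm: 1 for arm in estimates}
arm = epsilon_greedy(estimates)   # ~80% of the time this picks "A"
update(estimates, counts, arm, reward=4.5)
```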
Epsilon-Greedy

[Diagram: each user is routed either to Explore/Learn (20%), select randomly like A/B testing, or to Exploit/Earn (80%), select the current best (be greedy)]
Epsilon-Greedy
20% Random / 80% Select Best

Action   Value
A        $5.00
B        $4.00
C        $3.00
D        $2.00
E        $1.00
Continuous Sampling

[Diagram: Explore/Learn and Exploit/Earn run side by side continuously over time, rather than as sequential phases]
Epsilon-Greedy

• Super simple / low cost to implement
• Tends to be surprisingly effective
• Less affected by seasonality

• Not optimal (hard to pick the best ε)
• Doesn't use a measure of variance
• Should we decrease exploration over
  time, and how? (one common schedule below)
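One common answer (a standard annealing schedule, not from the deck) is to shrink ε as plays accumulate, e.g.

```latex
\varepsilon_t = \min\!\left(1, \frac{c}{t}\right)
```

where t is the play count and c is a tuning constant, so the system explores heavily early on and mostly exploits later.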
Upper Confidence Bound

Basic idea:
1) Calculate both the mean and a measure
   of uncertainty (variance) for each
   action.
2) Make greedy selections based on
   mean + uncertainty bonus.
Confidence Interval Review

Confidence interval = mean ± z × standard error

[Diagram: an interval running from 2 standard errors below the mean to 2 standard errors above it]
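A quick worked example with made-up numbers: if an action's estimated value is $5.00 with a standard error of $0.50, the roughly 95% interval (z ≈ 2) is

```latex
5.00 \pm 2 \times 0.50 = [\,4.00,\ 6.00\,]
```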
Upper Confidence

Score each option using the upper
portion of the interval as a "bonus":

    score = mean + bonus

[Diagram: an interval with the mean at its center and the upper half marked as the bonus]
Upper Confidence Bound
1) Use the upper portion of the CI as a bonus

2) Make greedy selections (see the sketch below)

[Diagram: estimated reward intervals for actions A, B, and C on a $0–$10 scale; A has the highest upper bound → Select A]
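A minimal sketch of this scoring rule, here using the common UCB1 bonus √(2 ln N / n) rather than a z-based interval (the deck doesn't pin down a specific bonus; the arm estimates below are hypothetical):

```python
import math

def ucb1_choose(means, counts):
    """Pick the arm with the highest mean + uncertainty bonus (UCB1)."""
    total = sum(counts.values())
    def score(arm):
        # The bonus shrinks as an arm accumulates plays and grows slowly
        # with the total number of plays, so neglected arms get revisited.
        return means[arm] + math.sqrt(2 * math.log(total) / counts[arm])
    return max(means, key=score)

# Hypothetical estimates after 74 total plays of three offers.
means = {"A": 6.0, "B": 4.5, "C": 5.0}
counts = {"A": 50, "B": 20, "C": 4}
print(ucb1_choose(means, counts))  # -> "C": its big bonus outweighs A's higher mean
```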
Upper Confidence Bound
1) Selecting action A reduces its uncertainty
bonus (because there is more data)

2) Action C now has the highest score

[Diagram: A's interval has narrowed; C's upper bound is now the highest → Select C]
Upper Confidence Bound

• Like an A/B test – uses a variance
  measure
• Unlike an A/B test – no hypothesis
  test
• Automatically balances
  exploration with exploitation
Case Study:

Treatment   Conversion Rate   Served
V2V3        9.9%              14,893
V2V2        9.7%               9,720
V2V1        8.0%               2,441
V1V3        3.3%               2,090
V2V3        2.6%               1,849
V2V2        2.0%               1,817
V1V1        1.8%               1,926
V3V1        1.8%               1,821
V1V2        1.5%               1,873
Case Study

Test Method    Conversion Rate
Adaptive       7%
Non-Adaptive   4.5%
A/B Testing vs. Bandit

[Diagram: traffic allocation to Options A, B, and C over time under each approach]
Why Should I Care?

• More efficient learning
• Automation
• A changing world
Questions?




Thank You!

Matt Gershoff
p) 646-384-5151
e) matt@conductrics.com
t) @mgershoff
