Copyright © 2015 Criteo
Machine Learning for Performance Advertising
Grenoble Data Science
Eustache Diemert, Staff Research Scientist @ Criteo Research
Grenoble Data Science Meetup, Oct. 2017
Part I : Introduction to Performance Advertising
Performance Advertising?
- Advertisers want sales
  - short-term
  - measurable impact
- Not interested in
  - brand awareness
  - marketing pressure
  - segments (e.g. socio-demo)
Programmatic Advertising Scenario
[Figure sequence: a user (User 123456789) loads a page with an ad slot for sale; two advertisers (Advertiser 1, Advertiser 2) submit bids of 0.10 and 0.15; the auction is resolved in ~30 ms; the winning advertiser's ad (a promo banner) is displayed.]
Performance Advertising Setup
[Figure labels: Advertiser, Bidder, Publisher; pricing labels CPA and CPM]
1. User visits a publisher webpage
2. Bidders receive a real-time auction request
3. Winner displays ad for advertiser
4. User converts on advertiser website (click / sale / lead)
Performance Advertising Metrics
- Ideally: number of sales "generated" by advertising for a given budget
  - But difficult/costly to measure and optimize
  - E.g. incrementality A/B test
  - (also sales amount, margin, etc.)
- Practically: number of sales attributed to advertising
  - Commonly: last-click attribution
  - (also multi-touch, data-driven, etc.)
Real-time bidding for performance advertising
Key question: how much should we bid in the auction?
A little bit of game theory: 2nd price auctions
Sealed, 1 turn auction, winner pays the second highest bid
In all four cases the value is 1 and the candidate bids are 0.75 and 1.1; only the competing bid changes:
Case 1: Competition = 0.5
Case 2: Competition = 1.5
Case 3: Competition = 1.05
Case 4: Competition = 0.8
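The four cases can be worked through numerically. The sketch below assumes a single competing bid, a strict win condition (ties are ignored) and a hypothetical helper name, `second_price_payoff`; it is an illustration of the second-price rule, not code from the talk.

```python
def second_price_payoff(value, bid, competition):
    """Payoff of one bidder in a sealed second-price auction.

    The bidder wins when its bid is strictly above the highest competing bid
    and then pays that competing bid; otherwise it pays and gains nothing.
    """
    if bid > competition:
        return value - competition
    return 0.0


value = 1.0
cases = {"Case 1": 0.5, "Case 2": 1.5, "Case 3": 1.05, "Case 4": 0.8}

for name, competition in cases.items():
    # Compare the underbid (0.75), the truthful bid (1.0) and the overbid (1.1).
    payoffs = {
        bid: round(second_price_payoff(value, bid, competition), 4)
        for bid in (0.75, 1.0, 1.1)
    }
    print(name, "competition =", competition, "payoffs =", payoffs)
```

In Case 3 the overbid of 1.1 wins but pays 1.05 for something worth 1, a loss of 0.05. In Case 4 the underbid of 0.75 loses an auction that the truthful bid would have won with a gain of 0.2. In Cases 1 and 2 all three bids give the same payoff.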
Auction games
- Second-price auctions
  - Dominant strategy: bid the expected gain ("truthful auction")
  - An overbid means you are losing money
  - An underbid means you are losing potential revenue
- Also: non-second-price auctions
- Floors (hard/soft/dynamic)
Baseline Bidding Policy
- Under the 2nd price auction hypothesis, the dominant strategy is to bid the expected value:
  $\text{bid} = \mathrm{CPA} \times P(\text{attributed conversion} \mid \text{display})$
  - $\mathrm{CPA}$: "value of a conversion"
  - $P(\text{attributed conversion} \mid \text{display})$: "probability of post-click attributed conversion"
- Model quality/calibration impacts
revenue
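A minimal sketch of why calibration matters, assuming the bid is the conversion value times the predicted probability as stated on the slide. The function name `baseline_bid` and all numbers are hypothetical.

```python
def baseline_bid(conversion_value, p_attributed_conversion):
    """Expected value of a display: value of a conversion times its probability."""
    if not 0.0 <= p_attributed_conversion <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return conversion_value * p_attributed_conversion


# Hypothetical numbers: a conversion worth 50 and a true rate of 1 sale per 10 000 displays.
value = 50.0
true_p = 1.0 / 10_000

fair_bid = baseline_bid(value, true_p)
# A model that overestimates the probability by 20% overbids by the same 20%.
overbid = baseline_bid(value, 1.2 * true_p)

print(fair_bid, overbid)
```

Because the bid is linear in the predicted probability, any multiplicative calibration error is passed straight through to the bid, which is the overbid/underbid situation described in the auction slides.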
Data features
- What can we use to predict clicks & sales?
- User behavior on the advertiser's website
  - time since last visit
  - engagement level
  - last product seen, etc.
  - user fatigue: number of displays in last x days
- Publisher:
  - publisher_id
  - url
  - display format
- Campaign:
  - vertical_id: travel, classified, cars, etc.
  - average CTR
Learn on huge volumes of data
10 000 displays lead to 50 clicks, which lead to 1 sale
Sizing of our prediction problems
- Class imbalance: 0.5 / 100
- N samples: $10^9$
- N raw variables: $10^2$
- N encoded features: $10^7$
Which algo to solve our problems?
Structured data
- Lots of info in the data
- High predictability
- Highly structured info
Unstructured data
- Poor predictability
- Signal dominated by noise
- Highly unstructured info
Optimizing for sales
- Predict: $P(\text{Sale}) = P(\text{Click}) \times P(\text{Sale} \mid \text{Click})$
- $P(\text{Sale})$ ~ Bernoulli
- Use (regularized) logistic regression: $P(Y = 1 \mid X = x) = 1 / (1 + e^{-w^\top x})$
- Outputs a score in [0,1], interpreted as a probability
- Negative log likelihood: $\mathrm{NLLH}(y, p) = -y \log p - (1 - y) \log(1 - p)$
- Convex optimization, using (cheap) first-order or quasi-Newton methods (SGD, L-BFGS, SAG, ...)
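The pieces of this slide can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the production model: features are sparse binary indicators given as a list of active indices, weights live in a dict, and the names `predict_proba`, `nll` and `sgd_step`, the learning rate and the L2 strength are all hypothetical.

```python
import math


def predict_proba(weights, active_features):
    """Logistic regression score P(Y=1 | x) = 1 / (1 + exp(-w.x)) for binary features."""
    z = sum(weights.get(i, 0.0) for i in active_features)
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(y, p, eps=1e-12):
    """Negative log likelihood -y log p - (1 - y) log(1 - p), clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)


def sgd_step(weights, active_features, y, lr=0.1, l2=1e-6):
    """One stochastic gradient step on the L2-regularized NLL; returns the pre-update score."""
    p = predict_proba(weights, active_features)
    gradient = p - y  # derivative of the NLL with respect to w.x
    for i in active_features:
        w = weights.get(i, 0.0)
        weights[i] = w - lr * (gradient + l2 * w)
    return p


# Funnel from the earlier slides: 50 clicks per 10 000 displays, 1 sale per 50 clicks.
p_click = 50 / 10_000
p_sale_given_click = 1 / 50
p_sale = p_click * p_sale_given_click  # about 1 sale per 10 000 displays
```

With an empty weight dict every score is 0.5, and repeated `sgd_step` calls on a positive example push the score for its active features upward.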
Hashing trick
- Vanilla logistic regression uses binary features only
- Standard representation of categorical features: one-hot encoding
  For instance, the site feature has one position per site (cnn.com, news.yahoo.com, ...): 0 0 0 1 0 0 0
- Dimensionality equal to the number of different values, which can be very large
- Hashing to reduce dimensionality (made popular by John Langford in VW)
  $h : \text{string} \to [0, \dots, 2^b - 1]$
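A minimal sketch of the hashing trick described above. It uses CRC32 from the Python standard library as a stand-in hash function (an assumption made for portability, VW uses its own hash), and the names `hashed_index` and `encode` are hypothetical.

```python
import zlib


def hashed_index(feature, value, num_bits=24):
    """Map a (feature, value) pair to an index in [0, 2**num_bits - 1]."""
    key = f"{feature}={value}".encode("utf-8")
    # Keep the low num_bits bits of the hash: this is h(string) mod 2**b.
    return zlib.crc32(key) & ((1 << num_bits) - 1)


def encode(example, num_bits=24):
    """Sparse binary encoding: the sorted set of active hashed indices."""
    return sorted({hashed_index(k, v, num_bits) for k, v in example.items()})


# No dictionary of known sites is needed: unseen values hash to some index too.
x = encode({"site": "cnn.com", "display_format": "300x250"})
print(x)
```

The model size is fixed by `num_bits` whatever the number of distinct sites. The price is that two different values may collide on the same index, which becomes more likely as `num_bits` shrinks.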
Quadratic features
- Outer product between two features; similar to a polynomial kernel of degree 2
- Large number of values → hashing trick
- Example: between site and advertiser, the feature is 1 ⟺ site=finance.yahoo.com & advertiser=bank of america
[Figure: features that can be crossed, on the publisher side (Publisher network, Publisher, Site, Url) and on the advertiser side (Advertiser network, Advertiser, Campaign, Ad)]
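Crossed features combine naturally with hashing: each pair of feature values becomes one more string to hash. The sketch below assumes CRC32 as the hash function and crosses every pair of features in the example. Both are illustrative choices, and `hash_token` and `encode_with_crosses` are hypothetical names.

```python
import zlib
from itertools import combinations


def hash_token(token, num_bits=24):
    """Index in [0, 2**num_bits - 1] for an arbitrary feature string."""
    return zlib.crc32(token.encode("utf-8")) & ((1 << num_bits) - 1)


def encode_with_crosses(example, num_bits=24):
    """Hashed indices of the unary features plus all pairwise (quadratic) crosses."""
    # Sort by feature name so that a cross always produces the same string.
    tokens = [f"{name}={value}" for name, value in sorted(example.items())]
    crosses = [f"{a}&{b}" for a, b in combinations(tokens, 2)]
    return sorted({hash_token(t, num_bits) for t in tokens + crosses})


example = {"site": "finance.yahoo.com", "advertiser": "bank of america"}
indices = encode_with_crosses(example)

# The cross feature is active only when both values occur together.
cross_index = hash_token("advertiser=bank of america&site=finance.yahoo.com")
print(indices, cross_index in indices)
```

The number of possible crosses grows with the product of the two cardinalities, which is why they are hashed rather than enumerated.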
Part II : Attribution Model for Bidding Performance
Joint work with Julien Meynet, Pierre Galland, Damien Lefortier
published at AdKDD & TargetAd workshop (KDD 2017)
Outline
- The problem: bidding in display advertising
- Model:
  - Attribution model
  - Attribution aware bidder
- Impact on offline evaluation metrics
- Experiments & results
"Post-click attributed conversions"?
[Figure: timeline of a user's touchpoints before a conversion ($$$): display ad impression, paid search click, display ad click, email open]
- Last-click is the de facto attribution model
- but advertisers are moving towards better attribution models:
  - Rule-based: uniform, linear, etc.
  - Data-driven: regression, Shapley value, etc.
- But what is the impact from a bidder's perspective?
- What is the optimal bidding strategy right after a click?
Attribution-aware bidder
Attribution Probability Through Time Matters
[Figure: attribution probability given conversion]
Attribution Model
- How can we model the probability of getting the attribution given that there will be a conversion?
  - $S$: post-click conversion
  - $A$: attributed conversion
  - $X$: contextual features
  - $\Delta$: delay between click and conversion
  $P(A = 1 \mid S = 1, X = x, \Delta = \delta) = e^{-\lambda(x)\,\delta}, \quad \lambda(x) \ge 0$
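The exponential-decay attribution model can be sketched as follows. The decay rate is passed in as a plain number; how it is predicted from the contextual features is not shown here. The names `attribution_probability` and `half_life` and the numeric values are hypothetical.

```python
import math


def attribution_probability(delay, decay_rate):
    """Probability that a conversion is attributed to a click that happened `delay` earlier.

    Exponential decay: exp(-decay_rate * delay), with decay_rate >= 0.
    """
    if delay < 0 or decay_rate < 0:
        raise ValueError("delay and decay_rate must be non-negative")
    return math.exp(-decay_rate * delay)


def half_life(decay_rate):
    """Delay after which the attribution probability has dropped to 0.5."""
    return math.log(2) / decay_rate


# Hypothetical decay rate of 0.3 per day.
for days in (0, 1, 7, 30):
    print(days, round(attribution_probability(days, 0.3), 4))
```

A conversion right after the click is attributed with probability 1, and the probability decreases monotonically with the delay. A larger decay rate means the click loses its claim on the conversion faster.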
Conversion Modeling
- Baseline solution:
  - 0/1 prediction problem → logistic regression
  - Large scale / latency constraint → hashing trick
  $\text{bid} = \mathrm{CPA} \times P(\text{attributed conversion} \mid \text{display})$, where the probability is the "probability of post-click attributed conversion"
- But what are positives / negatives?
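From Attribution Model to an Attribution Aware Bidder
Cast the problem as an internal attribution problem: each scheme assigns a label (credit) to three successive touchpoints before a conversion.

Labeling scheme      Touchpoint 1   Touchpoint 2   Touchpoint 3
Last click           0              0              1
Uniform              1/3            1/3            1/3
First click          1              0              0
All clicks           1              1              1
Attribution model    0.6            0.1            0.3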
Attribution Aware Bidder: An Intuitive View
[Figure: timeline (t) showing a previous click followed by a new display opportunity]
- LCB: user is engaged, go for the last-click bid value
- AB: previous click gives us the attribution, only bid the "marginal value"
Attribution Aware Bidder
- Baseline Last-click Bidder (LCB):
  $\text{bid}_{LCB} = \mathrm{CPA} \times P(S_{\text{last click}} = 1 \mid X = x)$
- Attribution-aware Bidder (AB):
  $\text{bid}_{AB} = \mathrm{CPA} \times P(S_{\text{all clicks}} = 1 \mid X = x) \times \left(1 - e^{-\lambda(x)\,\tau}\right)$
  - $\tau$: time elapsed since last click
Bid proportionally to the marginal contribution of the display
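A minimal sketch of the two bidders, assuming the conversion probability and the decay rate are already predicted by upstream models (on the slide the two bidders use differently labeled conversion models, which is not reproduced here). The function names, the handling of users without a previous click and the numbers are assumptions for illustration.

```python
import math


def last_click_bid(cpa, p_conversion):
    """Baseline: value of a conversion times the predicted conversion probability."""
    return cpa * p_conversion


def attribution_aware_bid(cpa, p_conversion, time_since_last_click, decay_rate):
    """Scale the bid by the marginal attribution 1 - exp(-decay_rate * tau).

    Assumption: with no previous click (None) there is nothing to discount,
    which is the limit of the factor when tau grows without bound.
    """
    if time_since_last_click is None:
        return cpa * p_conversion
    marginal = 1.0 - math.exp(-decay_rate * time_since_last_click)
    return cpa * p_conversion * marginal


# Hypothetical values: CPA of 10, 1% conversion probability, decay rate 0.5 per day.
for tau in (0.0, 1.0, 5.0, None):
    print(tau, round(attribution_aware_bid(10.0, 0.01, tau, 0.5), 5))
```

Right after a click the factor is 0, so the bidder stays out of the auction. As the click gets older the bid grows back towards the baseline bid, which limits ad exposure immediately after a click.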
Impact on the offline evaluation metrics
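Offline Evaluation of Bidders
- Utility metric on logged feedback:
  $U(\hat p) = \sum_i (y_i v_i - c_i)\, \mathbb{1}(\hat p_i v_i > c_i)$
  with, for each logged display $i$: $\hat p_i$ the predicted probability, $y_i$ the observed outcome, $v_i$ the value and $c_i$ the logged cost
- Expected Utility: add uncertainty on the cost distribution:
  $\tilde c_i \sim \Gamma(\alpha = \beta c_i + 1, \beta)$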
Attribution Aware Expected Utility*
- Inject an attribution function $\mathcal{A}$ into the utility:
  $U_{\mathcal{A}}(\hat p) = \sum_i (\mathcal{A}(i)\, v_i - c_i)\, \mathbb{1}(\hat p_i v_i > c_i)$
- Internal attribution function:
  - can be last-click, first-click, etc.
  - can be the proposed attribution model
* Evaluation of the proposed metric would require a
proper offline / online correlation analysis
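The idea of injecting an attribution function into a logged-data utility can be sketched as below. This is the plain utility, without the expectation over the cost distribution. The record fields, the function names and the numbers are assumptions made for the illustration and are not the metric implementation used in the paper.

```python
def utility(records, credit):
    """Sum of (credit * value - cost) over logged displays the bidder would have won.

    A display counts as won when the bid (predicted probability times value)
    exceeds the logged cost. `credit` maps a record to the share of the
    conversion that is counted for this display.
    """
    total = 0.0
    for r in records:
        bid = r["p_hat"] * r["value"]
        if bid > r["cost"]:
            total += credit(r) * r["value"] - r["cost"]
    return total


def last_click_credit(record):
    """Full credit whenever the display led to a conversion."""
    return record["converted"]


def attribution_credit(record):
    """Credit given by an internal attribution function, precomputed per record."""
    return record["attribution"]


# Hypothetical log of three displays.
log = [
    {"p_hat": 0.02, "value": 100.0, "cost": 1.0, "converted": 1, "attribution": 0.3},
    {"p_hat": 0.001, "value": 100.0, "cost": 0.5, "converted": 0, "attribution": 0.0},
    {"p_hat": 0.01, "value": 100.0, "cost": 0.4, "converted": 0, "attribution": 0.0},
]

print(utility(log, last_click_credit), utility(log, attribution_credit))
```

The second display is never won (bid 0.1 against a cost of 0.5), so it does not contribute. The first display contributes 99 under full credit but only 29 when the internal attribution gives it 30% of the conversion.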
Experiments & Results
Offline Evaluation - Dataset
Log sampled from 30 days of Criteo traffic
- Anonymized
- Each line is an impression with:
  - Timestamp
  - Price paid
  - Contextual features (user, advertiser, publisher)
  - Click*, click position*, click number*
  - Conversion*, conversion value*
  - Attribution label (conversion was attributed to Criteo)
- 16M displays, 5M clicks, 800k conversions
Will be available at http://research.criteo.com/ soon
Attribution Rates vs Time
[Figure: decay of attribution rate after a click]
> 40% of conversions have more than one click in the preceding 30 days
Offline Evaluation: Impact on Bid Profiles
[Figure: post-click bid profiles for 3 bidders]
- Last-Click Bidder (LCB)
- First-Click Bidder (FCB)
- Attribution Bidder (AB)
All models are learned using regularized logistic regression + hashing trick
Offline Evaluation: Bidders Comparison
Results for 3 bidders on the Attribution Aware Expected Utility

Metric                                                  LCB          FCB          AB
Win rate                                                0.94         0.90         0.89
Attribution Aware Expected Utility ($\beta = 1000$)     2852 ± 43    2888 ± 43    (value not recoverable from the source)

- We limit user over-exposure after a click
- We get closer to lift-based bidding
- We can reinvest budget on more profitable campaigns / more incremental ads
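Online result
We tested online a simple modification of the baseline through A/B testing:
[Equation not fully recoverable from the source: the baseline bid multiplied by a factor built from $1$ and an exponential decay $e^{-\lambda\tau}$ in the time $\tau$ elapsed since the last click]

Metric                                     Result
(label not recoverable) (long term)        + (value not recoverable) % worldwide
Revenue (short term)                       negative
Advertiser ROI                             positive
User ad exposure                           lower

Future Research Directions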
Work in progress & Next steps
- Better attribution modeling
  - Exponential decay is naive: build a better model (e.g. travel partners have different attribution schemes)
- Model both conversion lift and attribution lift
  - Delayed feedback in both cases
- Derive a robust (counterfactual) offline metric
Questions?
References
- O. Chapelle, E. Manavoglu, and R. Rosales. Simple and Scalable Response Prediction for Display Advertising. ACM TIST, 2013.
- O. Chapelle. Offline Evaluation of Response Prediction in Online Advertising Auctions. WWW'15.
- E. Diemert, J. Meynet, P. Galland, and D. Lefortier. Attribution Modeling Increases Efficiency of Bidding in Display Advertising. KDD'17 TargetAd workshop (best paper finalist).
http://labs.criteo.com: articles on dev & science at Criteo
http://research.criteo.com: conference reports & cutting edge science ;)
e.diemert@criteo.com
