
Sprez.za.tura
Roelof van Zwol
Netflix

Sprez.za.tura
“It is an art which does not seem to be an art. One
must avoid affectation and practice in all things. A
certain sprezzatura, disdain or carelessness, so as
to conceal art, and make whatever is done or said
appear to be without effort and almost without any
thought about it ... obvious effort is the antithesis
of grace.”
Baldassare Castiglione (1478-1529)

When done well,
recommendations are
perceived as a natural
extension of the
service

Spot the
Algorithms!

Introducing new content
● Who will watch the show?
● How many members will
watch the show?
● Which canvas to use?
● When to promote?

Overview
● Correlation ≠ Causation
● Online-learning
● Incrementality

Should you stop buying margarine,
to save your marriage?

Correlation (X,Y) is high, does it mean…
… X causes Y? … Y causes X?
In general, neither!
Most common reason: unobserved confounder
[Diagram: an unobserved confounder C influences both observed variables X and Y.]
“Omitted variable bias”
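
A minimal simulation of the confounder picture above, offered as a sketch rather than anything taken from the talk. The data-generating process, the noise levels, and the helper `pearson` are assumptions made for illustration: C drives both X and Y, and X has no effect on Y.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
n = 20_000

# Hypothetical process: C causes X and C causes Y; there is no X -> Y arrow.
c = [rng.gauss(0, 1) for _ in range(n)]
x = [ci + rng.gauss(0, 0.5) for ci in c]
y = [ci + rng.gauss(0, 0.5) for ci in c]

corr_xy = pearson(x, y)

# Subtract the confounder (possible here only because the simulation
# knows C and its coefficient is 1) and correlate what is left.
resid_x = [xi - ci for xi, ci in zip(x, c)]
resid_y = [yi - ci for yi, ci in zip(y, c)]
corr_given_c = pearson(resid_x, resid_y)

print(f"corr(X, Y)         = {corr_xy:.2f}")
print(f"corr(X, Y) given C = {corr_given_c:.2f}")
```

Under these assumptions the raw correlation is high (about 0.8 in theory) while the correlation after accounting for C is close to zero, even though neither variable causes the other.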

Advertising
● High probability of conversion the day before weekly groceries irrespective
of adverts shown
● Effect of Pampers ads is null in this case.
Traditional (correlational) machine learning will fail
and waste $ on useless ads
[Figure: timeline W1 to W5 showing the probability of buying in each period and whether to advertise ($).]
In practice, Cost-Per-Incremental-Acquisition can be more than 100x Cost-Per-Acquisition (!)
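
A worked arithmetic sketch of how the two costs can diverge. All numbers are hypothetical, and the setup assumes a randomized holdout group of the same size as the exposed group, which the slides do not specify.

```python
def cost_per_acquisition(spend, conversions_exposed):
    """Spend divided by every conversion among users who saw the ad."""
    return spend / conversions_exposed


def cost_per_incremental_acquisition(spend, conversions_exposed, conversions_holdout):
    """Spend divided by only the conversions the ad actually caused.

    Assumes the holdout group has the same size as the exposed group.
    """
    incremental = conversions_exposed - conversions_holdout
    if incremental <= 0:
        return float("inf")  # the ad caused no measurable extra conversions
    return spend / incremental


# Hypothetical campaign: most buyers would have bought anyway.
spend = 1_000.0
conversions_exposed = 505   # conversions among users shown the ad
conversions_holdout = 500   # conversions in an equally sized group with no ad

cpa = cost_per_acquisition(spend, conversions_exposed)
cpia = cost_per_incremental_acquisition(spend, conversions_exposed, conversions_holdout)
print(f"CPA  = {cpa:.2f}")
print(f"CPIA = {cpia:.2f}")
print(f"CPIA / CPA = {cpia / cpa:.0f}x")
```

With these made-up counts only 5 of the 505 conversions are incremental, so the cost per incremental acquisition is roughly 100 times the cost per acquisition.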

Netflix Promotions
The Netflix homepage is expensive real estate (opportunity cost):
- so many titles to promote
- so few opportunities to win a “moment of truth”
Traditional (correlational) ML systems:
- take action if probability of positive reward is high, irrespective of reward
base rate
- don’t model incremental effect of taking action
[Figure: timeline D1 to D5 marking whether to promote (▶) in each period.]

CASE STUDY:
Content promotion
through Billboard

Background and notation
● Title t belongs to the pool of candidate titles T, eligible for promotion in
Billboard when member m visits the homepage
● Let $x_{m,t}$ be a context vector for member m and title t
● Let $y_{m,t}$ be the label indicating a play of title t by member m from the
homepage, after having seen a billboard.

What (sequence of) actions will maximize the
cumulative reward?
● Reinforcement Learning
● Multi-Armed Bandits
● Acknowledge the need for balancing
exploration and exploitation
○ Allow sub-optimal actions, to collect unbiased treatment
effects and learn the probability distributions over the
space of possible actions.
[Figure: slot machines (bandit arms) with unknown rewards R1, R2, R3.]

ϵ-greedy policy
● Explore → Collect experimental data
○ With probability ϵ, select a title at random for promotion in Billboard
○ Log context ($x_{m,t}$)
○ Causal observations of play-feedback ($y_{m,t}$)
● Exploit → Train on the experimental data
○ With (1-ϵ) probability, select the optimal title for promotion
● Alternatives: UCB, Thompson Sampling
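
The explore and exploit steps above can be sketched as follows. This is a minimal illustration, not the production system: the title names, the fixed `predicted_play` scores standing in for a trained exploit model, the simulated play feedback, and ϵ = 0.1 are all assumptions.

```python
import random


def epsilon_greedy_select(candidates, score, epsilon, rng):
    """Return (title, explored) for one Billboard impression.

    candidates: list of title ids
    score: callable mapping a title to its predicted probability of play
    """
    if rng.random() < epsilon:
        return rng.choice(candidates), True      # explore: uniform at random
    return max(candidates, key=score), False     # exploit: best predicted title


rng = random.Random(42)
# Hypothetical scores standing in for the trained exploit model.
predicted_play = {"title_a": 0.02, "title_b": 0.10, "title_c": 0.05}
exploration_log = []
shown_counts = {t: 0 for t in predicted_play}

for _ in range(10_000):
    context = {"member_id": rng.randrange(1_000)}  # stand-in for x_{m,t}
    title, explored = epsilon_greedy_select(
        list(predicted_play), predicted_play.get, 0.1, rng
    )
    shown_counts[title] += 1
    if explored:
        # Only randomized impressions are logged as experimental data.
        played = rng.random() < predicted_play[title]  # simulated y_{m,t}
        exploration_log.append((context, title, played))

print(shown_counts)
print(len(exploration_log), "exploration records")
```

Only the randomized impressions are appended to `exploration_log`, which mirrors the idea that the exploit model is trained on experimental rather than observational data.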

Greedy exploit model
● Learn a model per title to predict likelihood of play
$P(y_{m,t} \mid x_{m,t}, T) = \sigma(f(x_{m,t}, \Theta))$
● Pick winning title:
$t = \arg\max P(y_{m,t} \mid x_{m,t}, T)$
● Various models can be used to predict probability of
play, such as logistic regression, GBDT, neural networks
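
A small sketch of the greedy exploit step, assuming f is linear so that each per-title model is a logistic regression. The weights in `thetas` and the feature vectors in `contexts` are made-up values, not learned parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def play_probability(x, theta):
    """sigma(f(x, theta)) with a linear f, i.e. logistic regression."""
    return sigmoid(sum(w * xi for w, xi in zip(theta, x)))


def pick_winner(contexts, thetas):
    """contexts and thetas are dicts keyed by title.

    Returns the title with the highest predicted probability of play,
    together with all predicted probabilities.
    """
    probs = {t: play_probability(contexts[t], thetas[t]) for t in contexts}
    return max(probs, key=probs.get), probs


# Hypothetical per-title weights (bias first) and context vectors x_{m,t}.
thetas = {"title_a": [-3.0, 1.0, 0.5], "title_b": [-2.0, 0.2, 1.5]}
contexts = {"title_a": [1.0, 0.8, 0.1], "title_b": [1.0, 0.3, 0.9]}

winner, probs = pick_winner(contexts, thetas)
print(winner, {t: round(p, 3) for t, p in probs.items()})
```

Swapping `play_probability` for a GBDT or neural network scorer would leave `pick_winner` unchanged, since it only needs a probability per candidate title.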

Considerations for ϵ-greedy policy
● Explore
○ Bandwidth allocation and cost of exploration
○ New vs existing titles
● Exploit
○ Model synchronisation
○ Title availability (group censoring)
○ Observation window
○ Frequency of model update
○ Incremental updates vs batch training
■ Stationarity of title popularities

Online learning works great for title
cold start scenarios, but...
MABs are
greedy, not
lift-based!

Incrementality-based policy
● Goal: Select title for promotion that benefits most from
being shown in billboard
○ Member can play title from other sections on the homepage or search
○ Popular titles likely to appear on homepage anyway: Trending Now
○ Better utilize most expensive real-estate on the homepage!
● Define policy to be incremental with respect to probability of play

Incrementality-based policy
● Goal: Select title for promotion that benefits most from
being shown in billboard
$t = \arg\max \left[ P(y_{m,t} \mid x_{m,t}, T, b=1) - P(y_{m,t} \mid x_{m,t}, T, b=0) \right]$
where b is an indicator for the treatment: the title is shown in Billboard (b=1)
or not shown in Billboard (b=0)
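
A minimal sketch contrasting the incrementality-based selection with the greedy one. The probabilities are hypothetical and would in practice come from models of play with (b=1) and without (b=0) the billboard; how those models are trained is not specified here.

```python
def pick_incremental_winner(p_shown, p_not_shown):
    """Pick the title with the largest lift P(play | b=1) - P(play | b=0)."""
    lift = {t: p_shown[t] - p_not_shown[t] for t in p_shown}
    return max(lift, key=lift.get), lift


# Hypothetical predicted probabilities of play for one member.
p_shown = {"title_a": 0.08, "title_c": 0.22}      # b = 1
p_not_shown = {"title_a": 0.01, "title_c": 0.20}  # b = 0

greedy_winner = max(p_shown, key=p_shown.get)
incremental_winner, lift = pick_incremental_winner(p_shown, p_not_shown)

print("greedy:", greedy_winner)
print("incremental:", incremental_winner)
```

With these numbers the greedy policy promotes the title that is likely to be played anyway, while the incremental policy promotes the title whose probability of play changes most when it is shown.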

Offline evaluation: Replay [Li et al, 2010]
● Relies upon uniform exploration data.
● Each record in the uniform exploration log contains
{context, title k shown, reward, list of candidates}
● For every record:
○ Evaluate the trained model for all the titles in the candidate pool.
○ Pick the winning title k’
○ Keep the record in the history if k’ = k (the title impressed in the logged
data), else discard it.
● Compute the metrics from the history.
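
The replay procedure can be sketched in a few lines. The log below is simulated under stated assumptions (three titles, uniform random impressions, made-up play probabilities), and `always_title_b` is a stand-in for a trained model's argmax.

```python
import random


def replay_take_rate(log, policy):
    """Replay evaluation over uniform exploration data.

    log: iterable of (context, shown_title, reward, candidates)
    policy: callable (context, candidates) -> chosen title
    Returns (take_rate, matches), where take_rate = plays / matches.
    """
    matches = 0
    plays = 0
    for context, shown_title, reward, candidates in log:
        if policy(context, candidates) == shown_title:
            matches += 1      # keep the record: policy agrees with the log
            plays += reward   # reward is 1 for a play, 0 otherwise
    rate = plays / matches if matches else float("nan")
    return rate, matches


rng = random.Random(7)
true_play = {"title_a": 0.02, "title_b": 0.10, "title_c": 0.05}  # hypothetical
titles = list(true_play)

# Simulated uniform exploration log.
log = []
for _ in range(30_000):
    shown = rng.choice(titles)
    reward = 1 if rng.random() < true_play[shown] else 0
    log.append(({}, shown, reward, titles))


def always_title_b(context, candidates):
    return "title_b"


take_rate, matches = replay_take_rate(log, always_title_b)
print(f"matches = {matches}, take rate = {take_rate:.3f}")
```

Because the logged titles were chosen uniformly at random, the matching records form an unbiased sample for the evaluated policy, at the cost of discarding roughly two thirds of the log in this three-title example.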

Offline evaluation: Replay [Li et al, 2010]
Uniform Exploration Data - Unbiased evaluation
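[Diagram: the uniform exploration data is split into train data and evaluation data. The trained model is shown the context x of each evaluation record (context, title, reward) and picks a winner title k’. The reward is used only if k’ = k.]
Take Rate = # Plays / # Matches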

Offline replay
Greedy exploit has a higher replay
take rate than the incrementality-based
model.
The incrementality-based policy
sacrifices replay by selecting a
lesser-known title that would benefit
from being shown on the Billboard.
[Figure: lift in replay for the various algorithms compared to the random baseline.]

Which titles benefit from Billboard promotion?
Title A has a low baseline
probability of play, however when
the billboard is shown the
probability of play increases
substantially!
Title C has a higher baseline
probability and may not benefit as
much from being shown on the
Billboard.
[Figure: scatter plot of incremental vs baseline probability of play for various members.]

Online observations
● Online take rates follow the offline
patterns.
● Our implementation of incrementality is able to shift
engagement within the candidate pool.

Correlation, causation, and incrementality
Most ML algorithms are correlational, i.e. based on observational data.
In this context, the explore-exploit models are causal:
we train models on experimental data, where we are in control of
the randomization.
Incrementality can be defined as the causal lift in a metric of interest,
for instance the change in probability of play for a title in a session when a
billboard is shown for that title to a member.
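
As a rough illustration of "causal lift from experimental data", the sketch below estimates lift as a difference in play rates between randomized sessions with and without the billboard. The session simulator and its probabilities (0.05 baseline, 0.12 with billboard) are assumptions for the example only.

```python
import random


def estimate_lift(sessions):
    """Difference in play rate between treated and untreated sessions.

    sessions: iterable of (b, played) pairs where b was assigned at random,
    b = 1 if the billboard was shown for the title and 0 otherwise.
    """
    shown = [played for b, played in sessions if b == 1]
    not_shown = [played for b, played in sessions if b == 0]
    return sum(shown) / len(shown) - sum(not_shown) / len(not_shown)


rng = random.Random(3)
sessions = []
for _ in range(40_000):
    b = rng.randrange(2)               # randomized treatment assignment
    p_play = 0.12 if b == 1 else 0.05  # hypothetical true probabilities
    sessions.append((b, 1 if rng.random() < p_play else 0))

lift_estimate = estimate_lift(sessions)
print(f"estimated lift in probability of play: {lift_estimate:.3f}")
```

The simple difference in means is only a valid causal estimate here because b is randomized; on observational logs the same computation would again be correlational.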


Sprezzatura - Roelof van Zwol - May 2018