We use a contextual bandit and EM to model the two-way communication between the user and the search engine. This is the fourth algorithm that we have created for dynamic search.
Learning to Reinforce Search Effectiveness
1. Learning to Reinforce Search Effectiveness
Jiyun Luo, Xuchu Dong, Grace Hui Yang
Georgetown University
ICTIR 2015
1
3. But you are not alone: Search with a Partner
Teamwork
Share a common goal: find relevant documents, satisfy the long-term goal
Equal partners: not just being an assistant to the user, but also providing influence
Cooperative exploration
3
4. Key Idea: Cooperative Exploration
The two parties talk and they listen, keep exchanging their ideas
They take turns to lead the search in the direction each of them would like this collaboration to go, while also considering the other's opinion
4
5. t=1: Query q1 = hydropower efficiency
Messages:
User: See my new query. Let's explore!
5
Example is from TREC 2014 Session 52
6. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy)
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
6
Example is from TREC 2014 Session 52
7. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
7
Example is from TREC 2014 Session 52
8. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment; Retrieved docs D2
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
Search engine: Check it out! Documents I've ranked high are relevant.
8
Example is from TREC 2014 Session 52
9. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment; Retrieved docs D2; Clicked d2 in D2
t=3: Query q3 = hydropower damage
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is still on the same subtopic. Let's find out more about it.
9
Example is from TREC 2014 Session 52
10. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment; Retrieved docs D2; Clicked d2 in D2
t=3: Query q3 = hydropower damage; Retrieved docs D3
t=4
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is still on the same subtopic. Let's find out more about it.
Search engine: Want to explore? I've diversified my results.
10
Example is from TREC 2014 Session 52
11. Opinions about Two Things
Relevance: which documents (that you have just marked/retrieved/recommended) are relevant
Desire of exploration: how exploratory I want us to be, as a team
11
12. How to Express the Opinions/Feedback
Relevance is demonstrated by examples:
A query is a short piece of text sent from the user
Clicked snippets/documents are long pieces of text sent from the user
Documents are long pieces of text sent from the search engine
Desire of exploration is shown by:
Query changes
Diversified results
12
13. A Contextual Bandit Formulation of a Decision-Making Distribution
P(relevant) = 1 − ε, P(explore) = μ
P(J | o, a, θ) = P(relevant) P(explore)
P(J = RE | o, a, θ) = (1 − ε) μ
P(J = NRE | o, a, θ) = ε μ
P(J = RNE | o, a, θ) = (1 − ε)(1 − μ)
P(J = NRNE | o, a, θ) = ε (1 − μ)
13
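A minimal sketch of this factorization (my own illustration, not the authors' code): given the non-relevance estimate ε and the exploration estimate μ, the four joint decisions are simple products of the two terms.

```python
def decision_distribution(epsilon, mu):
    """Factored decision-making distribution P(J | o, a, theta):
    a relevance term times an exploration term.
    epsilon: estimated probability that the documents are NOT relevant
    mu:      estimated desire to explore
    """
    return {
        "RE":   (1 - epsilon) * mu,        # relevant, explore
        "NRE":  epsilon * mu,              # not relevant, explore
        "RNE":  (1 - epsilon) * (1 - mu),  # relevant, do not explore
        "NRNE": epsilon * (1 - mu),        # not relevant, do not explore
    }

# e.g. with the values worked out on the next slides:
# decision_distribution(epsilon=0.9, mu=0.83)
```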
14. Relevance Feedback from the User
ε = 1 − (# of SAT-clicked documents ∈ D_{t−1}) / (# of returned documents ∈ D_{t−1})
Example: 1 SAT-clicked document out of 10 retrieved, so ε = 1 − 1/10 = 0.9
q2: smoking quitting hypnosis
D1:
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs
Rank 10: ...
One document SAT-clicked (dwell time: 40 seconds)
14
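A short sketch of this estimate (illustrative only):

```python
def epsilon_from_user(num_sat_clicked, num_returned):
    """Relevance feedback from the user:
    epsilon = 1 - (#SAT-clicked docs in D_{t-1}) / (#returned docs in D_{t-1})."""
    return 1 - num_sat_clicked / num_returned

# 1 SAT-clicked document out of 10 retrieved -> 1 - 1/10 = 0.9
print(epsilon_from_user(1, 10))
```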
15. Exploration Feedback from the User
μ = 1 − (# of query changes ∈ D_{t−1}) / (# of permutations of the query terms ∈ D_{t−1})
Example: 1 query change, 3 terms in the new query in total, so μ = 1 − 1/3! ≈ 0.83
q2: smoking quitting hypnosis (+Δq: query reformulation using words in the previous search results D_{t−1})
D1:
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs
Rank 10: ...
15
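A sketch of this estimate, assuming the denominator is the number of permutations of the new query's terms, which matches the worked example (1 − 1/3! ≈ 0.83); names are illustrative:

```python
from math import factorial

def mu_from_user(num_query_changes_in_prev_results, num_query_terms):
    """Exploration feedback from the user:
    mu = 1 - (#query changes found in D_{t-1}) / (#permutations of the query terms)."""
    return 1 - num_query_changes_in_prev_results / factorial(num_query_terms)

# 1 query change drawn from D_{t-1}, 3 terms in the new query -> 1 - 1/6 ≈ 0.83
print(mu_from_user(1, 3))
```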
16. Relevance Feedback from the Search Engine
Highly scored documents: need consistency in ranking scores, which could be hard to get
Highly ranked documents instead:
ε = 1 − (# of relevant documents ∈ top n retrieved) / n
16
17. Relevance Feedback from the Search Engine
q2: smoking quitting hypnosis
D2:
Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart
Rank 10: ...
8 out of the top 10 retrieved documents are relevant, so ε = 1 − 8/10 = 0.2
17
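The same estimate on the engine side, as a small sketch (assuming a relevance indicator is available for each of the top n results):

```python
def epsilon_from_engine(num_relevant_in_top_n, n):
    """Relevance feedback from the search engine:
    epsilon = 1 - (#relevant documents in the top n retrieved) / n."""
    return 1 - num_relevant_in_top_n / n

# 8 relevant documents among the top 10 -> 1 - 8/10 = 0.2
print(epsilon_from_engine(8, 10))
```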
18. Exploration Feedback from the Search Engine
More diversified results show more mixed content: observe the word distribution (higher perplexity)
μ = 1 − (total # of the top m most frequent non-stop-words ∈ D_t) / (total # of non-stop-words ∈ D_t)
18
19. Exploration Feedback from the Search Engine
q2: smoking quitting hypnosis
D2:
Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart
Rank 10: ...
428 non-stop-words in the top 10 snippets; the 5 most frequent words: smoke (59), quit (34), hypnosis (30), stop (19), button (7)
μ = 1 − (59 + 34 + 30 + 19 + 7) / 428 ≈ 0.65
19
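A sketch of this diversity estimate (illustrative helper, not the authors' code): the more the word mass concentrates in a few terms, the lower μ.

```python
from collections import Counter

def mu_from_engine(non_stop_words, m=5):
    """Exploration feedback from the search engine:
    mu = 1 - (count of the m most frequent non-stop-words in D_t)
           / (total count of non-stop-words in D_t)."""
    counts = Counter(non_stop_words)
    top_m_total = sum(c for _, c in counts.most_common(m))
    return 1 - top_m_total / len(non_stop_words)

# With 428 non-stop-words whose top-5 counts are 59+34+30+19+7 = 149:
# 1 - 149/428 ≈ 0.65
```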
20. Put into a POSG Framework
Partially Observable Stochastic Games (POSGs): the multi-agent version of POMDPs
A tuple <S, G, T, R> for States, Agents, Transitions, Rewards
G is a tuple too, for a set of agents; each agent is <A, O, B>: Actions, Observations, and Beliefs
20
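One way to picture the tuple (a schematic sketch with assumed type names, not the paper's implementation):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Agent:
    """One agent in G (here: the user or the search engine)."""
    actions: List[str]                                       # A: the agent's action set
    observations: List[str]                                  # O: what the agent can observe
    belief: Dict[str, float] = field(default_factory=dict)   # B: belief over states

@dataclass
class POSG:
    """Partially Observable Stochastic Game <S, G, T, R>."""
    states: List[str]                                               # S
    agents: List[Agent]                                             # G
    transition: Callable[[str, Tuple[str, ...]], Dict[str, float]]  # T: P(s' | s, joint action)
    reward: Callable[[str, Tuple[str, ...]], float]                 # R: reward of a joint action
```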
21. Observation-Action Pairs (o_t, a_t)
o_t indicates that, at time t, we can observe how the user has browsed the previously retrieved search results, clicked the documents, and reformulated the query at the current search iteration.
a_t indicates that, at time t, the search engine selects among its search algorithm options, executes the search algorithms, and provides a ranked list of search results.
21
22. Expectation Maximization (EM) to Learn the Model
Start with a random policy
At the Expectation step:
Compute the decision-making distribution
Index the most likely decision by j
A new policy is estimated by finding the best policy at step t given the current estimates of the model parameters
At the Maximization step:
Re-compute the model parameters based on the new estimate of the policy
22
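To make the alternation concrete, here is a self-contained toy version (my own simplification, not the paper's estimator): the joint decision J is treated as latent, its prior comes from the (ε, μ) estimates at each step, and EM re-estimates a stochastic policy P(action | J) from the observed engine actions.

```python
import random
from collections import defaultdict

DECISIONS = ["RE", "RNE", "NRE", "NRNE"]
ACTIONS = ["focus", "diversify"]  # toy action set, not the paper's

def decision_prior(epsilon, mu):
    """Prior over the latent joint decision from the relevance/exploration estimates."""
    return {"RE": (1 - epsilon) * mu, "RNE": (1 - epsilon) * (1 - mu),
            "NRE": epsilon * mu, "NRNE": epsilon * (1 - mu)}

def em_learn(history, num_iters=20):
    """history: list of (epsilon, mu, action) tuples, one per search iteration."""
    # start with a random (normalized) policy P(action | decision)
    policy = {j: {a: random.random() for a in ACTIONS} for j in DECISIONS}
    for j in DECISIONS:
        z = sum(policy[j].values())
        policy[j] = {a: p / z for a, p in policy[j].items()}

    for _ in range(num_iters):
        # E-step: posterior over the latent decision at each step, combining
        # the (epsilon, mu) prior with the likelihood under the current policy.
        counts = {j: defaultdict(float) for j in DECISIONS}
        for eps, mu, action in history:
            prior = decision_prior(eps, mu)
            post = {j: prior[j] * policy[j][action] for j in DECISIONS}
            z = sum(post.values()) or 1.0
            for j in DECISIONS:
                counts[j][action] += post[j] / z
        # M-step: re-estimate the policy from the expected counts (with smoothing).
        for j in DECISIONS:
            z = sum(counts[j].values())
            policy[j] = {a: (counts[j][a] + 1e-6) / (z + 1e-6 * len(ACTIONS))
                         for a in ACTIONS}
    return policy

# e.g. em_learn([(0.9, 0.83, "diversify"), (0.2, 0.65, "focus")])
```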
24. Experiments
TREC 2012, 2013, 2014 Session Track data
Immediate search effectiveness: nDCG@10 at each search iteration
(TREC used nDCG@10 only at the last search iteration)
24
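For reference, the standard nDCG@10 computation this evaluation is based on (a generic sketch; the official track scripts handle details such as the gain mapping):

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top k relevance gains."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_10(ranked_gains, all_judged_gains):
    """nDCG@10: DCG of the ranking divided by the DCG of an ideal ordering."""
    ideal = dcg_at_k(sorted(all_judged_gains, reverse=True), 10)
    return dcg_at_k(ranked_gains, 10) / ideal if ideal > 0 else 0.0

# e.g. ndcg_at_10([2, 0, 1, 0, 0, 1, 0, 0, 0, 0], [2, 2, 1, 1, 1])
```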
25. Baselines
Lemur: Lemur run on the last query in a session
Lemur+all: Lemur run on the concatenation of all queries in a session
QCM: the Query Change Model
Win-Win short: Win-Win using short-term feedback, e.g. user clicks, as rewards
Win-Win long: Win-Win using long-term feedback, nDCG, as rewards; serves as a performance upper bound
25
26. TREC 2012 Session
Our method performs the best besides Win-Win long
Lemur+all, QCM, Win-Win long, and our method monotonically increase over iterations
Win-Win long > our method, QCM, Lemur+all > Win-Win short > Lemur > original
26
27. TREC 2013 Session
Performance boosts at around the 2nd iteration and converges at the 5th-6th iteration
The first few queries are more representative
27
28. TREC 2014 Session
Our method achieves a significant nDCG@10 improvement over QCM on TREC 2013 and TREC 2014
28
30. Based on that, this paper
Models the two-way communication between the
two partners on
relevance
desire to explore
Proposes an EM algorithm for learning the best
policy in this framework
30
31. Look into the Future
Reinforcement-learning-style methods are good for modeling information seeking
There is a lot of room to study the user and search engine interaction in a generative way
The idea of equal partnership and two-way communication could generate a set of new methods and algorithms, not only in retrieval but also in related fields
Exciting!!
31
32. Thank You!
Email: huiyang@cs.georgetown.edu
Group Page: Infosense at http://infosense.cs.georgetown.edu/
Dynamic IR Website: http://www.dynamic-ir-modeling.org/
Live Online Search Engine: http://dumplingproject.org
Upcoming Book: Dynamic Information Retrieval Modeling
TREC 2015 Dynamic Domain Track: http://trec-dd.org/
32