We use a contextual bandit and EM to model the two-way communication between the user and the search engine. This is the fourth algorithm that we have created for dynamic search.
Learning to Reinforce Search Effectiveness
1. Learning to Reinforce Search Effectiveness
Jiyun Luo, Xuchu Dong, Grace Hui Yang
Georgetown University
ICTIR 2015
1
3. But you are not alone: Search with a Partner
Teamwork
Share a common goal: find relevant documents, satisfy the long-term goal
Equal partners: not just being an assistant to the user, but also providing influence
Cooperative exploration
3
4. Key Idea: Cooperative Exploration
The two parties talk and they listen, keep exchanging their ideas
They take turns to lead the search in the direction each of them would like this collaboration to go, while also considering the other's opinion
4
5. t=1: Query q1 = hydropower efficiency
Messages:
User: See my new query. Let's explore!
5
Example is from TREC 2014 Session 52
6. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy)
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
6
Example is from TREC 2014 Session 52
7. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
7
Example is from TREC 2014 Session 52
8. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment; Retrieved docs D2
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
Search engine: Check it out! Documents I've ranked high are relevant.
8
Example is from TREC 2014 Session 52
9. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment; Retrieved docs D2; Clicked d2 in D2
t=3: Query q3 = hydropower damage
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is still on the same subtopic. Let's find out more about it.
9
Example is from TREC 2014 Session 52
10. t=1: Query q1 = hydropower efficiency; Retrieved docs D1 (renewable energy); Clicked d2 in D1
t=2: Query q2 = hydropower environment; Retrieved docs D2; Clicked d2 in D2
t=3: Query q3 = hydropower damage; Retrieved docs D3
t=4
Messages:
User: See my new query. Let's explore!
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is on another subtopic. Let's explore.
Search engine: Check it out! Documents I've ranked high are relevant.
User: Documents I've clicked look relevant! My new query is still on the same subtopic. Let's find out more about it.
Search engine: Want to explore? I've diversified my results.
10
Example is from TREC 2014 Session 52
11. Opinions about Two Things
Relevance: which documents (that you have just marked/retrieved/recommended) are relevant
Desire of exploration: how exploratory I want us to be, as a team
11
12. How to Express the Opinions/Feedback
Relevance is demonstrated by examples:
A query is a short piece of text sent from the user
Clicked snippets/documents are long pieces of text sent from the user
Documents are long pieces of text sent from the search engine
Desire of exploration is shown by:
Query changes
Diversified results
12
13. A Contextual Bandit Formulation of a Decision-Making Distribution
P(relevant) = 1 − ε, P(explore) = μ
P(J | o, a, θ) = P(relevant) P(explore)
P(J = RE | o, a, θ) = (1 − ε) μ
P(J = NRE | o, a, θ) = ε μ
P(J = RNE | o, a, θ) = (1 − ε)(1 − μ)
P(J = NRNE | o, a, θ) = ε (1 − μ)
13
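A minimal sketch of this factorization (my own illustration, not the authors' code): given the non-relevance estimate ε and the exploration estimate μ, the four joint decisions are simple products of the two terms.

```python
def decision_distribution(epsilon, mu):
    """Factored decision-making distribution P(J | o, a, theta):
    a relevance term times an exploration term.
    epsilon: estimated probability that the documents are NOT relevant
    mu:      estimated desire to explore
    """
    return {
        "RE":   (1 - epsilon) * mu,        # relevant, explore
        "NRE":  epsilon * mu,              # not relevant, explore
        "RNE":  (1 - epsilon) * (1 - mu),  # relevant, do not explore
        "NRNE": epsilon * (1 - mu),        # not relevant, do not explore
    }

# e.g. with the values worked out on the next slides:
# decision_distribution(epsilon=0.9, mu=0.83)
```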
14. Relevance Feedback from the User
ε = 1 − (# of SAT-clicked documents ∈ D_{t−1}) / (# of returned documents ∈ D_{t−1})
Example: 1 SAT-clicked document out of 10 retrieved, so ε = 1 − 1/10 = 0.9
q2: smoking quitting hypnosis
D1:
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs
Rank 10: ...
One document SAT-clicked (dwell time: 40 seconds)
14
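A short sketch of this estimate (illustrative only):

```python
def epsilon_from_user(num_sat_clicked, num_returned):
    """Relevance feedback from the user:
    epsilon = 1 - (#SAT-clicked docs in D_{t-1}) / (#returned docs in D_{t-1})."""
    return 1 - num_sat_clicked / num_returned

# 1 SAT-clicked document out of 10 retrieved -> 1 - 1/10 = 0.9
print(epsilon_from_user(1, 10))
```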
15. Exploration Feedback from the User
μ = 1 − (# of query changes ∈ D_{t−1}) / (# of permutations of the query terms ∈ D_{t−1})
Example: 1 query change, 3 terms in the new query in total, so μ = 1 − 1/3! ≈ 0.83
q2: smoking quitting hypnosis (+Δq: query reformulation using words in the previous search results D_{t−1})
D1:
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs
Rank 10: ...
15
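A sketch of this estimate, assuming the denominator is the number of permutations of the new query's terms, which matches the worked example (1 − 1/3! ≈ 0.83); names are illustrative:

```python
from math import factorial

def mu_from_user(num_query_changes_in_prev_results, num_query_terms):
    """Exploration feedback from the user:
    mu = 1 - (#query changes found in D_{t-1}) / (#permutations of the query terms)."""
    return 1 - num_query_changes_in_prev_results / factorial(num_query_terms)

# 1 query change drawn from D_{t-1}, 3 terms in the new query -> 1 - 1/6 ≈ 0.83
print(mu_from_user(1, 3))
```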
16. Relevance Feedback from the Search Engine
Highly scored documents: need consistency in ranking scores, which could be hard to get
Highly ranked documents instead:
ε = 1 − (# of relevant documents ∈ top n retrieved) / n
16
17. Relevance Feedback from the Search Engine
q2: smoking quitting hypnosis
D2:
Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart
Rank 10: ...
8 out of the top 10 retrieved documents are relevant, so ε = 1 − 8/10 = 0.2
17
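The same estimate on the engine side, as a small sketch (assuming a relevance indicator is available for each of the top n results):

```python
def epsilon_from_engine(num_relevant_in_top_n, n):
    """Relevance feedback from the search engine:
    epsilon = 1 - (#relevant documents in the top n retrieved) / n."""
    return 1 - num_relevant_in_top_n / n

# 8 relevant documents among the top 10 -> 1 - 8/10 = 0.2
print(epsilon_from_engine(8, 10))
```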
18. Exploration Feedback from the Search Engine
More diversified results show more mixed content: observe the word distribution (higher perplexity)
μ = 1 − (total # of the top m most frequent non-stop-words ∈ D_t) / (total # of non-stop-words ∈ D_t)
18
19. Exploration Feedback from the Search Engine
q2: smoking quitting hypnosis
D2:
Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart
Rank 10: ...
428 non-stop-words in the top 10 snippets; the 5 most frequent words: smoke (59), quit (34), hypnosis (30), stop (19), button (7)
μ = 1 − (59 + 34 + 30 + 19 + 7) / 428 ≈ 0.65
19
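A sketch of this diversity estimate (illustrative helper, not the authors' code): the more the word mass concentrates in a few terms, the lower μ.

```python
from collections import Counter

def mu_from_engine(non_stop_words, m=5):
    """Exploration feedback from the search engine:
    mu = 1 - (count of the m most frequent non-stop-words in D_t)
           / (total count of non-stop-words in D_t)."""
    counts = Counter(non_stop_words)
    top_m_total = sum(c for _, c in counts.most_common(m))
    return 1 - top_m_total / len(non_stop_words)

# With 428 non-stop-words whose top-5 counts are 59+34+30+19+7 = 149:
# 1 - 149/428 ≈ 0.65
```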
20. Put into a POSG Framework
Partially Observable Stochastic Games (POSGs): the multi-agent version of POMDPs
A tuple <S, G, T, R> for States, Agents, Transitions, Rewards
G is a tuple too, for a set of agents; each agent is <A, O, B>: Actions, Observations, and Beliefs
20
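One way to picture the tuple (a schematic sketch with assumed type names, not the paper's implementation):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Agent:
    """One agent in G (here: the user or the search engine)."""
    actions: List[str]                                       # A: the agent's action set
    observations: List[str]                                  # O: what the agent can observe
    belief: Dict[str, float] = field(default_factory=dict)   # B: belief over states

@dataclass
class POSG:
    """Partially Observable Stochastic Game <S, G, T, R>."""
    states: List[str]                                               # S
    agents: List[Agent]                                             # G
    transition: Callable[[str, Tuple[str, ...]], Dict[str, float]]  # T: P(s' | s, joint action)
    reward: Callable[[str, Tuple[str, ...]], float]                 # R: reward of a joint action
```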
21. Observation-Action Pairs (o_t, a_t)
o_t indicates that, at time t, we can observe how the user has browsed the previously retrieved search results, clicked the documents, and reformulated the query at the current search iteration.
a_t indicates that, at time t, the search engine selects among its search algorithm options, executes the search algorithms, and provides a ranked list of search results.
21
22. Expectation Maximization (EM) to Learn the Model
Start with a random policy
At the Expectation step:
Compute the decision-making distribution
Index the most likely decision by j
A new policy is estimated by finding the best policy at step t given the current estimates of the model parameters
At the Maximization step:
Re-compute the model parameters based on the new estimate of the policy
22
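To make the alternation concrete, here is a self-contained toy version (my own simplification, not the paper's estimator): the joint decision J is treated as latent, its prior comes from the (ε, μ) estimates at each step, and EM re-estimates a stochastic policy P(action | J) from the observed engine actions.

```python
import random
from collections import defaultdict

DECISIONS = ["RE", "RNE", "NRE", "NRNE"]
ACTIONS = ["focus", "diversify"]  # toy action set, not the paper's

def decision_prior(epsilon, mu):
    """Prior over the latent joint decision from the relevance/exploration estimates."""
    return {"RE": (1 - epsilon) * mu, "RNE": (1 - epsilon) * (1 - mu),
            "NRE": epsilon * mu, "NRNE": epsilon * (1 - mu)}

def em_learn(history, num_iters=20):
    """history: list of (epsilon, mu, action) tuples, one per search iteration."""
    # start with a random (normalized) policy P(action | decision)
    policy = {j: {a: random.random() for a in ACTIONS} for j in DECISIONS}
    for j in DECISIONS:
        z = sum(policy[j].values())
        policy[j] = {a: p / z for a, p in policy[j].items()}

    for _ in range(num_iters):
        # E-step: posterior over the latent decision at each step, combining
        # the (epsilon, mu) prior with the likelihood under the current policy.
        counts = {j: defaultdict(float) for j in DECISIONS}
        for eps, mu, action in history:
            prior = decision_prior(eps, mu)
            post = {j: prior[j] * policy[j][action] for j in DECISIONS}
            z = sum(post.values()) or 1.0
            for j in DECISIONS:
                counts[j][action] += post[j] / z
        # M-step: re-estimate the policy from the expected counts (with smoothing).
        for j in DECISIONS:
            z = sum(counts[j].values())
            policy[j] = {a: (counts[j][a] + 1e-6) / (z + 1e-6 * len(ACTIONS))
                         for a in ACTIONS}
    return policy

# e.g. em_learn([(0.9, 0.83, "diversify"), (0.2, 0.65, "focus")])
```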
24. Experiments
TREC 2012, 2013, 2014 Session Track data
Immediate search effectiveness: nDCG@10 at each search iteration
(TREC used nDCG@10 only at the last search iteration)
24
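For reference, the standard nDCG@10 computation this evaluation is based on (a generic sketch; the official track scripts handle details such as the gain mapping):

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top k relevance gains."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_10(ranked_gains, all_judged_gains):
    """nDCG@10: DCG of the ranking divided by the DCG of an ideal ordering."""
    ideal = dcg_at_k(sorted(all_judged_gains, reverse=True), 10)
    return dcg_at_k(ranked_gains, 10) / ideal if ideal > 0 else 0.0

# e.g. ndcg_at_10([2, 0, 1, 0, 0, 1, 0, 0, 0, 0], [2, 2, 1, 1, 1])
```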
25. Baselines
Lemur: Lemur run on the last query in a session
Lemur+all: Lemur run on the concatenation of all queries in a session
QCM: the Query Change Model
Win-Win short: Win-Win using short-term feedback, e.g. user clicks, as rewards
Win-Win long: Win-Win using long-term feedback, nDCG, as rewards; serves as a performance upper bound
25
26. TREC 2012 Session
Our method performs the best besides Win-Win long
Lemur+all, QCM, Win-Win long, and our method monotonically increase over iterations
Win-Win long > our method, QCM, Lemur+all > Win-Win short > Lemur > original
26
27. TREC 2013 Session
Performance boosts at around the 2nd iteration and converges at the 5th-6th iteration
The first few queries are more representative
27
28. TREC 2014 Session
Our method achieves a significant nDCG@10 improvement over QCM on TREC 2013 and TREC 2014
28
30. Based on that, this paper
Models the two-way communication between the
two partners on
relevance
desire to explore
Proposes an EM algorithm for learning the best
policy in this framework
30
31. Look into the Future
Reinforcement-learning-style methods are good for modeling information seeking
There is a lot of room to study the user and search engine interaction in a generative way
The idea of equal partnership and two-way communication could generate a set of new methods and algorithms, not only in retrieval but also in related fields
Exciting!!
31
32. Thank You!
Email: huiyang@cs.georgetown.edu
Group Page: Infosense at http://infosense.cs.georgetown.edu/
Dynamic IR Website: http://www.dynamic-ir-modeling.org/
Live Online Search Engine: http://dumplingproject.org
Upcoming Book: Dynamic Information Retrieval Modeling
TREC 2015 Dynamic Domain Track: http://trec-dd.org/
32