際際滷

際際滷Share a Scribd company logo
Learning to Reinforce
Search Effectiveness
Jiyun Luo, Xuchu Dong, Grace Hui Yang
Georgetown University
ICTIR 2015
1
Search by Test-the-Water
2
But you are not alone:
Search with a Partner
 A teamwork
 Share a common goal
 鍖nd relevant documents
 satisfy long term goal
 Equal partners
 not just being an assistant to
the user
 but also providing in鍖uence
 Cooperative exploration
3
Key Idea: Cooperative Exporation
 The two parties
 They talk and they listen
 keep exchanging their ideas
 Take turns to lead the search to the direction
each of them would like this collaboration to go
 Also considering the others opinion
4
t=1 Query q1=hydropower efficiency
Messages:Messages:
See my new query. Lets
explore!
5
Example is from TREC 2014 Session 52
t=1 Query q1=hydropower efficiency
Retrieved docs D1 renewable energy
Messages:Messages:
Check it out! Documents
Ive ranked high are
relevant
See my new query. Lets
explore!
6
Example is from TREC 2014 Session 52
t=1
t=2
Query q1=hydropower efficiency
Clicked d2 in D1
Query q2=hydropower environment
Retrieved docs D1 renewable energy
Messages:Messages:
Check it out! Documents
Ive ranked high are
relevant
See my new query. Lets
explore!
Documents Ive clicked
look relevant!
My new query is on
another subtopic. Lets
explore
7
Example is from TREC 2014 Session 52
t=1
t=2
Retrieved docs D2
Query q1=hydropower efficiency
Clicked d2 in D1
Query q2=hydropower environment
Retrieved docs D1 renewable energy
Messages:Messages:
Check it out! Documents
Ive ranked high are
relevant
Check it out! Documents
Ive ranked high are
relevant.
See my new query. Lets
explore!
Documents Ive clicked
look relevant!
My new query is on
another subtopic. Lets
explore
8
Example is from TREC 2014 Session 52
t=1
t=2
t=3
Retrieved docs D2
Query q1=hydropower efficiency
Clicked d2 in D1
Query q2=hydropower environment
Clicked d2 in D2
Query q3=hydropower damage
Retrieved docs D1 renewable energy
Messages:Messages:
Check it out! Documents
Ive ranked high are
relevant
Check it out! Documents
Ive ranked high are
relevant.
See my new query. Lets
explore!
Documents Ive clicked
look relevant!
Documents Ive
clicked look relevant!
My new query is on
another subtopic. Lets
explore
My new query is still on
the same subtopic. Lets
find out more about it.
9
Example is from TREC 2014 Session 52
t=1
t=2
t=3

t=4
Retrieved docs D2
Retrieved docs D3
Query q1=hydropower efficiency
Clicked d2 in D1
Query q2=hydropower environment
Clicked d2 in D2
Query q3=hydropower damage
Retrieved docs D1 renewable energy
Messages:Messages:
Check it out! Documents
Ive ranked high are
relevant
Check it out! Documents
Ive ranked high are
relevant.
See my new query. Lets
explore!
Documents Ive clicked
look relevant!
Documents Ive
clicked look relevant!
Want to explore? Ive
diversified my results.
My new query is on
another subtopic. Lets
explore
My new query is still on
the same subtopic. Lets
find out more about it.
10
Example is from TREC 2014 Session 52
Opinions about Two Things
 Relevance
 Which documents (that
you have just marked/
retrieved/recommended)
are relevant
 Desire of Exploration
 How exploratory I want
us to be, as a team
11
How to Express the Opinions/
Feedback
 Relevance is demonstrated by examples:
 Query is a piece of short text sent from the user
 Clicked snippets/documents are long pieces of
text sent from user
 Documents are long text sent from the search
engine
 Desire of Exploration is shown by
 Query changes
 Diversi鍖ed results
12
A Contextual Bandit Formulation
of a Decision-Making Distribution
P(relevant) = 1  P(explore) = 袖
P(J = RE|o, a, ≠
) = (1 )袖
P(J = NRE|o, a, ≠
) = 晌
P(J |o, a, ≠
) = P(relevant)P(explore)
P(J = RNE|o, a, ≠
) = (1 )(1 袖)
P(J = NRNE|o, a, ≠
) = (1 袖)
13
Relevance Feedback from
the User
 1 SAT-Clicked out of 10 retrieved,
 = 1
# of SAT-Clicked documents 2 Dt 1
# of returned documents 2 Dt 1
14
" = 1
1
10
= 0.9
smoking quitting!q2! hypnosis !
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help  !
!
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction !
!
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs!
!
Rank 10: !
SAT-Clicked. !
Dwell time: 40 seconds!
D1!
Exploration Feedback from
the User
 1 query change , 3 terms in the new query in
total
袖 = 1
# of query changes 2 Dt 1
# of permutations of query terms 2 Dt 1
15
smoking quitting!q2! hypnosis !
+q"
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help  !
!
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction !
!
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs!
!
Rank 10: !
D1!
Query reformulation using words in
previous search results!
2 Dt 1
袖 = 1
1
3!
= 0.83
Relevance Feedback from
the Search Engine
 Highly scored documents
 Needs consistency in ranking scores
 Could be hard to get
 Highly ranked documents
 = 1
# of relevant documents 2 top n retrieved
n
16
Relevance Feedback from
the Search Engine
17
smoking quitting!q2!
D2! Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking
Hypnosis Neuro!
!
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart!
!
Rank 10: !
hypnosis !
 8 out of 10 top retrieved documents are relevant
 " = 1
8
10
= 0.2
Exploration Feedback from
the Search Engine
 More diversi鍖ed results show more mixed results
 Observe the word distribution
 Higher perplexity
袖 = 1
total # of the top m frequent non-stop-words 2 Dt
total # of non-stop-words 2 Dt
18
Exploration Feedback from
the Search Engine
19
smoking quitting!q2!
D2! Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking
Hypnosis Neuro!
!
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart!
!
Rank 10: !
hypnosis !
 428 non-stop-words in the top 10 snippets
 the most frequent 5 words:
smoke(59),quit(34),hypnosis(30),stop(19),button(7)
 袖 = 1
59 + 34 + 30 + 19 + 7
428
= 0.36
Put into a POSG Framework
 Partially Observable Stochastic Games (POSGs)
 multiple-agent version of POMDP
 A tuple <S,G,T,R> for States, Agents, Transitions,
Rewards
 G is a tuple too, for a set of agents , each is
<A,O,B>
 Actions, Observations, and Beliefs
20
Observation-Action Pairs
 indicates at time t that we can observe how the
user has browsed the previously retrieved search
results, clicked the documents, and reformulated
the query at the current search iteration.
 indicates that, at time t, the search engine
selects among its search algorithm options,
executes the search algorithms, and provides a
ranked list of search results.
21
(ot, at)
ot
at
Expectation Maximization
(EM) to Learn the Model
 Starts with a random policy
 At the Expectation step
 Compute the decision-making distribution
 Index the most likely decision by j
 A new policy is estimated by 鍖nding the best policy at step t
given the current estimates of model parameters and
 At the Maximization step
 Re-compute model parameters based on new estimate of the
policy
22
23
Experiments
 TREC 2012, 2013, 2014 Session Track data
 Immediate Search Effectiveness
 nDCG@10 at each search iteration
 TREC used nDCG@10 at the last search
interaction
24
Baselines
 Lemur: Lemur worked on the last query in a session
 Lemur+all: Lemur concatenating all the queries in a session
 QCM: query change model
 Win-win short: Win-Win uses short-term feedback, e.g. user
clicks, as rewards
 Win-win long: Win-Win uses long-term feedback, nDCG, as
rewards
 served as a performance upper bound
25
TREC 2012 Session
26
 鍖 performs the
best besides
winwin-long
 lemur+all, qcm,
winwin-long and
鍖 monotonically
increase over
iterations
 winwin-long > 鍖,
qcm, lemur+all >
winwin-short
>lemur > original
TREC 2013 Session
27
 Performance
boost at around
2nd iteration and
converge at the
5~6th iterations
 First a few queries
are more
representative
TREC 2014 Session
28
 鍖 achieves
signi鍖cant
nDCG@10
improvement over
qcm on TREC13
and TREC14
A new thinking
The search engine and the user are
equal partners.
29
Based on that, this paper
 Models the two-way communication between the
two partners on
 relevance
 desire to explore
 Proposes an EM algorithm for learning the best
policy in this framework
30
Look into the future
 Reinforcement-learning-style methods are good for
modeling information seeking
 A lot of room to study the user and the search engine
interaction in a generative way
 The thinking of equal partnership and two-way
communication could be able to generate a set of new
methods and algorithms
 on not only retrieval, but other related 鍖elds
 Exciting!!
31
Thank You!
 Email: huiyang@cs.georgetown.edu
 Group Page: Infosense at http://
infosense.cs.georgetown.edu/
 Dynamic IR Website: http://www.dynamic-ir-modeling.org/
 Live Online Search Engine: http://dumplingproject.org
 Upcoming Book: Dynamic Information Retrieval Modeling
 TREC 2015 Dynamic Domain Track: http://trec-dd.org/
32

More Related Content

Learning to Reinforce Search Effectiveness

  • 1. Learning to Reinforce Search Effectiveness Jiyun Luo, Xuchu Dong, Grace Hui Yang Georgetown University ICTIR 2015 1
  • 3. But you are not alone: Search with a Partner A teamwork Share a common goal 鍖nd relevant documents satisfy long term goal Equal partners not just being an assistant to the user but also providing in鍖uence Cooperative exploration 3
  • 4. Key Idea: Cooperative Exporation The two parties They talk and they listen keep exchanging their ideas Take turns to lead the search to the direction each of them would like this collaboration to go Also considering the others opinion 4
  • 5. t=1 Query q1=hydropower efficiency Messages:Messages: See my new query. Lets explore! 5 Example is from TREC 2014 Session 52
  • 6. t=1 Query q1=hydropower efficiency Retrieved docs D1 renewable energy Messages:Messages: Check it out! Documents Ive ranked high are relevant See my new query. Lets explore! 6 Example is from TREC 2014 Session 52
  • 7. t=1 t=2 Query q1=hydropower efficiency Clicked d2 in D1 Query q2=hydropower environment Retrieved docs D1 renewable energy Messages:Messages: Check it out! Documents Ive ranked high are relevant See my new query. Lets explore! Documents Ive clicked look relevant! My new query is on another subtopic. Lets explore 7 Example is from TREC 2014 Session 52
  • 8. t=1 t=2 Retrieved docs D2 Query q1=hydropower efficiency Clicked d2 in D1 Query q2=hydropower environment Retrieved docs D1 renewable energy Messages:Messages: Check it out! Documents Ive ranked high are relevant Check it out! Documents Ive ranked high are relevant. See my new query. Lets explore! Documents Ive clicked look relevant! My new query is on another subtopic. Lets explore 8 Example is from TREC 2014 Session 52
  • 9. t=1 t=2 t=3 Retrieved docs D2 Query q1=hydropower efficiency Clicked d2 in D1 Query q2=hydropower environment Clicked d2 in D2 Query q3=hydropower damage Retrieved docs D1 renewable energy Messages:Messages: Check it out! Documents Ive ranked high are relevant Check it out! Documents Ive ranked high are relevant. See my new query. Lets explore! Documents Ive clicked look relevant! Documents Ive clicked look relevant! My new query is on another subtopic. Lets explore My new query is still on the same subtopic. Lets find out more about it. 9 Example is from TREC 2014 Session 52
  • 10. t=1 t=2 t=3 t=4 Retrieved docs D2 Retrieved docs D3 Query q1=hydropower efficiency Clicked d2 in D1 Query q2=hydropower environment Clicked d2 in D2 Query q3=hydropower damage Retrieved docs D1 renewable energy Messages:Messages: Check it out! Documents Ive ranked high are relevant Check it out! Documents Ive ranked high are relevant. See my new query. Lets explore! Documents Ive clicked look relevant! Documents Ive clicked look relevant! Want to explore? Ive diversified my results. My new query is on another subtopic. Lets explore My new query is still on the same subtopic. Lets find out more about it. 10 Example is from TREC 2014 Session 52
  • 11. Opinions about Two Things Relevance Which documents (that you have just marked/ retrieved/recommended) are relevant Desire of Exploration How exploratory I want us to be, as a team 11
  • 12. How to Express the Opinions/ Feedback Relevance is demonstrated by examples: Query is a piece of short text sent from the user Clicked snippets/documents are long pieces of text sent from user Documents are long text sent from the search engine Desire of Exploration is shown by Query changes Diversi鍖ed results 12
  • 13. A Contextual Bandit Formulation of a Decision-Making Distribution P(relevant) = 1 P(explore) = 袖 P(J = RE|o, a, ≠ ) = (1 )袖 P(J = NRE|o, a, ≠ ) = 晌 P(J |o, a, ≠ ) = P(relevant)P(explore) P(J = RNE|o, a, ≠ ) = (1 )(1 袖) P(J = NRNE|o, a, ≠ ) = (1 袖) 13
  • 14. Relevance Feedback from the User 1 SAT-Clicked out of 10 retrieved, = 1 # of SAT-Clicked documents 2 Dt 1 # of returned documents 2 Dt 1 14 " = 1 1 10 = 0.9 smoking quitting!q2! hypnosis ! Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help ! ! Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction ! ! Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs! ! Rank 10: ! SAT-Clicked. ! Dwell time: 40 seconds! D1!
  • 15. Exploration Feedback from the User 1 query change , 3 terms in the new query in total 袖 = 1 # of query changes 2 Dt 1 # of permutations of query terms 2 Dt 1 15 smoking quitting!q2! hypnosis ! +q" Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help ! ! Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction ! ! Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs! ! Rank 10: ! D1! Query reformulation using words in previous search results! 2 Dt 1 袖 = 1 1 3! = 0.83
  • 16. Relevance Feedback from the Search Engine Highly scored documents Needs consistency in ranking scores Could be hard to get Highly ranked documents = 1 # of relevant documents 2 top n retrieved n 16
  • 17. Relevance Feedback from the Search Engine 17 smoking quitting!q2! D2! Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro! ! Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart! ! Rank 10: ! hypnosis ! 8 out of 10 top retrieved documents are relevant " = 1 8 10 = 0.2
  • 18. Exploration Feedback from the Search Engine More diversi鍖ed results show more mixed results Observe the word distribution Higher perplexity 袖 = 1 total # of the top m frequent non-stop-words 2 Dt total # of non-stop-words 2 Dt 18
  • 19. Exploration Feedback from the Search Engine 19 smoking quitting!q2! D2! Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro! ! Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart! ! Rank 10: ! hypnosis ! 428 non-stop-words in the top 10 snippets the most frequent 5 words: smoke(59),quit(34),hypnosis(30),stop(19),button(7) 袖 = 1 59 + 34 + 30 + 19 + 7 428 = 0.36
  • 20. Put into a POSG Framework Partially Observable Stochastic Games (POSGs) multiple-agent version of POMDP A tuple <S,G,T,R> for States, Agents, Transitions, Rewards G is a tuple too, for a set of agents , each is <A,O,B> Actions, Observations, and Beliefs 20
  • 21. Observation-Action Pairs indicates at time t that we can observe how the user has browsed the previously retrieved search results, clicked the documents, and reformulated the query at the current search iteration. indicates that, at time t, the search engine selects among its search algorithm options, executes the search algorithms, and provides a ranked list of search results. 21 (ot, at) ot at
  • 22. Expectation Maximization (EM) to Learn the Model Starts with a random policy At the Expectation step Compute the decision-making distribution Index the most likely decision by j A new policy is estimated by 鍖nding the best policy at step t given the current estimates of model parameters and At the Maximization step Re-compute model parameters based on new estimate of the policy 22
  • 23. 23
  • 24. Experiments TREC 2012, 2013, 2014 Session Track data Immediate Search Effectiveness nDCG@10 at each search iteration TREC used nDCG@10 at the last search interaction 24
  • 25. Baselines Lemur: Lemur worked on the last query in a session Lemur+all: Lemur concatenating all the queries in a session QCM: query change model Win-win short: Win-Win uses short-term feedback, e.g. user clicks, as rewards Win-win long: Win-Win uses long-term feedback, nDCG, as rewards served as a performance upper bound 25
  • 26. TREC 2012 Session 26 鍖 performs the best besides winwin-long lemur+all, qcm, winwin-long and 鍖 monotonically increase over iterations winwin-long > 鍖, qcm, lemur+all > winwin-short >lemur > original
  • 27. TREC 2013 Session 27 Performance boost at around 2nd iteration and converge at the 5~6th iterations First a few queries are more representative
  • 28. TREC 2014 Session 28 鍖 achieves signi鍖cant nDCG@10 improvement over qcm on TREC13 and TREC14
  • 29. A new thinking The search engine and the user are equal partners. 29
  • 30. Based on that, this paper Models the two-way communication between the two partners on relevance desire to explore Proposes an EM algorithm for learning the best policy in this framework 30
  • 31. Look into the future Reinforcement-learning-style methods are good for modeling information seeking A lot of room to study the user and the search engine interaction in a generative way The thinking of equal partnership and two-way communication could be able to generate a set of new methods and algorithms on not only retrieval, but other related 鍖elds Exciting!! 31
  • 32. Thank You! Email: huiyang@cs.georgetown.edu Group Page: Infosense at http:// infosense.cs.georgetown.edu/ Dynamic IR Website: http://www.dynamic-ir-modeling.org/ Live Online Search Engine: http://dumplingproject.org Upcoming Book: Dynamic Information Retrieval Modeling TREC 2015 Dynamic Domain Track: http://trec-dd.org/ 32