Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
Rishabh Mehrotra, Emine Yilmaz
9th August 2017, Tokyo, Japan

Introduction
Search is omnipresent!
Understanding users’ needs is hard!

Use Case: Search Engines
• Simple Tasks
• Complex Tasks

The need for search arises from real-world tasks!

What is a Task?
• A search task is an atomic information need resulting in one or more
queries [Jones and Klinkner, CIKM '08]
• Complex search task: A set of related information needs, resulting in
one or more (possibly complex) tasks.
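[Figure: timeline of example queries across two sessions. Queries: "Credit check", "House buying guide …", "Houses for sale", "Loans for house", "Improve credit score". Timestamps: 17:00, 17:02, 17:06, 18:15, 18:25. Labels: Session 1, Session 2.]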

Extracting Search Tasks: Prior Work
• Clustering session based queries [WSDM'11]
• Structured Learning Approach [WWW'13]
• Hawkes Process based Task Extraction [KDD'14]
• dd-CRPs for extracting subtasks [NAACL’16]
[Figure: graphs over queries q1 to q6, with an additional node q0 marked "Latent!"]

Problems:
• Linking each query to an on-going task leads to long chains
• Impure tasks
• Rely on large corpus of pre-tagged queries
• Do not aggregate across users
• Tasks are not necessarily flat-structures
• complex tasks decompose into sub-tasks

Extracting Tasks & Subtasks
Goal:
Extract hierarchies of
search tasks & sub-tasks

Hierarchies of Tasks & Subtasks
• Search tasks tend to be hierarchical in nature

Constructing Task Hierarchies
• Most previous work represents tasks as flat structures
• One possibility: Hierarchical clustering methods
• No guide on the correct number of clusters
• Most construct binary tree representations of data
• Need models that can represent trees with arbitrary branches
• Complexity is a major problem

Hierarchical Task Extraction
Bayesian non-parametric approach
• Bayesian Rose Trees [UAI’10, NIPS’13]
• Represents a set of partitions of the data (recursively)

• Build upon Bayesian Rose Trees
• Each node of the tree corresponds to a task
• Each task represented by a set of queries
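To make "a set of partitions of the data (recursively)" concrete, here is a minimal sketch that enumerates the partitions consistent with a small tree. The nested-tuple tree encoding and the function names `leaves` and `partitions` are assumptions made for this illustration, not something taken from the talk.

```python
from itertools import product

def leaves(tree):
    """Return the queries at the leaves of a tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def partitions(tree):
    """Enumerate the partitions of the leaves that are consistent with the tree.

    Either all leaves under the node form one block, or the partition is
    built by picking one consistent partition from each child.
    Assumes every internal node has at least two children.
    """
    if not isinstance(tree, tuple):
        return [[[tree]]]
    result = [[leaves(tree)]]
    for combo in product(*(partitions(child) for child in tree)):
        result.append([block for part in combo for block in part])
    return result

# Hypothetical example: q1 and q2 form a subtask, q3 hangs off the root.
tree = (("q1", "q2"), "q3")
for partition in partitions(tree):
    print(partition)
```

For this three-query tree the sketch yields three partitions: everything in one task, the subtask {q1, q2} next to q3, and all queries separate.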

• Goal: Find the tree structure that maximizes
$$p(Q \mid T) = \sum_{\phi(T) \in \mathrm{Part}(T)} p(\phi(T))\, p(Q \mid \phi(T))$$
(a mixture over partitions of data points)

• Goal: Find the tree structure that maximizes $p(Q \mid T)$, a mixture over partitions of data points
• Number of partitions consistent with T can be exponentially large
• Approximate using dynamic programming:
$$p(Q \mid T) = \pi_T\, f(Q) + (1 - \pi_T) \prod_{T_i \in \mathrm{ch}(T)} p(\mathrm{leaves}(T_i) \mid T_i)$$
where $f(Q)$ is the likelihood of the queries belonging to the same task
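The recursion above can be sketched in a few lines. This is an illustration under stated assumptions: trees are nested tuples, the mixing weight `pi` is fixed at 0.5 for every internal node, a leaf contributes f of its single query, and `toy_f` is a made-up stand-in for the data likelihood (it is not a proper probability model).

```python
import math

def leaves(tree):
    """Return the queries at the leaves of a tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def marginal(tree, f, pi=0.5):
    """p(Q | T) = pi * f(leaves(T)) + (1 - pi) * prod_i p(leaves(T_i) | T_i)."""
    if not isinstance(tree, tuple):
        return f([tree])  # a single query is trivially one task
    children = math.prod(marginal(child, f, pi) for child in tree)
    return pi * f(leaves(tree)) + (1 - pi) * children

def toy_f(queries):
    """Hypothetical f(Q): higher when all queries share at least one term."""
    shared = set.intersection(*(set(q.split()) for q in queries))
    return 0.5 if shared else 0.05

coherent = ("cheap flights", "flights to tokyo")
mixed = ("cheap flights", "credit check")
print(marginal(coherent, toy_f), marginal(mixed, toy_f))
```

Each node is visited once, so the cost is linear in the number of nodes even though the number of partitions the tree represents can be exponential.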

Data Likelihood: Query to Query Affinity
r1: Query term based affinity
• Lexical similarity between queries
r2: URL based affinity
• Similarity between the returned URLs
r3: User/Session based affinity
• Query co-occurrence in the same session
$$f(Q) = \prod_{k=1}^{3} \sum_{i \in 1 \ldots |Q|} \sum_{j \in 1 \ldots |Q|} p\left(r^{k}_{q_i, q_j} \mid \alpha_k, \beta_k\right)$$
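As a rough illustration of the three affinity signals, the sketch below computes one value per signal for a pair of queries. The slides do not say which similarity measures are used, so Jaccard overlap for r1 and r2 and a session co-occurrence ratio for r3 are assumptions, as are all function names.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def term_affinity(q1, q2):
    """r1 (assumed form): lexical overlap between the query terms."""
    return jaccard(q1.lower().split(), q2.lower().split())

def url_affinity(urls1, urls2):
    """r2 (assumed form): overlap between the URLs returned for each query."""
    return jaccard(urls1, urls2)

def session_affinity(q1, q2, sessions):
    """r3 (assumed form): share of sessions with either query that contain both."""
    both = sum(1 for s in sessions if q1 in s and q2 in s)
    either = sum(1 for s in sessions if q1 in s or q2 in s)
    return both / either if either else 0.0

# Hypothetical sessions, each a set of queries.
sessions = [
    {"credit check", "houses for sale"},
    {"houses for sale"},
    {"loans for house"},
]
print(term_affinity("houses for sale", "loans for house"))
print(session_affinity("credit check", "houses for sale", sessions))
```

These raw affinities would then be scored under a distribution with parameters α_k and β_k; that step is left out here because the slides do not specify the distribution.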

• Initially: The forest contains a single tree for each query

• At each step, pick a pair of trees in the forest to be merged
• Three types of merging operations

• Which trees & how to merge:
• Those which give the highest Bayes Factor improvement
$$\frac{p(Q_M \mid M)}{p(Q_I \mid I)\, p(Q_J \mid J)}$$
where M is the tree obtained by merging trees I and J
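The greedy construction can be sketched as follows. This is a simplified illustration, not the authors' implementation: trees are nested tuples, `pi` is fixed at 0.5, `toy_f` is a hypothetical and strictly positive stand-in for f(Q), and the three merge operations are taken to be the join, absorb and collapse operations of Bayesian Rose Trees.

```python
import math
from itertools import combinations

def leaves(t):
    return [t] if not isinstance(t, tuple) else [x for c in t for x in leaves(c)]

def marginal(t, f, pi=0.5):
    """p(Q | T) via the recursion pi * f(Q) + (1 - pi) * prod over children."""
    if not isinstance(t, tuple):
        return f([t])
    return pi * f(leaves(t)) + (1 - pi) * math.prod(marginal(c, f, pi) for c in t)

def candidates(i, j):
    """Yield the possible merges of trees i and j."""
    yield (i, j)                              # join: new parent over i and j
    if isinstance(i, tuple):
        yield i + (j,)                        # absorb: j becomes a child of i
    if isinstance(j, tuple):
        yield j + (i,)                        # absorb: i becomes a child of j
    if isinstance(i, tuple) and isinstance(j, tuple):
        yield i + j                           # collapse: pool the children

def build_tree(queries, f, pi=0.5):
    """Start from one tree per query; repeatedly apply the best-scoring merge."""
    forest = list(queries)
    while len(forest) > 1:
        best = None
        for a, b in combinations(range(len(forest)), 2):
            i, j = forest[a], forest[b]
            denom = marginal(i, f, pi) * marginal(j, f, pi)
            for m in candidates(i, j):
                score = marginal(m, f, pi) / denom   # Bayes factor of the merge
                if best is None or score > best[0]:
                    best = (score, a, b, m)
        _, a, b, m = best
        forest = [t for k, t in enumerate(forest) if k not in (a, b)] + [m]
    return forest[0]

def toy_f(queries):
    """Hypothetical f(Q): rewards query sets that share at least one term."""
    n = len(queries)
    shared = set.intersection(*(set(q.split()) for q in queries))
    return 0.4 ** n * (3.0 if shared else 0.1) ** (n - 1)

queries = ["cheap flights", "flights tokyo", "credit check", "credit score"]
print(build_tree(queries, toy_f))
```

On this toy input the two flight queries and the two credit queries are merged first, and the two resulting subtasks are then joined under a common root. Recomputing every marginal at each step keeps the sketch short; a practical version would cache them.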

• Which trees & how to merge:
• Those which give the highest Bayes Factor improvement
$$\frac{p(Q_M \mid M)}{p(Q_I \mid I)\, p(Q_J \mid J)}$$
• Tree Pruning:
• A node that represents a coherent task should not be split further
• Prune trees based on task coherence
$$\mathrm{PMI}(w_1, w_2) = \log \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}$$
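A small sketch of the PMI computation follows. It assumes that p(w) and p(w1, w2) are estimated as the fraction of queries containing the word or the word pair; the slides do not state how these probabilities are estimated or how PMI values are aggregated into a task coherence score, and `pmi_table` is a name chosen for this example.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(queries):
    """PMI for every word pair that co-occurs in at least one query."""
    n = len(queries)
    word_counts, pair_counts = Counter(), Counter()
    for query in queries:
        words = sorted(set(query.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # pairs in sorted order
    return {
        (w1, w2): math.log((c / n) / ((word_counts[w1] / n) * (word_counts[w2] / n)))
        for (w1, w2), c in pair_counts.items()
    }

# Hypothetical queries assigned to one node of the tree.
queries = ["credit check", "credit score", "houses for sale", "credit check online"]
table = pmi_table(queries)
print(table[("check", "credit")])
```

Pairs that never co-occur are simply absent from the table, which avoids taking the logarithm of zero.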

Experimental Evaluation
• Experiment 1: Search task identification
• Experiment 2: Crowd-sourced evaluation of hierarchy
• Experiment 3: Term prediction application
Baselines:
Task extraction baselines
1. Bestlink-SVM
2. QC-WCC/QC-HTC
3. LDA-Hawkes
4. LDA-TW
Hierarchical model baselines
5. Jones hierarchy
6. BHCD: Bayesian Hierarchical Community Detection
7. Bayesian agglomerative clustering

Experimental Evaluation – I
[Search Task Identification]
• Pairwise precision/recall:
• LDA-TW performs worst
• Too strong assumptions on queries belonging to same task
• Gains over QC-HTC/WCC
• Query affinities can better reflect semantic relationships
Flattened version of hierarchy is useful too!
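Pairwise precision and recall for a flat task assignment can be computed as in the sketch below. It follows the usual definition over pairs of queries placed in the same task; the exact evaluation protocol of the talk is not given on the slide, and the function names are chosen for this example.

```python
from itertools import combinations

def same_task_pairs(clusters):
    """All unordered query pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

def pairwise_precision_recall(predicted, gold):
    """Precision and recall over query pairs assigned to the same task."""
    pred, true = same_task_pairs(predicted), same_task_pairs(gold)
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall

# Hypothetical predicted and gold task assignments.
predicted = [{"q1", "q2", "q3"}, {"q4"}]
gold = [{"q1", "q2"}, {"q3", "q4"}]
print(pairwise_precision_recall(predicted, gold))
```

To score a hierarchy with this measure, it first has to be flattened into a single set of clusters, for example by taking the nodes at one level of the tree.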

Experimental Evaluation – II
[Hierarchy Quality Evaluation]
• Evaluating task coherence:
• Task Relatedness: Randomly pick 2 queries from a task, and get judgments for task relatedness
• Evaluating the hierarchy:
• Valid hierarchy:
• parent task ~ higher level task
• children tasks ~ more focused subtasks
• Useful hierarchy:
• Is the subtask useful in completing the overall search task?
Extracts tasks-subtasks which are Valid & Useful and have Related subtasks.

Experimental Evaluation – III
[Term Prediction]
• Indirect evaluation based on term prediction
1. Construct hierarchy
2. Map to correct node in the hierarchy
3. Leverage node queries for term prediction
• Assumption: identifying good tasks should help in predicting future queries
• Intersection of TREC Session track & AOL log data
Outperforms flat-task extraction techniques as well as hierarchical baselines

Take-Home Message
• Hierarchies provide a more naturalistic view of complex tasks
• Bayesian non-parametric approach for hierarchy extraction
• Coherence based pruning helps identify atomic tasks
• Richer & more expressive models of tasks
• Valid, useful hierarchy with related subtasks

Thank You!
Rishabh Mehrotra
PhD candidate @ UCL
http://rishabhmehrotra.com
@erishabh
r.mehrotra@cs.ucl.ac.uk
Summary:
- Naturalistic view of tasks-subtasks
- Nonparametric approach
- Coherence pruning helps
- Richer & more expressive
Future Work:
- Evaluation techniques for hierarchies
- Mapping to correct level in hierarchy
- Subtask sequences & transitions

SIGIR 2017: Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach