
Julia Kiseleva
@julia_kiseleva
UserSat.com University of Amsterdam
Evaluating Personal Assistants

It brings us new challenges
Google at SIGIR 2016

From Queries to Dialogues
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
User’s dialogue with Cortana: task is “Finding a hotel in Chicago”

From Queries to Dialogues
Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
User’s dialogue with Cortana: task is “Finding a pharmacy”

Main Research Question
How can we automatically predict user
satisfaction with search dialogues on
intelligent assistants using
click, touch, and voice interactions?

Evaluating Personal Assistants

How to define user satisfaction
with search dialogues?

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”
No clicks ???

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
SAT?
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
SAT?
User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”
SAT?
Overall SAT?

User Frustration
Q1: what's the weather like in San Francisco
Q2: what's the weather like in Mountain View
Q3: can you find me a hotel close to Mountain View
Q4: can you show me the cheapest ones
Q5: show me the third one
Q6: show me the directions from SFO to this hotel
Q7: go back to first hotel (misrecognition)
Q8: show me hotels in Mountain View
Q9: show me cheap hotels in Mountain View
Q10: show me more about the third one


Dialog with Intelligent Assistant: task is “Planning a weekend”
Restart search
A user is satisfied


What interaction signals can we
track during search dialogues?

Tracking User Interaction:
Phonetic Similarity
Phonetic similarity between consecutive requests
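The slide does not say which phonetic algorithm was used. As a minimal sketch, assuming a simplified Soundex code per word and a sequence-matching ratio over the coded requests (the names `soundex` and `phonetic_similarity` are hypothetical and not from the talk), the signal could be computed like this:

```python
from difflib import SequenceMatcher

# Standard Soundex consonant groups (assumption: Soundex stands in for
# whatever phonetic encoding the original work used).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return a 4-character Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def phonetic_similarity(request_a: str, request_b: str) -> float:
    """Similarity in [0, 1] between the Soundex-coded words of two requests."""
    codes_a = [c for c in map(soundex, request_a.split()) if c]
    codes_b = [c for c in map(soundex, request_b.split()) if c]
    return SequenceMatcher(None, codes_a, codes_b).ratio()


# Consecutive requests from the frustration example in this talk.
score = phonetic_similarity("show me hotels in Mountain View",
                            "show me cheap hotels in Mountain View")
print(round(score, 3))
```

A high score between consecutive requests suggests the user is repeating or slightly rephrasing a request, which is one plausible reading of why the signal is tracked.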

Tracking User Interaction
Viewport example (ViewPortHeight):
3 seconds × 33% of ViewPort = 1s
6 seconds × 66% of ViewPort = 4s
2 seconds × 20% of ViewPort = 0.4s
1s + 4s + 0.4s = 5.4s
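The viewport figures on this slide can be read as a weighted sum: each interval of viewing time is weighted by the share of the viewport involved. A minimal sketch under that reading (the function name `attributed_time` and the tuple layout are assumptions, not taken from the talk):

```python
from typing import Iterable, Tuple


def attributed_time(observations: Iterable[Tuple[float, float]]) -> float:
    """Sum duration * viewport fraction over (seconds, fraction) pairs."""
    return sum(seconds * fraction for seconds, fraction in observations)


# Values from the slide: 3 s at 33%, 6 s at 66%, 2 s at 20% of the viewport.
slide_example = [(3.0, 0.33), (6.0, 0.66), (2.0, 0.20)]
total = attributed_time(slide_example)
print(f"{total:.2f} s")
```

With unrounded terms the sum comes out slightly below the slide's 5.4 s, because the slide rounds the first two products to 1 s and 4 s before adding.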

Tracking User Interaction:
Touch Signals
• Number of swipes
• Number of up-swipes
• Number of down-swipes
• Total distance swiped (pixels)
• Number of swipes normalized by time
• Total distance divided by number of swipes
• Total swiped distance divided by time
• Number of swipe direction changes
• SERP answer duration (seconds) which is shown on screen (even partially)
• Fraction of visible pixels belonging to SERP answer
• Attributed time (seconds) to viewing a particular element (answer) on SERP
• Attributed time (seconds) per unit height (pixels) associated with a particular element on SERP
• Attributed time (milliseconds) per unit area (square pixels) associated with a particular element on SERP
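The swipe-based signals in the list above can be derived from a simple log of swipe events. The sketch below is illustrative only: the `Swipe` record, the sign convention (negative displacement means an up-swipe), and the feature names are assumptions rather than the logging format used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Swipe:
    dy: float  # signed vertical displacement in pixels; negative = up (assumed)


def swipe_features(swipes: List[Swipe], duration_s: float) -> Dict[str, float]:
    """Compute the swipe counts, distances, and rates listed on the slide."""
    n = len(swipes)
    distance = sum(abs(s.dy) for s in swipes)
    # A direction change is a pair of consecutive swipes with opposite signs.
    changes = sum(1 for a, b in zip(swipes, swipes[1:]) if a.dy * b.dy < 0)
    return {
        "num_swipes": n,
        "num_up_swipes": sum(1 for s in swipes if s.dy < 0),
        "num_down_swipes": sum(1 for s in swipes if s.dy > 0),
        "total_distance_px": distance,
        "swipes_per_second": n / duration_s if duration_s > 0 else 0.0,
        "distance_per_swipe": distance / n if n else 0.0,
        "distance_per_second": distance / duration_s if duration_s > 0 else 0.0,
        "direction_changes": changes,
    }


# Hypothetical session: four swipes over ten seconds.
features = swipe_features([Swipe(-100), Swipe(-50), Swipe(200), Swipe(-30)], 10.0)
print(features)
```

The viewport-based signals at the end of the list need per-element visibility logs instead of swipe events, so they are not covered by this sketch.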

Quality of Interaction Model
Method              Accuracy (%)      Average F1 (%)
Baseline            70.62             61.38
Interaction Model   80.81* (14.43)    79.08* (28.83)
* Statistically significant improvement (p < 0.05)
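The parenthesized values are not labeled on the slide. They are consistent with the relative improvement over the baseline, in percent, which is an inference rather than something the slide states. A quick check under that assumption (the helper name `relative_improvement` is hypothetical):

```python
def relative_improvement(baseline: float, model: float) -> float:
    """Relative gain of model over baseline, in percent."""
    return 100.0 * (model - baseline) / baseline


acc_gain = relative_improvement(70.62, 80.81)  # accuracy, from the table
f1_gain = relative_improvement(61.38, 79.08)   # average F1, from the table
print(f"accuracy: {acc_gain:.2f}%  F1: {f1_gain:.2f}%")
```

The accuracy figure reproduces the slide's 14.43. The F1 figure lands within about 0.01 of the slide's 28.83, a gap that rounding of the underlying scores could explain.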

How can the current prediction of user
satisfaction be improved?

Normal vs Angry
Normal Voice
Angry Voice

Changes in User Emotions
Emotion State at ti → Emotion State at ti+1

Changes in User Emotions
Emotion State at ti → Emotion State at ti+1
SAT or DSAT

How to define situational user
satisfaction?

From Queries to Dialogues:
Sequential Interaction
User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”

Tendency Toward Direct Answers

User-System Interaction Interface

User-System Interaction Interface
How to restore the user reward function?

Inverse Reinforcement Learning
[P. Abbeel’s slides on IRL]

• User satisfaction with personal assistants is defined in a generalized
form: it is an aggregation of satisfaction with all of the dialogue’s tasks,
not satisfaction with each of the dialogue’s queries separately
• We showed that features derived from voice, and especially from touch
and voice interactions, add a significant gain in accuracy over the baseline
• We proposed a novel and dynamic approach to restore the user reward
function
Thank you!
Questions?
