Julia Kiseleva
@julia_kiseleva
UserSat.com University of Amsterdam
Evaluating Personal Assistants
Google at SIGIR 2016
It brings us new challenges
From Queries to Dialogues
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
User's dialogue with Cortana: the task is finding a hotel in Chicago
From Queries to Dialogues
Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
User's dialogue with Cortana: the task is finding a pharmacy
Main Research Question
How can we automatically predict user satisfaction with search dialogues on intelligent assistants using click, touch, and voice interactions?
What is user satisfaction?
How to define user satisfaction with search dialogues?
User: show restaurants near me
Cortana: Here are ten restaurants near you
User: show the best ones
Cortana: Here are ten restaurants near you that have good reviews
User: show directions to the second one
Cortana: Getting you directions to the Mayuri Indian Cuisine
No clicks ???
User: show restaurants near me
Cortana: Here are ten restaurants near you
User: show the best ones
Cortana: Here are ten restaurants near you that have good reviews
User: show directions to the second one
Cortana: Getting you directions to the Mayuri Indian Cuisine
SAT? SAT? SAT?
Overall SAT?
User Frustration
Q1: what's the weather like in San Francisco
Q2: what's the weather like in Mountain View
Q3: can you find me a hotel close to Mountain View
Q4: can you show me the cheapest ones
Q5: show me the third one
Q6: show me the directions from SFO to this hotel
Q6: show me the directions from SFO to this hotel
Q7: go back to first hotel (misrecognition)
Q8: show me hotels in Mountain View
Q9: show me cheap hotels in Mountain View
Q10: show me more about the third one
Dialogue with an intelligent assistant: the task is planning a weekend
Restart search
A user is satisfied
What interaction signals can we track during search dialogues?
Tracking User Interaction: Phonetic Similarity
Phonetic similarity between consecutive requests
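A minimal sketch of how phonetic similarity between consecutive requests might be computed. The speaker notes mention a Metaphone representation; this sketch swaps in Soundex, a simpler phonetic code, purely for illustration. The function names `soundex`, `phonetic_key`, and `phonetic_similarity` are hypothetical, and the string-matching ratio is an assumed similarity measure, not necessarily the one used in the talk.

```python
from difflib import SequenceMatcher

# Soundex digit classes; vowels and h, w, y carry no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return a 4-character Soundex key (a stand-in for Metaphone)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            key += digit
        if ch not in "hw":  # h and w do not separate equal codes
            prev = digit
    return (key + "000")[:4]


def phonetic_key(request: str) -> str:
    """Encode every word of a request by its pronunciation."""
    keys = (soundex(word) for word in request.split())
    return " ".join(k for k in keys if k)


def phonetic_similarity(first: str, second: str) -> float:
    """Similarity in [0, 1] between the phonetic keys of two requests."""
    return SequenceMatcher(None, phonetic_key(first), phonetic_key(second)).ratio()


# Consecutive requests taken from the "User Frustration" dialogue.
q8 = "show me hotels in Mountain View"
q9 = "show me cheap hotels in Mountain View"
print(phonetic_similarity(q8, q9))
```

A high score between consecutive requests would then be one possible indicator that the user is repeating or rephrasing a request.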
Tracking User Interaction
Attributed viewing time, weighted by the share of the viewport height each answer occupies:
3 seconds × 33% of viewport = 1 s
6 seconds × 66% of viewport = 4 s
2 seconds × 20% of viewport = 0.4 s
Total: 1 s + 4 s + 0.4 s = 5.4 s
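The viewport arithmetic on this slide can be sketched as follows: the time an answer is on screen is weighted by the fraction of the viewport it occupies, and the weighted times are summed. The function name `attributed_time` is hypothetical, and the sketch assumes that the slide's 33% and 66% are rounded thirds.

```python
def attributed_time(views):
    """Sum on-screen time weighted by the share of the viewport occupied.

    `views` is a list of (seconds_on_screen, viewport_fraction) pairs.
    """
    return sum(seconds * fraction for seconds, fraction in views)


# Values from the slide; 33% and 66% are treated as 1/3 and 2/3 (assumption).
views = [(3.0, 1 / 3), (6.0, 2 / 3), (2.0, 0.2)]
total = attributed_time(views)
print(round(total, 1))  # 5.4
```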
Tracking User Interaction: Touch Signals
• Number of swipes
• Number of up-swipes
• Number of down-swipes
• Total distance swiped (pixels)
• Number of swipes normalized by time
• Total distance divided by number of swipes
• Total swiped distance divided by time
• Number of swipe direction changes
• Duration (seconds) for which a SERP answer is shown on screen (even partially)
• Fraction of visible pixels belonging to the SERP answer
• Attributed time (seconds) to viewing a particular element (answer) on the SERP
• Attributed time (seconds) per unit height (pixels) associated with a particular element on the SERP
• Attributed time (milliseconds) per unit area (square pixels) associated with a particular element on the SERP
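A rough sketch of how the swipe-based features listed on this slide could be derived from a log of swipe events. The `Swipe` record, its fields, and the feature names are assumptions made for illustration; the talk does not specify the logging format.

```python
from dataclasses import dataclass


@dataclass
class Swipe:
    direction: str      # "up" or "down" (hypothetical encoding)
    distance_px: float  # length of the swipe in pixels


def swipe_features(swipes, session_seconds):
    """Compute the swipe features from the slide for one session."""
    count = len(swipes)
    total_px = sum(s.distance_px for s in swipes)
    # A direction change is counted whenever two consecutive swipes differ.
    changes = sum(1 for a, b in zip(swipes, swipes[1:])
                  if a.direction != b.direction)
    has_time = session_seconds > 0
    return {
        "num_swipes": count,
        "num_up_swipes": sum(1 for s in swipes if s.direction == "up"),
        "num_down_swipes": sum(1 for s in swipes if s.direction == "down"),
        "total_distance_px": total_px,
        "swipes_per_second": count / session_seconds if has_time else 0.0,
        "distance_per_swipe": total_px / count if count else 0.0,
        "distance_per_second": total_px / session_seconds if has_time else 0.0,
        "direction_changes": changes,
    }


log = [Swipe("down", 300.0), Swipe("down", 200.0), Swipe("up", 100.0)]
features = swipe_features(log, session_seconds=10.0)
print(features)
```

Normalizing by time and by number of swipes, as the slide does, keeps the features comparable across sessions of different lengths.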
Quality of Interaction Model
Method             Accuracy (%)      Average F1 (%)
Baseline           70.62             61.38
Interaction Model  80.81* (14.43)    79.08* (28.83)
* Statistically significant improvement (p < 0.05)
How can the current prediction of user satisfaction be improved?
Cepstrum: Normal Voice
Cepstrum: Angry Voice
Normal vs Angry
Normal Voice
Angry Voice
Changes in User Emotions
Emotion state at time t_i, emotion state at time t_{i+1}
SAT / DSAT
How to define situational user satisfaction?
User Situation Matters
From Queries to Dialogues: Sequential Interaction
User: show restaurants near me
Cortana: Here are ten restaurants near you
User: show the best ones
Cortana: Here are ten restaurants near you that have good reviews
User: show directions to the second one
Cortana: Getting you directions to the Mayuri Indian Cuisine
Tendency Toward Direct Answers
User-System Interaction Interface
How to restore the user reward function?
Inverse Reinforcement Learning
[P. Abbeel's slides on IRL]
• User satisfaction with personal assistants is defined in a generalized form: it is an aggregation of satisfaction with all the tasks in a dialogue, not satisfaction with each query in the dialogue separately
• We showed that features derived from voice and especially from touch and voice interactions add a significant gain in accuracy over the baseline
• We proposed a novel and dynamic approach to restoring the user reward function
Thank you!
Questions?

Editor's Notes

  • #19: We use acoustic features to characterize the voice interaction that happens in search dialogues. More specifically, we use the phonetic similarity between consecutive requests to identify patterns of repetition. The Metaphone representation [39] indexes words by their pronunciation, which lets us represent words by how they are pronounced rather than how they are written.
  • #21: Consider movie recommendation