Recommendations
and feedback
The user-experience of a
recommender system




                           Where innovation starts
Acknowledgements
Martijn Willemsen
Eindhoven University of Technology


Stefan Hirtbach
European Microsoft Innovation Center GmbH



MyMedia
European Commission FP7 project
Beyond algorithms
Two premises for successful
recommender systems




                             Where innovation starts
Recommender systems
Recommend items to users
based on their stated preferences
(e.g. books, movies, laptops)


Users indicate preferences
by rating presented items
(e.g. from one to five stars)


Predict the user's rating value of new items...
then present items with the highest predicted rating
Current situation

[Diagram: More ... → Better ... → Better experience]
Two premises
Premise 1 | Users want to receive
recommendations
Do recommendations have any effect on the user experience at all?
Compare a system with vs. without recommendations


Premise 2 | Users will provide preference
feedback
Without feedback, no recommendations
What causes - and inhibits - them to do this?
Analyze users' feedback behavior and intentions
Evaluating the
user experience
Hypotheses based on
existing research




                      Where innovation starts
Effect of
Premise 1 | Users want to receive
recommendations

Users are able to notice differences in prediction
accuracy
But... higher accuracy can lead to lower usefulness of
recommendations


Distinction between perception and evaluation
of recommendation quality
Constructs and

Perception
Perceived recommendation quality

Evaluation
Choice satisfaction
Perceived system effectiveness

Questionnaires and process data

[Path model, user experience:]
Personalized vs. random → (H1, +) Perceived recommendation quality
Perceived recommendation quality → (H2a, +) Choice satisfaction
Perceived recommendation quality → (H2b, +) Perceived system effectiveness
Questionnaires and
process data
Feedback
Premise 2 | Users will provide preference
feedback

Satisfaction increases feedback intentions
However, only a minority is willing to give up personal information
in return for a personalized experience (Teltzrow & Kobsa)


Privacy decreases feedback intentions
However, most people are usually or always comfortable disclosing
personal taste preferences (Ackerman et al.)
Constructs and

Feedback
Willingness to provide feedback

Privacy
System-specific privacy concerns
Trust in technology

Process data
Actual feedback behavior

[Path model, user experience:]
Choice satisfaction → (H3a, +) Intention to provide feedback
Perceived system effectiveness → (H3b, +) Intention to provide feedback
General trust in technology → (H4) System-specific privacy concerns
System-specific privacy concerns → (H5) Intention to provide feedback
A model of user experience

Personalized vs. random → (H1, +) Perceived recommendation quality
Perceived recommendation quality → (H2a, +) Choice satisfaction
Perceived recommendation quality → (H2b, +) Perceived system effectiveness
Choice satisfaction → (H3a, +) Intention to provide feedback
Perceived system effectiveness → (H3b, +) Intention to provide feedback
General trust in technology → (H4) System-specific privacy concerns
System-specific privacy concerns → (H5) Intention to provide feedback
Experiment
Test with an actual recommender system

Two versions of the system:
One that provides personalized recommendations
One that provides random clips as recommendations

[The hypothesized path model is shown alongside]
An online
experiment
Testing the hypotheses using
the Microsoft ClipClub
system




                           Where innovation starts
Setup
Online experiment
Conducted by EMIC in Germany,
September and October, 2009
Two slightly modified versions of the MSN ClipClub system


43 participants
25 in the random and 18 in the
personalized condition
65% male, all German
Average age of 31 (SD = 9.45)
System
Microsoft ClipClub
Lifestyle & entertainment video
clips


Changes
Recommendations section
highlighted
Pre-experimental instruction


Rating probe
No rating for five minutes: ask the user to rate the current item
Employed algorithm
Vector Space Model Engine
Uses the tags associated with a clip to create a vector for each clip
Creates a tag vector for the subset of clips rated by the user
Recommends clips whose tag vector is similar to this profile vector
Older ratings are logarithmically discounted, as are older items
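
The deck describes the engine only at this level of detail. Below is a minimal, hypothetical Python sketch of a tag-based vector-space recommender in that spirit; the cosine measure matches the editor's note ("in terms of cosine similarity"), but the exact discount shape, the weighting, and all names are assumptions, not the MyMedia implementation.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse tag vectors (dicts: tag -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def profile_vector(rated_clips, clip_tags):
    """Aggregate the tag vectors of the clips a user rated into one profile.

    rated_clips: list of (clip_id, rating, age_in_days).
    Older ratings are logarithmically discounted (assumed discount shape).
    """
    profile = Counter()
    for clip_id, rating, age in rated_clips:
        discount = 1.0 / (1.0 + math.log1p(age))
        for tag in clip_tags[clip_id]:
            profile[tag] += rating * discount
    return profile

def recommend(profile, clip_tags, clip_age, rated_ids, k=10):
    """Rank unrated clips by similarity to the profile, discounting older items."""
    scored = []
    for clip_id, tags in clip_tags.items():
        if clip_id in rated_ids:
            continue
        sim = cosine(profile, {t: 1.0 for t in tags})
        scored.append((sim / (1.0 + math.log1p(clip_age[clip_id])), clip_id))
    return [clip_id for _, clip_id in sorted(scored, reverse=True)[:k]]
```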
Experimental procedure
Each participant:
entered demographic details
was shown an instruction on how to use the system
used the system freely for at least 30 minutes
completed the questionnaires
entered an email address for the raffle


Rating items
Users could rate items and inspect recommendations at any time, in any order
Rating probe: guaranteed at least 6 ratings per user, unless the probe was ignored
Questionnaires

40 statements
Agree or disagree on a 5-point scale
Factor Analysis in two batches
6 factors:

Recommendation set quality
7 items, e.g. "The recommended videos fitted my preference"

System effectiveness
6 items, e.g. "The recommender is useless" (reverse-coded)

Choice satisfaction
9 items, e.g. "The videos I chose fitted my preference"

General trust in technology
4 items, e.g. "I'm less confident when I use technology" (reverse-coded)

System-specific privacy concern
5 items, e.g. "I feel confident that ClipClub respects my privacy"

Intention to rate items
5 items, e.g. "I like to give feedback on the items I'm watching"
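
As an illustration of this analysis step (the deck does not name the software used), here is a short Python sketch using scikit-learn's FactorAnalysis as a stand-in. File and column names are hypothetical, and reverse-coded items are assumed to be flipped beforehand.

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# One row per participant, one column per statement (1-5 Likert scale);
# reverse-coded items are assumed to be flipped already (x -> 6 - x).
responses = pd.read_csv("questionnaire_responses.csv")  # hypothetical file

fa = FactorAnalysis(n_components=6, rotation="varimax", random_state=0)
fa.fit(responses)

# Loadings show which statements belong to which factor; items with weak
# or cross-loadings would be dropped before computing factor scores.
loadings = pd.DataFrame(fa.components_.T, index=responses.columns,
                        columns=[f"factor_{i + 1}" for i in range(6)])
print(loadings.round(2))

factor_scores = fa.transform(responses)  # per-participant scores for the path model
```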
Process data
All clicks were logged
In order to link subjective metrics to observable behavior


Process data measures
Total viewing time
Number of clicked clips
Number of completed clips
Number of self-initiated ratings
Number of canceled rating requests
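
A hypothetical pandas sketch of how such measures can be derived from a click log; the event names and schema are assumptions, and total viewing time is approximated here by the session span, which the actual logging may have computed differently.

```python
import pandas as pd

# Assumed log schema: one row per event with columns
# user, timestamp, event in {click_clip, complete_clip, rate, cancel_probe}.
log = pd.read_csv("clicklog.csv", parse_dates=["timestamp"])

def measures(events: pd.DataFrame) -> pd.Series:
    span = events["timestamp"].max() - events["timestamp"].min()
    return pd.Series({
        "viewing_time_min": span.total_seconds() / 60.0,  # session-span proxy
        "clicked_clips": (events["event"] == "click_clip").sum(),
        "completed_clips": (events["event"] == "complete_clip").sum(),
        "self_initiated_ratings": (events["event"] == "rate").sum(),
        "canceled_rating_requests": (events["event"] == "cancel_probe").sum(),
    })

per_user = log.groupby("user").apply(measures)  # one row of measures per participant
```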
Results
Back to the path model




                         Where innovation starts
Path model results

Personalized vs. random → Perceived recommendation quality: .696 (.276)* (H1)
Perceived recommendation quality → Choice satisfaction: .572 (.125)*** (H2a)
Perceived recommendation quality → Perceived system effectiveness: .515 (.135)*** (H2b)
Choice satisfaction → Intention to provide feedback: .346 (.125)** (H3a)
Perceived system effectiveness → Intention to provide feedback: .296 (.123)* (H3b)
General trust in technology → System-specific privacy concerns: -.268 (.156)¹ (H4)
System-specific privacy concerns → Intention to provide feedback: -.255 (.113)* (H5)
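
For readers who want to reproduce this kind of analysis: a minimal sketch of the hypothesized paths in the Python SEM package semopy. The variable names are hypothetical factor scores, and this is not the authors' original script (the deck does not say which software produced these estimates).

```python
import pandas as pd
from semopy import Model

# One row per participant: condition (0 = random, 1 = personalized)
# plus the questionnaire factor scores. All names are hypothetical.
data = pd.read_csv("factor_scores.csv")

description = """
quality ~ condition                                  # H1
satisfaction ~ quality                               # H2a
effectiveness ~ quality                              # H2b
intention ~ satisfaction + effectiveness + privacy   # H3a, H3b, H5
privacy ~ general_trust                              # H4
"""

model = Model(description)
model.fit(data)
print(model.inspect())  # path estimates, standard errors, p-values
```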
Effect of personalization

Users notice personalization
Personalized recommendations increase perceived recommendation quality (H1)

Users like better recommendations
Higher perceived quality increases choice satisfaction (H2a) and system effectiveness (H2b)

Users browse less, but watch more
Number of clips watched entirely is higher in the personalized condition
Number of clicked clips and total viewing time are negatively correlated with perceived system effectiveness
Feedback

Better experience increases feedback
Choice satisfaction and system effectiveness increase feedback intentions (H3a, H3b)

Privacy decreases feedback
Users with a higher system-specific privacy concern have a lower feedback intention (H5)

Effect of trust in technology
Privacy concerns increase when users have a lower trust in technology (H4)
Intention-behavior gap
Number of canceled rating probes
Significantly lower in the personalized condition
Negatively correlated with intention to provide feedback

Total number of provided ratings
Not significantly correlated with users' intention to provide feedback
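
Checking for such a gap amounts to rank-correlating the intention factor with the logged counts. A minimal sketch, with hypothetical column names merged from the questionnaire and the click log:

```python
import pandas as pd
from scipy.stats import spearmanr

# One row per participant: intention factor score plus logged counts.
df = pd.read_csv("intention_vs_behavior.csv")  # hypothetical file

for behavior in ["canceled_rating_probes", "total_ratings"]:
    rho, p = spearmanr(df["intention"], df[behavior])
    print(f"intention vs. {behavior}: rho = {rho:.2f}, p = {p:.3f}")
```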
To summarize...

[The full path model with estimates is shown again; see Path model results above]
Future work
Lessons learned, new ideas

[Evaluation framework figure:]
Domain knowledge, Choice goal
Behavior: How the system influences my interaction and my perception thereof
Subjective system aspects (How I perceive the system): Interaction usability, Perceived quality, Appeal
Experience (How I perceive the interaction): Hedonic experience, Usefulness, Trust, Outcome evaluation
Interaction (The objective effect of using the system): Purchase/view, System use
Personal characteristics (Things about me that matter): Trust/distrust, Social factors, Control

Where innovation starts
Remaining questions
True for all recommender systems?
Results should be confirmed in several other systems and with a higher number and a more diverse range of participants


Other influences?
Incorporate other aspects to get a more detailed understanding of
the mechanisms underlying the user-recommender interaction


Other algorithms?
Test differences between algorithms that only moderately differ in
accuracy
Consider a framework

[Evaluation framework figure:]
Situational characteristics (Things about the situation that matter): Domain knowledge, Choice goal
Behavior: How the system influences my interaction and my perception thereof
Objective system aspects (What the system does): Recommendations, Interaction, Capabilities, Quality of assets
Subjective system aspects (How I perceive the system): Interaction usability, Perceived quality, Appeal
Experience (How I perceive the interaction): Hedonic experience, Usefulness, Trust, Outcome evaluation
Interaction (The objective effect of using the system): Purchase/view, System use
Personal characteristics (Things about me that matter): Trust/distrust, Social factors, Control
Field trials
Full-scale test of the framework
Four different partners, three different countries
Trials are conducted over a longer time period
Each compares at least three systems (mainly different algorithms)
Questionnaires and process data


Core of evaluation is the same
Algorithm → perceived recommendation quality → system effectiveness
Each partner adds measures of personal interest
Want more?
RecSys'10 workshop
User-Centric Evaluation of Recommender Systems and their Interfaces (UCERSTI)
Barcelona, September 26-30

Line-up:
7 paper presentations
2 keynotes (Francisco Martin, Pearl Pu)
Panel discussion with 5 prominent researchers

[Poster: 1st international workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces; "I am attending" badge]



Editor's Notes

  • #3: First I want to thank my co-authors and sponsor
  • #5: Your typical recommender system works like this:
  • #6: Right now, researchers seem to focus on the algorithmic performance. They believe that better algorithms lead to a better experience. Is that really true?
  • #7: It can only be true under two assumptions: 1. users want to get personalized recommendations, and 2. they will provide enough feedback to make this possible. In order to answer these questions, we need to evaluate the user experience, not the algorithm!
  • #9: What existing evidence do we have? Increased recommendation accuracy is noticeable, but doesn't always lead to a better UX. McNee et al.: the algorithm with the best predictions was rated least helpful. Torres et al.: the algorithm with the lowest accuracy resulted in the highest satisfaction. Ziegler et al.: diversifying the recommendation set resulted in lower accuracy but a more positive evaluation.
  • #10: Let's say we have two systems, one with personalized recommendations, and one without. Perception tests whether we are able to notice the difference. Evaluation tests whether this increases our satisfaction with the system and, ultimately, our choices. These are measured by questionnaires, but we can also look at process data: effective systems may show decreased browsing and overall viewing time, and in better systems, users will watch more clips from beginning to end.
  • #11: The more beneficial it seems to be, the more feedback users will provide (Spiekermann et al.; Brodie, Karat & Karat; Kobsa & Teltzrow). Minority = between 40 and 50% in an overview of privacy surveys. Privacy concerns reduce users' willingness to disclose personal information (Metzger et al.; Teltzrow & Kobsa). Most people = 80% of the respondents of a detailed survey. Users' actual feedback behavior may be different from their intentions (Spiekermann et al.).
  • #12: So now we look at why users provide preference information. We already know choice satisfaction and perceived system effectiveness, and we hypothesize that a better experience increases the intention to provide feedback. However, privacy concerns may reduce feedback intention, and privacy concerns may be higher for those who don't trust technology in general. Process data: due to the intention-behavior gap, actual feedback may only be moderately correlated to feedback intentions.
  • #13: So let's review the hypotheses (laser-point): personalized recommendations should have a perceivably higher quality. This should in turn increase the user experience of the system and the outcome (choices). A better experience in turn increases the intention to provide feedback. However...
  • #14: Tip: use two conditions to control the causal relations and to single out the effect. Also: log behavioral data and triangulate this with the constructs.
  • #17: Content and system are in German. To explain the rating feature and its effect on recommendations. Opening recommendations before rating any items showed a similar explanation. Participants were allowed to close this pop-up without rating. After rating, participants were transported to the recommendations.
  • #18: (the length of the vector depends on the impact the tags have) (in terms of cosine similarity)
  • #19: Allowing ample opportunity to let their feedback behavior be influenced by their user experience. Unless they ignored the rating probe. The median number of ratings per user was 15.
  • #20: Tip for UX researchers: you cannot measure UX concepts with a single question. Measurement is far more robust if you construct a scale based on several questions. Exploratory Factor Analysis validates the intended conceptual structure. Finally, test the model with path analysis (mediation on steroids).
  • #21: Measures 1-2: browsing (bad); 3: consumption (good); 4-5: feedback.
  • #23: The model has a good fit, with a non-significant χ² of 13.210 (df = 13, p = .4317), a CFI of .996 and an RMSEA between 0 and 0.153 (90% confidence interval).
  • #27: Let's review that one more time:
  • #30: We've been developing a framework for this type of research, and validated it in several field trials.
  • #31: E.g. advertisement (MS): fewer clips clicked (fewer ads started), but maybe a higher retention (more ads fully watched)? Watch out for our future papers!
  • #32: Advantages of fitting a model: steps in between reduce variability!