Alan Said, Brijnesh J. Jain, Sahin Albayrak
{alan, jain, sahin}@dai-lab.de

CSCW 2013 - San Antonio, TX, USA



  Traditional recommender system evaluation measures only one type of quality, e.g. recommendation accuracy or rating prediction error.
  We propose to evaluate and benchmark additional recommendation qualities:
  • User Requirements
     • recommendation accuracy
     • perceived quality, etc.
  • Business Values
     • Retention
     • Churn, etc.
  • Technical Constraints
     • Scalability
     • Speed, etc.

  [Figure: three-dimensional evaluation space; legible axis labels: Business Models, User Requirements]

  By defining a recommendation scenario, each of the three factors can be represented by a quality important in the specific use case.

  In each scenario, different concepts have different importance.

  We represent the quality of an algorithm R as a function E, a vector of cost functions:

     E(R) = (E_1(R), \dots, E_p(R))^T

  In order to allow for simple comparison, we formulate the utility function as:

     U(R) = w^T E(R) = \sum_i w_i E_i(R)

  where w is the vector of weights defining the importance of each axis. The resulting value represents the quality of the recommendation algorithm in the defined use case.
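
  The weighted-sum utility can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `utility`, the ordering of the axes, and the example scores are assumptions made here, and the scores are assumed to be already normalized so that higher means better.

```python
from typing import Sequence


def utility(weights: Sequence[float], qualities: Sequence[float]) -> float:
    """Weighted sum over the evaluation axes: sum_i w_i * E_i."""
    if len(weights) != len(qualities):
        raise ValueError("weights and qualities must have the same length")
    return sum(w * e for w, e in zip(weights, qualities))


# Hypothetical scores for one algorithm, ordered as
# (user requirements, business values, technical constraints).
qualities = [0.5, 0.25, 1.0]

# All three axes weighted equally.
equal_weights = [1.0, 1.0, 1.0]

print(utility(equal_weights, qualities))  # 1.75
```

  Collapsing the vector to a single number is what makes algorithms directly comparable; the choice of weights encodes the scenario.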




  We conducted a movie recommendation user study with 132 users providing feedback on 3 recommendation algorithms. Each user rated a number of movies and received 10 recommendations from one of the 3 algorithms. The algorithms were tuned to provide traditional recommendations, diverse recommendations, or random recommendations, respectively.
  Users were asked whether they would watch the recommended movies (user requirement) and whether they would consider using the system again (business value). The technical constraint is represented by the time the algorithm took to recommend movies.




  [Figure: The results of the user study, shown with different weights: when all three axes are similarly weighted, when the user requirements are more important, when the business values are more important, and finally when the technical constraints are more important than the other values.]
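
  To illustrate how the weighting changes the outcome, the sketch below ranks three algorithms under four weight vectors. All numbers are invented for illustration and are not the results of the user study; the names `ALGORITHMS`, `SCENARIOS`, `utility`, and `rank` are likewise assumptions, as is the convention that every score lies in [0, 1] with higher meaning better.

```python
# Made-up quality vectors (NOT the study's measurements), ordered as
# (user requirements, business values, technical constraints).
ALGORITHMS = {
    "traditional": (0.75, 0.5, 0.25),
    "diverse": (0.5, 0.75, 0.5),
    "random": (0.25, 0.25, 1.0),
}

# Hypothetical weight vectors, one per scenario.
SCENARIOS = {
    "equal": (1.0, 1.0, 1.0),
    "user-focused": (4.0, 1.0, 1.0),
    "business-focused": (1.0, 4.0, 1.0),
    "technical-focused": (1.0, 1.0, 4.0),
}


def utility(weights, qualities):
    """Weighted sum of the per-axis quality scores."""
    return sum(w * e for w, e in zip(weights, qualities))


def rank(weights):
    """Return algorithm names ordered from highest to lowest utility."""
    return sorted(
        ALGORITHMS,
        key=lambda name: utility(weights, ALGORITHMS[name]),
        reverse=True,
    )


for scenario, weights in SCENARIOS.items():
    print(scenario, rank(weights))
```

  With these invented scores, a different algorithm comes out on top in each of the three focused scenarios, which is the kind of scenario-dependent comparison the model is meant to support.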




  We presented a three-dimensional model for the evaluation of recommender systems that takes user-centric values, technical constraints, and business values into consideration. The model simplifies the evaluation and benchmarking of recommendation algorithms in predefined scenarios, e.g. where different qualities of algorithms are sought.

  We evaluated the model through a user study comparing 3 different recommendation algorithms and presented different interpretations of the obtained qualities.

  Further explanation of the 3D evaluation concept [RUE'12, Said et al. 2012]

  Poster abstract [CSCW'13, Said et al. 2013b]

  User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Recommender Algorithm [CSCW'13, Said et al. 2013a]
  Presentation: Wednesday Feb 27, 10AM. Track 4.



Technische Universität Berlin    www.dai-lab.de

A 3D Approach to Recommender System Evaluation