Recommender Systems

                                                                 Simona Dakova

Web Technologies – Prof. Dr. Ulrik Schroeder – WS 2010/11
The slides are licensed under a Creative Commons Attribution 3.0 License
Overview
       Motivation
       Netflix Prize Competition
       Collaborative filtering approaches
       Content-based techniques
       Hybrid recommenders
       Summary




We live in information overload!



“We are leaving the age of Information and entering the Age of Recommendation” -
                           The Long Tail (Chris Anderson)




They try to attract you!




       Netflix: 2/3 of the movies rented were recommended
       Google News: 38% more click-throughs
       Amazon: 35% of sales come from recommendations


Why recommenders?
       Enhance e-commerce and boost sales
           Turn browsers into buyers

       Recommender vs. Search:
           Discover the items you are looking for that match your preferences
           Limited list of results

       Personalize your website content to the profile of an
        individual user
           Discover interesting items
           Automated personalization
           Increase usage and satisfaction


Netflix Prize Competition
       $1.000.000 - if you “only” improve existing system by 10%!
       Contest started in 2006
       Annual progress prize $ 50.000
       Gained great popularity in
        academic circles

       The Winner
           BellKor´s Pragmatic Chaos
               10.5% improvement in July 2009




Recommender System = ?
       Definition:
           Information-filtering algorithms/systems that try to
            recommend items the user is likely to like


       Items:
           Advertising messages, Investment choices, Restaurants, Cafes,
            Music tracks, Movies, TV programs, Books, Clothes, Supermarket
            goods, Tags, News articles, Online mates, Research papers




User Profiling
       Understand people's needs and interests
       Explicit Data Collection
           Ask for rating of items
           Rank a set of items
           Ask for detailed information/feedback
               CON: not well received by users, not ubiquitous
       Implicit Data Collection
           Purchasing history
           Items viewed
           Navigational patterns
           Obtain list of watched/listened items
           Analyze social data
               CON: Privacy concerns

Technology overview

       RECOMMENDERS
           Collaborative filtering (CF)
               Memory-based CF algorithms
               Model-based CF algorithms
           Content-based filtering (CB)
           Hybrid recommenders
Collaborative filtering (CF)
       Memory-based CF algorithms
           Make predictions based on past ratings
           Compute similarities between users/items
           Make the prediction according to the calculated weight (similarity)
       Model-based CF algorithms
           Learn a model from the users' ratings
           Use the model to predict the (probabilistic) rating of the active user on a given item
Memory-based CF Algorithms
    Operate on the entire user-item matrix or on a sample of it

Steps:
         1.   Identify the neighbors of the active user/item
              Similarity computation:
                 Pearson correlation
                 Vector cosine-based similarity


         2.   Neighborhood-based prediction / top-N
              recommendation
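A minimal Python sketch of the two steps, assuming ratings are kept as a dict of dicts (user → {item: rating}). Pearson correlation is computed over co-rated items, cosine treats unrated items as 0, and the prediction is a mean-centred weighted average over the k most similar, positively correlated neighbours. This is one common formulation, not necessarily the exact one behind these slides; the users, items and k = 2 are hypothetical.

```python
import math

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}


def mean(r: dict[str, float]) -> float:
    return sum(r.values()) / len(r)


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Step 1 similarity: Pearson correlation over co-rated items (0.0 if undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)
                    * sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Step 1 alternative: vector cosine similarity; an unrated item counts as 0."""
    num = sum(a[i] * b[i] for i in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return num / den if den else 0.0


def predict(ratings: Ratings, active: str, item: str, k: int = 2, sim=pearson) -> float:
    """Step 2: mean-centred weighted average over the k most similar neighbours."""
    candidates = [(sim(ratings[active], r), u) for u, r in ratings.items()
                  if u != active and item in r]
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    den = sum(w for w, _ in neighbours)
    if den == 0:  # no usable neighbour: fall back to the user's own mean
        return mean(ratings[active])
    num = sum(w * (ratings[u][item] - mean(ratings[u])) for w, u in neighbours)
    return mean(ratings[active]) + num / den


def top_n(ratings: Ratings, active: str, n: int = 3, k: int = 2) -> list[str]:
    """Top-N: rank the items the active user has not rated by predicted rating."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[active])
    scored = sorted(((predict(ratings, active, i, k), i) for i in unseen), reverse=True)
    return [i for _, i in scored[:n]]


ratings: Ratings = {  # hypothetical toy data on a 1-5 scale
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 3, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
print(predict(ratings, "alice", "m4"))  # ≈ 4.75: bob is the only positive neighbour
print(top_n(ratings, "alice"))          # ['m4']
```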



User-based vs. Item-based

        i1   i2   i3   i4   i5
  u1     5    8    -    7    8
  u2    10    -    1    -    -
  u3     2    -   10    9    9
  u4     -    2    9    9   10
  u5     1    5    -    -    1
  ua     2    -    9   10    -
  (- = not rated; ua = active user)

 User-based = You may like it because your "friends" liked it
 Item-based = You may like it because you like similar items
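To make the contrast concrete, the sketch below predicts ua's missing ratings on the matrix above both ways. The choices are assumptions for illustration: Pearson correlation between users for the user-based variant, adjusted cosine between items (as in Sarwar et al., 2001) for the item-based variant, at most k = 2 positively similar neighbours, and a fall-back to ua's mean rating when no such neighbour exists.

```python
import math

# The slide's toy matrix (1-10 scale); unrated cells are simply absent.
R = {
    "u1": {"i1": 5, "i2": 8, "i4": 7, "i5": 8},
    "u2": {"i1": 10, "i3": 1},
    "u3": {"i1": 2, "i3": 10, "i4": 9, "i5": 9},
    "u4": {"i2": 2, "i3": 9, "i4": 9, "i5": 10},
    "u5": {"i1": 1, "i2": 5, "i5": 1},
    "ua": {"i1": 2, "i3": 9, "i4": 10},
}


def mean(u: str) -> float:
    return sum(R[u].values()) / len(R[u])


def pearson(a: str, b: str) -> float:
    """User-user similarity: Pearson correlation over co-rated items."""
    common = R[a].keys() & R[b].keys()
    if len(common) < 2:
        return 0.0
    ma = sum(R[a][i] for i in common) / len(common)
    mb = sum(R[b][i] for i in common) / len(common)
    num = sum((R[a][i] - ma) * (R[b][i] - mb) for i in common)
    den = math.sqrt(sum((R[a][i] - ma) ** 2 for i in common)
                    * sum((R[b][i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def adjusted_cosine(i: str, j: str) -> float:
    """Item-item similarity: cosine of user-mean-centred ratings over co-rating users."""
    users = [u for u in R if i in R[u] and j in R[u]]
    di = [R[u][i] - mean(u) for u in users]
    dj = [R[u][j] - mean(u) for u in users]
    num = sum(x * y for x, y in zip(di, dj))
    den = math.sqrt(sum(x * x for x in di) * sum(y * y for y in dj))
    return num / den if den else 0.0


def user_based(active: str, item: str, k: int = 2) -> float:
    """'Your friends liked it': weight other users' deviations from their own mean."""
    nbrs = sorted(((pearson(active, u), u) for u in R if u != active and item in R[u]),
                  reverse=True)
    nbrs = [(w, u) for w, u in nbrs if w > 0][:k]
    if not nbrs:
        return mean(active)
    num = sum(w * (R[u][item] - mean(u)) for w, u in nbrs)
    return mean(active) + num / sum(w for w, _ in nbrs)


def item_based(active: str, item: str, k: int = 2) -> float:
    """'You like similar items': weight the active user's own ratings of similar items."""
    nbrs = sorted(((adjusted_cosine(item, j), j) for j in R[active]), reverse=True)
    nbrs = [(s, j) for s, j in nbrs if s > 0][:k]
    if not nbrs:
        return mean(active)
    return sum(s * R[active][j] for s, j in nbrs) / sum(s for s, _ in nbrs)


for item in ("i2", "i5"):
    print(item, round(user_based("ua", item), 2), round(item_based("ua", item), 2))
```

On this matrix the two views disagree on i2: the user-based variant predicts 8.0 from u1 alone, while every item ua has rated is negatively related to i2, so the item-based variant falls back to ua's mean of 7.0. Both rank i5 above i2 (roughly 8.25 and 9.51).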

Model-based CF Algorithms
   [Figure: all ratings (r1 to r11) → Train → MODEL (only a set of the ratings) → RECOMMENDATION]

   Train your system to recognize complex patterns in user-item data (ratings)
   Make the recommendation based on the trained model
   Relies on machine learning and data mining algorithms
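Latent-factor models (matrix factorization) are one widely used family of model-based CF and were central to many Netflix Prize entries. The sketch below uses that technique as an example; the slides do not name a specific algorithm. It learns a global mean, user and item biases, and k latent factors per user and item from the known ratings with stochastic gradient descent, then predicts any user/item pair from those parameters alone. The values of k, the learning rate, the regularization and the epoch count are arbitrary assumptions.

```python
import math
import random

Rating = tuple[str, str, float]  # (user, item, rating)


def train_mf(data: list[Rating], k: int = 2, lr: float = 0.01, reg: float = 0.05,
             epochs: int = 500, seed: int = 0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by stochastic gradient descent."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in data) / len(data)
    bu = {u: 0.0 for u, _, _ in data}
    bi = {i: 0.0 for _, i, _ in data}
    P = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u in bu}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(k)] for i in bi}

    def predict(u: str, i: str) -> float:
        # Unknown users or items fall back to the global mean (cold start).
        if u not in P or i not in Q:
            return mu
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict


def rmse(predict, data: list[Rating]) -> float:
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in data) / len(data))


matrix = {  # the toy matrix from the user-based vs. item-based example
    "u1": {"i1": 5, "i2": 8, "i4": 7, "i5": 8},
    "u2": {"i1": 10, "i3": 1},
    "u3": {"i1": 2, "i3": 10, "i4": 9, "i5": 9},
    "u4": {"i2": 2, "i3": 9, "i4": 9, "i5": 10},
    "u5": {"i1": 1, "i2": 5, "i5": 1},
    "ua": {"i1": 2, "i3": 9, "i4": 10},
}
train = [(u, i, r) for u, row in matrix.items() for i, r in row.items()]

model = train_mf(train)
global_mean = sum(r for _, _, r in train) / len(train)
print("global-mean RMSE:", round(rmse(lambda u, i: global_mean, train), 3))
print("model RMSE:      ", round(rmse(model, train), 3))
print("ua on i2:", round(model("ua", "i2"), 2))
```

The RMSE printed here is measured on the training ratings, so it only shows that the model fits them; judging real accuracy needs held-out ratings.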

Limitations and problems of CF
    Depends on human ratings
    Data sparsity
        Cold start: new user and new item problems
    Scalability
    Synonymy
    Shilling attacks
    Gray/Black sheep
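Sparsity and cold start can be measured directly on the toy matrix from the user-based vs. item-based example. This short sketch (an illustration, not a diagnostic from the slides) computes the matrix density, the user pairs with fewer than two co-rated items (where Pearson correlation is undefined), and the overlap of a hypothetical new user with everyone else.

```python
from itertools import combinations

# The toy matrix from the user-based vs. item-based example (unrated cells are absent).
R = {
    "u1": {"i1": 5, "i2": 8, "i4": 7, "i5": 8},
    "u2": {"i1": 10, "i3": 1},
    "u3": {"i1": 2, "i3": 10, "i4": 9, "i5": 9},
    "u4": {"i2": 2, "i3": 9, "i4": 9, "i5": 10},
    "u5": {"i1": 1, "i2": 5, "i5": 1},
    "ua": {"i1": 2, "i3": 9, "i4": 10},
}

items = sorted({i for row in R.values() for i in row})
density = sum(len(row) for row in R.values()) / (len(R) * len(items))

# Co-rated items per user pair; Pearson correlation needs at least two of them.
co_rated = {(a, b): len(R[a].keys() & R[b].keys()) for a, b in combinations(sorted(R), 2)}
thin_pairs = [pair for pair, n in co_rated.items() if n < 2]

# Cold start: a hypothetical brand-new user with no ratings overlaps with nobody.
newcomer: dict[str, float] = {}
newcomer_overlap = [len(newcomer.keys() & row.keys()) for row in R.values()]

print(f"density: {density:.2f}")
print("user pairs with < 2 co-rated items:", thin_pairs)
print("newcomer overlap with each user:", newcomer_overlap)
```

Even in this tiny matrix a third of the cells are empty and 4 of the 15 user pairs share at most one item; real rating matrices are typically far sparser.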




Content-based recommendation (CB)
    For items containing textual information (keywords)
    Based on information retrieval
    Compares the features of items to measure their similarity

    Example: movie recommendation application
        Analyze common features among the movies
        Recommend only the movies that are highly similar
         to the user's preferences


    [Figure: movie pairs with small similarity vs. large similarity]
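One common way to implement the movie example is the information-retrieval approach of TF-IDF keyword vectors compared with cosine similarity. In the sketch below the movies and their keyword lists are hypothetical, and the user profile is simply the sum of the vectors of the movies the user liked; real systems may use other features and weightings.

```python
import math
from collections import Counter

# Hypothetical movies described by keyword lists (the "features" of each item).
movies = {
    "Orbit Station":    ["sci-fi", "space", "thriller"],
    "Deep Space Siege": ["sci-fi", "space", "action"],
    "Paris in Spring":  ["romance", "comedy", "paris"],
    "Laugh Track":      ["comedy", "sitcom", "family"],
}


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Weight each keyword by term frequency times inverse document frequency."""
    df = Counter(word for words in docs.values() for word in set(words))
    n = len(docs)
    return {
        title: {w: (c / len(words)) * math.log(n / df[w]) for w, c in Counter(words).items()}
        for title, words in docs.items()
    }


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    num = sum(a[w] * b[w] for w in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def recommend(liked: list[str], n: int = 2) -> list[str]:
    """Sum the liked movies' vectors into a profile, then rank the other movies by cosine."""
    vectors = tfidf(movies)
    profile: Counter = Counter()
    for title in liked:
        profile.update(vectors[title])
    candidates = [t for t in movies if t not in liked]
    return sorted(candidates, key=lambda t: cosine(profile, vectors[t]), reverse=True)[:n]


print(recommend(["Orbit Station"]))
```

Only "Deep Space Siege" shares keywords with the liked movie, so it ranks first; the other two score 0. This also hints at overspecialization: the profile can only ever point back to the same kind of movie.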



Limitations and problems of CB
   Limited content analysis
       Only explicitly associated features can be used
       Multimedia data – relies on tagging
       Items with the same set of features are indistinguishable


   Overspecialization
      Difficult to recognize synonyms, concepts, or newly emerging words

   New user problem



Hybrid recommenders
    Use a combination of CF and CB
         Implementing methods separately and combining their predictions
         Incorporating CB characteristics into a CF approach or vice versa
         Constructing a general unifying model that incorporates both
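The first strategy, implementing the methods separately and combining their predictions, can be as simple as a weighted average. In this sketch the CF weight grows with the number of ratings the active user has given, so new users rely on the content-based score; the linear schedule and the threshold of 20 ratings are hypothetical choices, not taken from the slides.

```python
from typing import Optional


def weighted_hybrid(cf_pred: Optional[float], cb_pred: Optional[float],
                    n_user_ratings: int, full_trust_at: int = 20) -> Optional[float]:
    """Blend separately computed CF and CB predictions into one score.

    Hypothetical schedule: the CF weight rises linearly from 0 to 1 as the
    active user's number of ratings goes from 0 to `full_trust_at`.
    """
    if cf_pred is None:  # e.g. no neighbours yet (cold start)
        return cb_pred
    if cb_pred is None:  # e.g. the item has no usable content features
        return cf_pred
    alpha = min(n_user_ratings / full_trust_at, 1.0)
    return alpha * cf_pred + (1 - alpha) * cb_pred


print(weighted_hybrid(8.0, 6.0, n_user_ratings=10))  # halfway blend: 7.0
print(weighted_hybrid(None, 6.0, n_user_ratings=0))  # content only: 6.0
```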



    Example: content-boosted collaborative filtering
    (x = missing rating)

          Sparse ratings                      After the content predictor
          i1   i2   i3   i4                   i1   i2   i3   i4
     u1    5    8    x    7              u1    5    8    7    7
     u2   10    x    1    x              u2   10    4    1    8
     u3    2    x   10    9              u3    2    5   10    9
     u4    x    2    9    9              u4    6    2    9    9
     ua    2    x    9   10              ua    2    3    9   10

    Content predictor → filled matrix → Collaborative filtering → RECOMMENDATION
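A minimal sketch of this two-stage idea, assuming a simple content predictor: each missing cell is filled with the user's own ratings weighted by the Jaccard overlap of hypothetical item features, and user-based CF (Pearson weights, mean-centred) then runs on the dense matrix. Because the features are invented, the pseudo-ratings will not reproduce the values in the slide's filled matrix.

```python
import math

# The slide's sparse example on items i1..i4 (missing ratings are absent).
R = {
    "u1": {"i1": 5, "i2": 8, "i4": 7},
    "u2": {"i1": 10, "i3": 1},
    "u3": {"i1": 2, "i3": 10, "i4": 9},
    "u4": {"i2": 2, "i3": 9, "i4": 9},
    "ua": {"i1": 2, "i3": 9, "i4": 10},
}
# Hypothetical item features: the slides do not say what the content predictor uses.
FEATURES = {
    "i1": {"drama", "history"},
    "i2": {"comedy"},
    "i3": {"drama", "war"},
    "i4": {"drama", "war", "history"},
}


def row_mean(row: dict) -> float:
    return sum(row.values()) / len(row)


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


def content_predict(row: dict, item: str) -> float:
    """Pseudo-rating: the user's own ratings, weighted by feature overlap with `item`."""
    weights = {j: jaccard(FEATURES[item], FEATURES[j]) for j in row}
    total = sum(weights.values())
    if total == 0:  # no shared features: fall back to the user's mean
        return row_mean(row)
    return sum(w * row[j] for j, w in weights.items()) / total


# Step 1: the content predictor fills every missing cell with a pseudo-rating.
dense = {u: {i: row[i] if i in row else content_predict(row, i) for i in FEATURES}
         for u, row in R.items()}


def pearson(a: dict, b: dict) -> float:
    """Pearson correlation over all items (every cell is filled now)."""
    ma, mb = row_mean(a), row_mean(b)
    num = sum((a[i] - ma) * (b[i] - mb) for i in a)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in a) * sum((b[i] - mb) ** 2 for i in a))
    return num / den if den else 0.0


def boosted_predict(active: str, item: str) -> float:
    """Step 2: user-based CF on the dense pseudo-rating matrix, clipped to 1-10."""
    nbrs = [(pearson(dense[active], dense[u]), u) for u in dense if u != active]
    nbrs = [(w, u) for w, u in nbrs if w > 0]
    if not nbrs:
        return dense[active][item]
    adj = sum(w * (dense[u][item] - row_mean(dense[u])) for w, u in nbrs)
    adj /= sum(w for w, _ in nbrs)
    return min(10.0, max(1.0, row_mean(dense[active]) + adj))


missing = [i for i in FEATURES if i not in R["ua"]]
print({i: round(boosted_predict("ua", i), 2) for i in missing})
```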



Pros/Cons of Hybrid Recommenders
    Advantages
        Address limitations of pure CF or CB systems
        Provide more accurate recommendations
        Performance improvement
        Overcome sparsity


    Disadvantages
        Complexity
        Expensive to build




The winning solution of the Netflix Prize
    A blend of several complex
     algorithms into a hybrid
     recommender system



    Main improvement:
        Incorporate temporal effects: movie and user biases that drift over
         time, as well as changing user preferences
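One ingredient of this temporal modelling is letting a movie's bias drift over time. The sketch below shows only that idea, in the spirit of the time-binned item bias described by Yehuda Koren of the winning team: a baseline mu + b_i + b_i,Bin(t), here fitted by plain averaging rather than the regularized joint training used in the real solution. The events, bin count and day range are hypothetical.

```python
from collections import defaultdict

Event = tuple[str, str, int, float]  # (user, item, day, rating)


def fit_time_binned_item_bias(events: list[Event], n_bins: int, first_day: int, last_day: int):
    """Baseline mu + b_i + b_i,bin(t): the item bias may differ between time bins."""
    span = last_day - first_day + 1

    def bin_of(day: int) -> int:
        return min((day - first_day) * n_bins // span, n_bins - 1)

    mu = sum(r for *_, r in events) / len(events)

    by_item = defaultdict(list)  # static item bias: average deviation from mu
    for _, item, _, r in events:
        by_item[item].append(r - mu)
    b_i = {item: sum(d) / len(d) for item, d in by_item.items()}

    by_bin = defaultdict(list)  # time-dependent part: what b_i leaves unexplained per bin
    for _, item, day, r in events:
        by_bin[(item, bin_of(day))].append(r - mu - b_i[item])
    b_bin = {key: sum(d) / len(d) for key, d in by_bin.items()}

    def predict(item: str, day: int) -> float:
        return mu + b_i.get(item, 0.0) + b_bin.get((item, bin_of(day)), 0.0)

    return predict


events = [  # hypothetical: m1 is rated highly at release and lower later on
    ("a", "m1", 0, 4.0), ("b", "m1", 10, 4.0),
    ("c", "m1", 90, 2.0), ("d", "m1", 100, 2.0),
    ("e", "m2", 50, 3.0),
]
predict = fit_time_binned_item_bias(events, n_bins=2, first_day=0, last_day=100)
print(predict("m1", 5), predict("m1", 95))
```

A static baseline mu + b_i would predict 3.0 for m1 at any time; the binned bias recovers the early high and late low ratings (4.0 and 2.0).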




Summary
Approach        Techniques                      Advantages                                      Limitations
Collaborative   Memory-based algorithms:        • easy implementation                           • data sparsity
                 - neighborhood-based CF        • item content need not be considered           • cold start problem
                 - top-N recommendation                                                         • limited scalability
                Model-based algorithms:         • deal better with sparsity and scalability     • expensive modeling
                 - machine learning / data      • intuitive rationale for recommendations       • trade-off between performance
                   mining algorithms                                                              and scalability
Content-based   Information retrieval           • no data about other users needed              • limited content analysis
                                                • recommendations for new/unpopular items       • overspecialization
                                                • predictions for users with unique tastes      • new user problem
Hybrids         Combination of collaborative    • overcome limitations of pure collaborative    • complexity
                and content-based approaches      and content-based recommenders                • expensive to build
                                                • more accurate recommendations
                                                • performance improvement
Literature
    Adomavicius, G., Tuzhilin, A. 2005. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions.
    Su, X., Khoshgoftaar, T. 2009. A Survey of Collaborative Filtering Techniques.
    Sarwar, B., Karypis, G., Konstan, J., Riedl, J. 2001. Item-based Collaborative Filtering Recommendation Algorithms.
    Das, A., Datar, M., Garg, A. 2007. Google News Personalization: Scalable Online Collaborative Filtering.
    Linden, G., Smith, B., York, J. 2003. Amazon.com Recommendations: Item-to-Item Collaborative Filtering.
    Guy, I., Zwerdling, N., Ronen, I., Carmel, D., Erel, U. 2010. Social Media Recommendation based on People and Tags.
    Schafer, J., Konstan, J., Riedl, J. 1999. Recommender Systems in E-Commerce.
    http://www.irelaxa.com/Geecat/2010/09/16/recommendation-system-collaborative-filtering/
    Piotte, M., Chabbert, M. 2009. Extending the toolbox.

