Recommender Systems

                                                                 Simona Dakova

The slides are licensed under a Creative Commons Attribution 3.0 License.
   • Motivation
   • Netflix Prize Competition
   • Collaborative filtering approaches
   • Content-based techniques
   • Hybrid recommenders
   • Summary

    Web Technologies
We live in an age of information overload!

“We are leaving the age of Information and entering the Age of Recommendation”
    (Chris Anderson, The Long Tail)

Recommenders try to attract you!

   • Netflix: 2/3 of the movies rented were recommended
   • Google News: 38% more click-throughs
   • Amazon: 35% of sales come from recommendations

Why recommenders?
   • Enhance e-commerce and boost sales
       - Turn browsers into buyers

   • Recommender vs. search:
       - Discover the items you are looking for that match your preferences
       - Limited list of results

   • Personalize website content to the profile of an individual user
       - Discover interesting items
       - Automated personalization
       - Increase usage and satisfaction

Netflix Prize Competition
   • $1,000,000 if you “only” improve the existing system by 10%!
   • Contest started in 2006
   • Annual progress prize: $50,000
   • Gained great popularity in academic circles

   • The winner
       - BellKor’s Pragmatic Chaos
           - 10.06% improvement in July 2009
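The prize measured accuracy as root mean squared error (RMSE) of predicted ratings on a held-out test set, so a “10% improvement” means a 10% lower RMSE than Netflix’s own system, Cinematch (test RMSE 0.9525). A minimal sketch of the metric and the improvement arithmetic; the function names and the toy ratings are illustrative, not part of any official scoring code:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction over the baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

def target_rmse(baseline_rmse, percent=10.0):
    """RMSE a new system must reach to beat the baseline by `percent`."""
    return baseline_rmse * (1 - percent / 100.0)

print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))  # error of a toy predictor
print(target_rmse(0.9525))               # about 0.857
```

Because the target is relative, every hundredth of an RMSE point mattered: the last stretch of the competition was fought over differences in the fourth decimal place.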

Recommender System = ?
   • Definition:
       - Information-filtering algorithms/systems that attempt to recommend items the user is likely to like

   • Items:
       - Advertising messages, investment choices, restaurants, cafes, music tracks, movies, TV programs, books, clothes, supermarket goods, tags, news articles, online mates, research papers

User Profiling
   • Understand people’s needs and interests
   • Explicit data collection
       - Ask the user to rate items
       - Ask the user to rank a set of items
       - Ask for detailed information/feedback
           - Con: not well received by users, not ubiquitous
   • Implicit data collection
       - Purchasing history
       - Items viewed
       - Navigational patterns
       - Lists of watched/listened-to items
       - Analysis of social data
           - Con: privacy concerns
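Implicit signals have to be turned into something a recommender can use. A minimal sketch, assuming each observed event (view, listen, purchase) adds a fixed weight to a user-item score; the event names and weights are hypothetical choices for illustration, not values from the slides:

```python
from collections import defaultdict

# Hypothetical weights: a purchase is treated as a much stronger signal than a view.
EVENT_WEIGHTS = {"view": 1.0, "listen": 2.0, "purchase": 5.0}

def implicit_profile(events, weights=EVENT_WEIGHTS):
    """Aggregate (user, item, event_type) tuples into {user: {item: score}}.

    Unknown event types contribute 0.0 but still register the user-item pair.
    """
    profile = defaultdict(lambda: defaultdict(float))
    for user, item, event in events:
        profile[user][item] += weights.get(event, 0.0)
    return {user: dict(items) for user, items in profile.items()}

events = [
    ("alice", "movie_1", "view"),
    ("alice", "movie_1", "purchase"),
    ("alice", "movie_2", "view"),
    ("bob", "song_7", "listen"),
]
print(implicit_profile(events))
# {'alice': {'movie_1': 6.0, 'movie_2': 1.0}, 'bob': {'song_7': 2.0}}
```

The resulting scores play the role of ratings in the user-item matrix, which is why implicit collection scales so easily: users never have to be asked.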

Technology overview
   • Collaborative filtering (CF)
       - Memory-based CF algorithms
       - Model-based CF algorithms
   • Content-based filtering (CB)
   • Hybrid recommenders

Collaborative filtering (CF)

   • Memory-based CF algorithms
       - Prediction based on past ratings
       - Compute similarities between users/items
       - Make predictions according to the calculated weights (similarities)

   • Model-based CF algorithms
       - Learn a model from users’ ratings
       - Use the model to predict the likely rating of the active user on a given item

Memory-based CF Algorithms
   • Operate on the entire user-item matrix or on a sample of it

       1. For the active user/item, identify its neighbors
            - Similarity computation:
                - Pearson correlation
                - Vector cosine-based similarity

       2. Neighborhood-based prediction / top-N recommendation
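Step 1 can be sketched with the two similarity measures named above. This is one common variant, assuming both measures are computed only over items the two users have co-rated; the function names are illustrative, and the sample ratings are the rows ua, u3 and u4 of the user-item example matrix:

```python
import math

def co_rated(a, b):
    """Items rated by both users; a user's ratings are a dict item -> rating."""
    return sorted(set(a) & set(b))

def pearson(a, b):
    """Pearson correlation computed over co-rated items only."""
    common = co_rated(a, b)
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)
                    * sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0  # 0.0 when one user has no variance

def cosine(a, b):
    """Cosine of the angle between two rating vectors, restricted to co-rated items."""
    common = co_rated(a, b)
    num = sum(a[i] * b[i] for i in common)
    den = math.sqrt(sum(a[i] ** 2 for i in common) * sum(b[i] ** 2 for i in common))
    return num / den if den else 0.0

def top_neighbors(active, others, sim=pearson, k=2):
    """Step 1: names of the k users most similar to the active user."""
    return sorted(others, key=lambda u: sim(active, others[u]), reverse=True)[:k]

ua = {"i1": 2, "i3": 9, "i4": 10}
others = {
    "u3": {"i1": 2, "i3": 10, "i4": 9, "i5": 9},
    "u4": {"i2": 2, "i3": 9, "i4": 9, "i5": 10},
}
print(round(pearson(ua, others["u3"]), 3))  # 0.974
print(round(cosine(ua, others["u3"]), 3))   # 0.995
print(top_neighbors(ua, others, k=1))       # ['u3']
```

Pearson subtracts each user’s mean, so it ignores whether someone rates generously or harshly; plain cosine does not, which is why it stays close to 1 for almost any pair of positive rating vectors.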

User-based vs. Item-based

   Example user-item matrix ("-" = not rated; ua = active user):

          i1   i2   i3   i4   i5
     u1    5    8    -    7    8
     u2   10    -    1    -    -
     u3    2    -   10    9    9
     u4    -    2    9    9   10
     u5    1    5    -    -    1
     ua    2    -    9   10    -

   • User-based = you may like it because your “friends” liked it
   • Item-based = you may like it because you like similar items
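A minimal sketch of both prediction styles on the example matrix above, predicting ua’s missing ratings. Assumptions: the user-based variant uses Pearson similarity and a mean-centred weighted average over the k most similar users who rated the item (k=3 is arbitrary); the item-based variant uses plain cosine similarity between item columns and a weighted average of ua’s own ratings. Sarwar et al. found adjusted cosine works better for item-based CF; plain cosine just keeps the sketch short. All function names are illustrative.

```python
import math

# Ratings copied from the example matrix; unrated cells are simply absent.
R = {
    "u1": {"i1": 5, "i2": 8, "i4": 7, "i5": 8},
    "u2": {"i1": 10, "i3": 1},
    "u3": {"i1": 2, "i3": 10, "i4": 9, "i5": 9},
    "u4": {"i2": 2, "i3": 9, "i4": 9, "i5": 10},
    "u5": {"i1": 1, "i2": 5, "i5": 1},
    "ua": {"i1": 2, "i3": 9, "i4": 10},
}

def mean(ratings):
    return sum(ratings.values()) / len(ratings)

def pearson(x, y):
    """Pearson correlation over shared keys (0.0 if fewer than two or no variance)."""
    common = sorted(set(x) & set(y))
    if len(common) < 2:
        return 0.0
    mx = sum(x[k] for k in common) / len(common)
    my = sum(y[k] for k in common) / len(common)
    num = sum((x[k] - mx) * (y[k] - my) for k in common)
    den = math.sqrt(sum((x[k] - mx) ** 2 for k in common) * sum((y[k] - my) ** 2 for k in common))
    return num / den if den else 0.0

def cosine(x, y):
    """Cosine similarity over shared keys."""
    common = set(x) & set(y)
    num = sum(x[k] * y[k] for k in common)
    den = math.sqrt(sum(x[k] ** 2 for k in common) * sum(y[k] ** 2 for k in common))
    return num / den if den else 0.0

def user_based(R, active, item, k=3):
    """Active user's mean plus similarity-weighted deviations of the k most similar raters of `item`."""
    cands = [(pearson(R[active], R[u]), u) for u in R if u != active and item in R[u]]
    top = sorted(cands, reverse=True)[:k]
    num = sum(s * (R[u][item] - mean(R[u])) for s, u in top)
    den = sum(abs(s) for s, _ in top)
    return mean(R[active]) + (num / den if den else 0.0)

def item_based(R, active, item):
    """Similarity-weighted average of the active user's own ratings on the items they rated."""
    cols = {}  # transpose: item -> {user: rating}
    for u, row in R.items():
        for i, r in row.items():
            cols.setdefault(i, {})[u] = r
    sims = [(cosine(cols[item], cols[j]), j) for j in R[active]]
    num = sum(s * R[active][j] for s, j in sims)
    den = sum(abs(s) for s, _ in sims)
    return num / den if den else mean(R[active])

print(round(user_based(R, "ua", "i5"), 2))  # 8.25
print(user_based(R, "ua", "i2"))            # 8.0
print(round(item_based(R, "ua", "i5"), 2))
```

Here pearson(ua, u1) comes out as exactly 1.0 because the two users share only two rated items; small overlaps can make a neighbor look far more similar than the evidence supports.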

Model-based CF Algorithms
   [Figure: all ratings (r1 to r11) are used to train a MODEL that keeps only a set of ratings (r3, r4, r7, r8) and produces the RECOMMENDATION]

   • Train the system to recognize complex patterns in user-item data (ratings)
   • Make recommendations based on the trained model
   • Rely on machine learning and data mining algorithms
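The slide does not name a specific model. Matrix factorization, popularized during the Netflix Prize, is one widely used model-based CF technique, so here is a minimal sketch of it trained with stochastic gradient descent: each user and item gets a short latent vector, and a rating is predicted as their dot product. The toy ratings and the hyperparameters (k, lr, reg, epochs) are arbitrary illustrative choices.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))

def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Fit user factors P and item factors Q so that predict(P, Q, u, i) ≈ r_ui, using SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy so shuffling does not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # gradient step on the squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def train_rmse(P, Q, ratings):
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Toy (user, item, rating) triples on a 1-5 scale; every other cell is unknown.
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 0, 1), (3, 2, 5), (3, 3, 4)]
P, Q = train_mf(ratings, n_users=4, n_items=4)
print(round(train_rmse(P, Q, ratings), 3))  # training error after fitting
print(round(predict(P, Q, 1, 1), 2))        # the model's guess for an unrated cell
```

Once trained, the model answers a query with a k-element dot product instead of a scan over all users, which is the scalability advantage model-based methods trade against the cost of training.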

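Limitations and problems of CF
   • Dependence on human ratings
   • Data sparsity
       - Cold start: new-user and new-item problems
   • Scalability
   • Synonymy: similar items with different names are treated as unrelated
   • Shilling attacks: fake profiles that promote or demote items
   • Gray/black sheep: users whose tastes do not consistently match any group (gray) or are idiosyncratic (black)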

Content-based recommendation (CB)
   • Suited to items described by textual information (keywords)
   • Builds on information retrieval
   • Compares items by the similarity of their features

   • Example: movie recommendation application
       - Analyze common features among the movies
       - Recommend only the movies that are highly similar to the user’s preferences

   [Figure: movie pairs with small vs. large similarity]
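A minimal content-based sketch, assuming each movie is described by a handful of keywords: items become TF-IDF vectors, the user profile is the sum of the vectors of the movies the user liked, and unseen movies are ranked by cosine similarity to that profile. The catalogue, keywords and function names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical catalogue: each movie is described by a few keywords.
MOVIES = {
    "Alien": "space horror alien crew",
    "Aliens": "space alien marines action",
    "Notting Hill": "romance comedy london bookshop",
    "Love Actually": "romance comedy london christmas",
    "Gravity": "space astronaut survival",
}

def tfidf_vectors(docs):
    """Map each item to a {term: tf * idf} dictionary."""
    tokens = {name: text.split() for name, text in docs.items()}
    df = Counter(t for words in tokens.values() for t in set(words))  # document frequency
    n = len(docs)
    return {
        name: {t: c * math.log(n / df[t]) for t, c in Counter(words).items()}
        for name, words in tokens.items()
    }

def cosine(x, y):
    num = sum(w * y.get(t, 0.0) for t, w in x.items())
    den = math.sqrt(sum(w * w for w in x.values())) * math.sqrt(sum(w * w for w in y.values()))
    return num / den if den else 0.0

def recommend(liked, docs, n=2):
    """Rank items the user has not seen by similarity to the profile built from liked items."""
    vecs = tfidf_vectors(docs)
    profile = Counter()
    for name in liked:
        profile.update(vecs[name])  # profile = sum of liked items' vectors
    candidates = [m for m in docs if m not in liked]
    return sorted(candidates, key=lambda m: cosine(profile, vecs[m]), reverse=True)[:n]

print(recommend(["Alien"], MOVIES))  # ['Aliens', 'Gravity']
```

Note that nothing here depends on other users, which is exactly why CB can recommend brand-new items, and also why it keeps suggesting more of the same (the overspecialization problem).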

Limitations and problems of CB
   • Limited content analysis
       - Only features explicitly associated with items can be used
       - Multimedia data relies on tagging
       - Items with the same set of features are indistinguishable

   • Overspecialization
       - Recommends only items similar to those the user has already rated
       - Difficult to recognize synonyms, concepts, or newly emerging words

   • New user problem

Hybrid recommenders
   • Combine CF and CB by:
       - Implementing the methods separately and combining their predictions
       - Incorporating CB characteristics into a CF approach, or vice versa
       - Constructing a general unifying model that incorporates both
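The first strategy, running the methods separately and combining their predictions, can be as simple as a weighted average. A minimal sketch, assuming both recommenders output ratings on the same scale and that `alpha` is a hypothetical tuning weight (in practice it would be chosen on held-out data):

```python
from typing import Optional

def weighted_hybrid(cf_score: Optional[float], cb_score: Optional[float],
                    alpha: float = 0.5) -> Optional[float]:
    """Blend a CF prediction and a CB prediction; alpha is the weight on CF."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if cf_score is None:   # e.g. a new item nobody has rated yet: fall back to content
        return cb_score
    if cb_score is None:   # e.g. an item without usable content features
        return cf_score
    return alpha * cf_score + (1.0 - alpha) * cb_score

print(weighted_hybrid(8.0, 6.0, alpha=0.75))  # 7.5
print(weighted_hybrid(None, 6.0))             # 6.0
```

The fallback branches are where the hybrid earns its keep: each component covers the cases in which the other has no data.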

   • Example: content-boosted collaborative filtering
       1. A content predictor fills in the missing ratings (x):

               i1   i2   i3   i4
          u1    5    8    x    7
          u2   10    x    1    x
          u3    2    x   10    9
          u4    x    2    9    9
          ua    2    x    9   10

       2. Collaborative filtering runs on the filled-in matrix:

               i1   i2   i3   i4
          u1    5    8    7    7
          u2   10    4    1    8
          u3    2    5   10    9
          u4    6    2    9    9
          ua    2    3    9   10
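A simplified sketch of the two content-boosted steps, using the matrices above. The content predictor here is a hypothetical stand-in that just looks up the filled-in values from the example; a real system would predict them from item features. The original content-boosted CF method (Melville et al., 2002) also applies further weighting factors, which are omitted here.

```python
import math

MISSING = None
ITEMS = ["i1", "i2", "i3", "i4"]
# Sparse ratings from the example (MISSING marks an "x" cell).
SPARSE = {
    "u1": [5, 8, MISSING, 7],
    "u2": [10, MISSING, 1, MISSING],
    "u3": [2, MISSING, 10, 9],
    "u4": [MISSING, 2, 9, 9],
    "ua": [2, MISSING, 9, 10],
}

# Hypothetical content predictor: returns the example's filled-in values.
EXAMPLE_CONTENT_PREDICTIONS = {("u1", 2): 7, ("u2", 1): 4, ("u2", 3): 8,
                               ("u3", 1): 5, ("u4", 0): 6, ("ua", 1): 3}

def content_predict(user, item_idx):
    return EXAMPLE_CONTENT_PREDICTIONS[(user, item_idx)]

def pseudo_ratings(sparse, predictor):
    """Step 1: keep actual ratings and fill each missing cell with a content-based prediction."""
    return {u: [r if r is not MISSING else predictor(u, i) for i, r in enumerate(row)]
            for u, row in sparse.items()}

def pearson(x, y):
    """Pearson correlation of two dense, equal-length rating vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def cf_predict(dense, active, item_idx):
    """Step 2: mean-centred, similarity-weighted user-based CF on the dense matrix."""
    def mean(row):
        return sum(row) / len(row)
    num = den = 0.0
    for u, row in dense.items():
        if u == active:
            continue
        s = pearson(dense[active], row)
        num += s * (row[item_idx] - mean(row))
        den += abs(s)
    return mean(dense[active]) + (num / den if den else 0.0)

dense = pseudo_ratings(SPARSE, content_predict)
print(dense["ua"])  # [2, 3, 9, 10]
print(round(cf_predict(dense, "ua", ITEMS.index("i2")), 2))
```

Because every user now has a full row, similarities are computed over all items instead of a thin co-rated overlap, which is how this hybrid addresses the sparsity problem of pure CF.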

Pros/Cons of Hybrid Recommenders
   • Advantages
       - Address limitations of pure CF or CB systems
       - Provide more accurate recommendations
       - Performance improvement
       - Overcome sparsity

   • Disadvantages
       - Complexity
       - Expensive to build

The winning solution of the Netflix Prize
   • A blend of several complex algorithms in a hybrid recommender system

   • Main improvement:
       - Modeling temporal effects on movie and user biases, as well as users’ changing preferences
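To make “temporal effects” concrete, here is a time-dependent baseline predictor of the kind described in Koren’s work on temporal dynamics in collaborative filtering, which the BellKor team’s models built on. It is one illustrative component, not the full winning blend. Here \mu is the global mean rating, b_u and b_i are static user and item biases, \mathrm{Bin}(t) maps a rating date to one of a fixed number of consecutive time bins, t_u is the mean date of user u’s ratings, \alpha_u is a learned per-user drift coefficient, and \beta is a tuned exponent.

```latex
\begin{aligned}
\hat{r}_{ui}(t) &= \mu + b_u(t) + b_i(t) \\
b_i(t) &= b_i + b_{i,\mathrm{Bin}(t)} \\
b_u(t) &= b_u + \alpha_u \,\mathrm{dev}_u(t), \qquad
\mathrm{dev}_u(t) = \operatorname{sign}(t - t_u)\,\lvert t - t_u \rvert^{\beta}
\end{aligned}
```

The item term captures slow shifts in a movie’s popularity, while the user term captures gradual drift in how generously a user rates.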

Summary

   Memory-based CF
     Techniques:  neighborhood-based CF; top-N recommendation
     Advantages:  easy implementation; no content analysis needed
     Limitations: data sparsity; cold start problem; limited scalability

   Model-based CF
     Techniques:  machine learning / data mining algorithms
     Advantages:  deal better with sparsity and scalability; intuitive rationale
     Limitations: expensive modeling; trade-off between performance and scalability

   Content-based filtering
     Techniques:  information retrieval
     Advantages:  no data about other users needed; recommendations for new/unpopular items; predictions for users with unique tastes
     Limitations: limited content analysis; overspecialization; new user problem

   Hybrid recommenders
     Techniques:  combination of collaborative and content-based approaches
     Advantages:  overcome limitations of pure collaborative and content-based recommendations; more accurate; performance improvement
     Limitations: complexity; expensive to build