                                       ITWP2011, Barcelona




Using Social- and Pseudo-Social Networks to Improve
Recommendation Quality
Alan Said, Ernesto W. De Luca, Sahin Albayrak




Abstract
 The accumulated amount of data in the digital universe
  reached 1.2 zettabytes (1.2 billion terabytes) in 2010.
   50% increase since 2008.
 Websites increasingly accumulate a wider variety of data on
  their users
   Without necessarily using it


 This paper: how can this data be used to improve
  recommendation




Outline
 Introduction
   Recommender Systems
   Problem statement
 Dataset
   Statistics
   Social and Pseudo-Social networks
 Approach
 Results




Introduction
 IMDb, one of the first online recommender systems, turned
  20 on October 17th 2010.

 Ever since their beginning, recommender systems have,
  through relatively simple techniques, produced
  recommendations for their users

 Today's online systems contain more information about their
  users; we should use that information.
   Which information is important?




The Problem
 What to do with the heaps of information available?
      What to use, and how, in order to improve (or learn how to
       improve) recommendations

  How should we treat
          Friendships?
          Comments?
          Idols?
          Common interests?
  How important are these in terms of recommendation
   quality?




Dataset
 From the movie domain: Moviepilot.de
     Germany's largest movie recommendation community
     1M+ users
     13M ratings
     50K movies

 Subset used here
     10,000 randomly selected users with a minimum of 30 ratings
     1.5M ratings
     50,000 comments
     4,000 friendships
     170,000 idols
     25,000 diggs




Social- and Pseudo-Social Networks
 Social networks
     Explicit statements of friendship between users

 Pseudo-social networks
     Users commenting on the same movie
     Users being fans of the same people
     Users digging the same news articles, trailers, etc.

   38% of ratings performed by users with friends
   45% of ratings performed by users with comments
   77% of ratings performed by users who are fans
   29% of ratings performed by users who digg
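
A minimal sketch of how a pseudo-social network of this kind could be derived: every pair of users who acted on the same item (commented on the same movie, are fans of the same person, dugg the same article) gets an undirected link. The function name `pseudo_social_edges` and the toy data are hypothetical and not taken from the slides; the actual construction used by the authors may differ.

```python
from collections import defaultdict
from itertools import combinations


def pseudo_social_edges(events):
    """Link every pair of users who acted on the same item.

    `events` is an iterable of (user, item) pairs, e.g. (user, commented_movie).
    Returns a set of frozensets, one per undirected user pair.
    """
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)

    edges = set()
    for users in users_by_item.values():
        # Every pair of users sharing this item becomes one edge.
        for a, b in combinations(sorted(users), 2):
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical toy data: (user, movie commented on).
comments = [("ann", "m1"), ("bob", "m1"), ("cat", "m2"), ("ann", "m2")]
comment_edges = pseudo_social_edges(comments)
```

Here `ann` is linked to `bob` through `m1` and to `cat` through `m2`, while `bob` and `cat` share no item and stay unlinked. The same function would apply unchanged to fan or digg events.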




The Approach
 Augmenting k-Nearest Neighbor neighborhoods by using
  information from (pseudo) social networks

   Using standard Pearson Similarity
    Increasing the similarity of users in the same networks in order to add
     them to the neighborhood
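
The idea above can be sketched as follows, under stated assumptions: Pearson similarity is computed over co-rated items, and users connected in a (pseudo) social network get an additive bonus before the top-k neighbors are selected. The slides do not say how, or by how much, the similarity is increased, so the additive `boost` parameter, the function names, and the toy ratings are all assumptions for illustration.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def augmented_neighbors(target, ratings, edges, k=2, boost=0.5):
    """Top-k neighbors by Pearson similarity plus a bonus for network ties.

    `boost` is an assumed additive bonus; the slides do not give the amount.
    """
    scores = {}
    for other in ratings:
        if other == target:
            continue
        sim = pearson(ratings[target], ratings[other])
        if frozenset((target, other)) in edges:
            sim += boost  # users in the same network are pulled closer
        scores[other] = sim
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


# Hypothetical toy ratings: user -> {movie: rating}.
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 1},
    "bob": {"m1": 5, "m2": 3, "m3": 1},
    "cat": {"m1": 5, "m2": 1, "m3": 3},
    "dan": {"m1": 1, "m2": 3, "m3": 5},
}
standard = augmented_neighbors("ann", ratings, edges=set(), k=1)
augmented = augmented_neighbors(
    "ann", ratings, edges={frozenset(("ann", "cat"))}, k=1, boost=0.6
)
```

With no network information, `bob` (identical ratings) is the single nearest neighbor of `ann`. Once `ann` and `cat` are linked, the bonus lifts `cat` into the neighborhood instead. A multiplicative weight or a fixed number of reserved neighbor slots would be equally plausible readings of the slide.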




The Approach




 [Figure: standard neighborhood vs. augmented neighborhood]




Motivation
 Similarity metrics (Pearson, Jaccard, etc.) are based on
  co-ratings
   Popular items often heighten similarities without adding value
    e.g. movies like The Matrix and The Lord of the Rings often
    have similar (high) ratings, even if users do not share taste
   Adding importance to users who share other interests filters out
    some of the effects of popular items.




Results
 [Figure: bar chart of MAP and P@10 for the Friendships, Comments,
  Fans and Diggs networks; vertical axis from 0 to 10]
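
For reference, the two metrics named in the chart, precision at 10 (P@10) and mean average precision (MAP), can be computed as in the sketch below. The slides do not state how relevant items were determined or which variant of each metric was used, so the function names, the definition of `relevant`, and the choice to divide by `k` even for short lists are assumptions.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average AP over (ranked, relevant) pairs, one pair per user."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: a ranked recommendation list and the relevant set.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
p_at_2 = precision_at_k(runs[0][0], runs[0][1], k=2)
map_score = mean_average_precision(runs)
```

For the first user, the relevant items sit at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; for the second, the only relevant item is at rank 2, giving 1/2. MAP is the mean of those two values.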




Conclusion
 Social and interaction (co-commenting, etc.) networks seem
  to hold more information than standard CF is able to identify
 Similarity metrics do not always tell the complete truth

 ToDos:
   Find items that are important for establishing similarity between
    users
   Investigate what other information can be used for measuring
    similarities




Questions?


             Thank you!
