Extending recommendation systems
      with semantics and context-awareness


                                        CCIA 2011

                          Victor Codina        & Luigi Ceccaroni
                         vcodina@lsi.upc.edu        lceccaroni@BDigital.org



Departament de Llenguatges i Sistemes Informàtics                 Health Informatics
Knowledge Engineering and Machine Learning Group         Personalized Computational Medicine
Outline


 Traditional vs. Contextual recommendation

 State-of-the-art & Current limitations

 Research question

 Semantics acquisition & exploitation

 Proposed model

 Experimental evaluation

 Conclusions & Future work



               Extending Recommendation Systems with Semantics and Context-Awareness   2
Traditional recommendation problem

 Regression problem:
   o Given a pair (u ∈ U, i ∈ I), predict the item's degree of utility for the user
 Estimation based only on user and item information

 Inputs: preference matrix, preferences (u), attributes (i)
 Recommendation model (recommender):
   o Collaborative filtering (CF)
   o Content-based (CB)
   o Hybrid
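
As a minimal sketch of the regression view above, the snippet below estimates the utility of an unseen (u, i) pair from a tiny preference matrix, using only the overall average plus user and item deviations. The ratings and the `predict` helper are illustrative assumptions, not data or code from the talk.

```python
from statistics import mean

# Hypothetical preference matrix: (user, item) -> rating on a 1-5 scale.
ratings = {
    ("u1", "i1"): 5.0, ("u1", "i2"): 3.0,
    ("u2", "i1"): 4.0, ("u2", "i3"): 2.0,
    ("u3", "i2"): 4.0, ("u3", "i3"): 1.0,
}

mu = mean(ratings.values())  # overall rating average


def user_bias(u):
    """Average deviation of user u's ratings from the overall average."""
    devs = [r - mu for (uu, _), r in ratings.items() if uu == u]
    return mean(devs) if devs else 0.0


def item_bias(i):
    """Average deviation of item i's ratings from the overall average."""
    devs = [r - mu for (_, ii), r in ratings.items() if ii == i]
    return mean(devs) if devs else 0.0


def predict(u, i):
    """Estimate the utility of item i for user u, clipped to the rating scale."""
    return min(5.0, max(1.0, mu + user_bias(u) + item_bias(i)))


print(round(predict("u1", "i3"), 2))
```

A user or item with no ratings falls back to the overall average, which is one simple way to see why cold-start cases are hard for models that rely only on ratings.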




Context-aware recommendation problem

 Context as additional dimension for estimation
   o Given a tuple (u, i, c), predict the item's degree of utility in context c
   o Context = situated action
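 Representational view: a context is a tuple of contextual factor values
   o Example: c = (winter, cold), with c1 = Season and c2 = Temperature
 Where context c can enter the recommendation pipeline:
   o Pre-filtering: c is applied to the training data
   o Multi-Dimensional (MD): c is used inside the recommendation model
   o Post-filtering: c is applied to the output of the model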
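
A hedged sketch of the pre-filtering option: each rating carries a context tuple c = (season, temperature), and only ratings given in exactly the target context are kept before any model is built. The records, the `prefilter` helper and the item-average stand-in for a recommender are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical contextual ratings: (user, item, context, rating),
# where context = (season, temperature).
ratings = [
    ("u1", "i1", ("winter", "cold"), 5.0),
    ("u2", "i1", ("winter", "cold"), 4.0),
    ("u3", "i1", ("summer", "hot"), 2.0),
    ("u1", "i2", ("summer", "hot"), 4.0),
]


def prefilter(data, context):
    """Keep only the ratings given in exactly the target context."""
    return [row for row in data if row[2] == context]


def item_average(data, item):
    """Simple stand-in for a 2D recommender: the item's mean rating."""
    values = [r for _, i, _, r in data if i == item]
    return mean(values) if values else None


winter = prefilter(ratings, ("winter", "cold"))
print(item_average(winter, "i1"))   # context-specific estimate
print(item_average(ratings, "i1"))  # context-free estimate
```

Exact matching discards every rating from a different context, so items such as `i2` end up with no winter data at all. This is the rigidity that semantic matching between context states is meant to soften.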



State-of-the-art & limitations

 Adaptations of latent-factor models (MD paradigm)
 Examples:
   o N-dimensional Tensor Factorization
   o Bias-based Matrix Factorization with temporal dynamics
 Best prediction accuracy results on recent competitions
   o E.g.: Netflix challenge (2009), Yahoo! Labs KDD Cup (2011)
 Main limitations of latent-factor models:
  o Lack of transparency in explaining recommendations
  o Low cold-start performance (users and items with few ratings)
  o Lack of novelty and diversity of recommendations



Research questions & main assumptions

 Research questions
   o Q1. Can we overcome the limitations and improve global
     recommendation quality (not only prediction accuracy) by
     exploiting domain and context knowledge?

   o Q2. Under which conditions is this improvement maximized?

 Main assumptions
   o There exist semantic relationships among entities of the
     recommendation space (users, items, contexts)

   o The adequate exploitation of these semantic relationships is
     useful to overcome current limitations


Knowledge acquisition and representation

 Given two domain/context concepts x and y, how similar are they: S(x,y)?

                     Explicit similarity             Implicit similarity

Similarity measure   Ontology-based                  Statistics-based
                     - Edge-based (LCA)              - Probabilistic measures (PMI)
                     - Node-based (MICA)             - Dimensionality reduction (LSA)
                     - Logic-based                   - Graph-based (SimRank)

                       uses                            uses

Knowledge source     Ontologies                      Data collections
                     - Taxonomies (ODP)              - Folksonomies
                     - Thesauri (WordNet)            - Item descriptions
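
For the statistics-based branch, the following is a rough sketch of how a PMI score between two tags could be estimated from co-occurrence counts in a folksonomy. The item-to-tags mapping is invented for the example, and estimating probabilities per item is an assumption; the slides do not say how the counts are taken.

```python
import math

# Hypothetical folksonomy: item -> set of tags assigned by users.
item_tags = {
    "m1": {"space", "aliens", "future"},
    "m2": {"space", "future"},
    "m3": {"romance", "paris"},
    "m4": {"space", "aliens"},
}


def pmi(x, y, data):
    """Pointwise mutual information log2(P(x,y) / (P(x) P(y))) over items."""
    n = len(data)
    n_x = sum(1 for tags in data.values() if x in tags)
    n_y = sum(1 for tags in data.values() if y in tags)
    n_xy = sum(1 for tags in data.values() if x in tags and y in tags)
    if n_xy == 0:
        return float("-inf")  # the tags never co-occur
    return math.log2((n_xy / n) / ((n_x / n) * (n_y / n)))


print(pmi("space", "aliens", item_tags))   # positive: co-occur more than chance
print(pmi("space", "romance", item_tags))  # -inf: never co-occur
```

A positive score means the two tags appear together more often than independence would predict; in practice the raw score is usually rescaled before being used as S(x,y).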


User/Item representation

 Concept-based modeling (weighted overlay approach)

 Domain knowledge (concepts = item attributes): d1, d2, d3, d4

 User u, profile Pu: one weight per concept (e.g. degree of interest in d1)
   o Interest inferring method
     - Explicit feedback (Rating avg)
     - Implicit feedback (Seen frequency)

 Item i, profile Pi: one weight per concept (e.g. relevance of d3)
   o Attribute weighting method
     - Structured content (IDF)
     - Unstructured (TFIDF, tagshare)
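
The weighted overlay idea can be sketched as two dictionaries that share the same concept vocabulary. In this illustration the user weight is the average rating of the items carrying a concept (explicit feedback) and the item weight is an IDF score (structured content). The data and both helper functions are assumptions; the slides do not give the exact formulas.

```python
import math
from collections import defaultdict

# Hypothetical data: item -> concepts (attributes), and one user's ratings.
item_concepts = {
    "i1": {"comedy", "romance"},
    "i2": {"comedy"},
    "i3": {"thriller"},
}
user_ratings = {"i1": 4.0, "i2": 5.0, "i3": 2.0}


def user_profile(ratings, concepts_of):
    """Degree of interest per concept: average rating of items carrying it."""
    by_concept = defaultdict(list)
    for item, rating in ratings.items():
        for concept in concepts_of[item]:
            by_concept[concept].append(rating)
    return {c: sum(v) / len(v) for c, v in by_concept.items()}


def item_profile(item, concepts_of):
    """Relevance per concept: IDF, so rarer attributes weigh more."""
    n = len(concepts_of)
    profile = {}
    for concept in concepts_of[item]:
        df = sum(1 for cs in concepts_of.values() if concept in cs)
        profile[concept] = math.log(n / df)
    return profile


print(user_profile(user_ratings, item_concepts))
print(item_profile("i1", item_concepts))
```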




Knowledge exploitation

Knowledge type   Can be used for                                  Possible benefits

Contextual       Measuring the semantic matching among            Less rigid contextual
                 different context states                         filtering than using
                                                                  exact matching

Domain-based     Applying semantic inference methods over         Enrich item/user
                 user/item concept-based profiles:                profiles with new
                    - Spreading activation                        semantically related
                    - Reasoning based on description logics (DLs) concepts

Domain-based     Measuring the matching between two               More precise
                 users/items using various semantic matching      similarity
                 strategies:                                      measurements than
                    - Pairwise (Best-pairs or All-pairs)          using traditional
                    - Groupwise (set-, vector- or graph-based)    measures
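
To make the pairwise strategies concrete, here is one possible reading of All-pairs and Best-pairs matching between a user profile and an item profile. The similarity table, the normalization and both function bodies are assumptions chosen for illustration, not the definitions used in the talk.

```python
# Hypothetical concept similarities S(x, y) in [0, 1]; identical concepts score 1.
similarity = {
    frozenset(["comedy", "romance"]): 0.6,
    frozenset(["comedy", "thriller"]): 0.1,
    frozenset(["romance", "thriller"]): 0.2,
}


def sim(x, y):
    """Symmetric lookup of the similarity between two concepts."""
    return 1.0 if x == y else similarity.get(frozenset([x, y]), 0.0)


def all_pairs(user, item):
    """Weighted average of S over every (user concept, item concept) pair."""
    num = sum(wu * wi * sim(du, di)
              for du, wu in user.items() for di, wi in item.items())
    den = sum(wu * wi for wu in user.values() for wi in item.values())
    return num / den if den else 0.0


def best_pairs(user, item):
    """Each user concept is matched only with its most similar item concept."""
    total = sum(user.values())
    if not item or not total:
        return 0.0
    num = sum(wu * max(sim(du, di) for di in item) for du, wu in user.items())
    return num / total


user = {"comedy": 0.8, "thriller": 0.2}
item = {"romance": 0.5, "comedy": 0.5}
print(all_pairs(user, item), best_pairs(user, item))
```

A traditional measure would only credit the shared concept `comedy`. Both variants here also credit the related pair (`comedy`, `romance`), which is the sense in which semantic matching can be more precise.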



Case study: an MD semantically enhanced CB model

 Contextual prediction model (bias-based); the equation was an image and
  only its term labels survive:
   o Overall rating avg
   o Contextual User bias
   o Contextual Item bias
   o All-pairs Item-User semantic matching
   o where: Session bias of (u,d), contextual bias of (u,d)

 Stochastic gradient descent for model training:
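
The training equations are not preserved in this transcript. As a stand-in, the sketch below shows stochastic gradient descent on a simplified bias-only model, r̂(u,i,c) = μ + b_u + b_i + b_uc + b_ic, with L2 regularization. This model form, the hyperparameters and the toy data are all assumptions; in particular the semantic matching term of the proposed model is left out.

```python
import random
from collections import defaultdict

# Hypothetical training tuples: (user, item, context, rating).
train = [
    ("u1", "i1", "weekend", 5.0), ("u1", "i2", "weekday", 3.0),
    ("u2", "i1", "weekend", 4.0), ("u2", "i2", "weekday", 2.0),
    ("u3", "i1", "weekday", 3.0), ("u3", "i2", "weekend", 4.0),
]


def bias_keys(u, i, c):
    """Static and context-specific bias parameters touched by one rating."""
    return [("u", u), ("i", i), ("uc", u, c), ("ic", i, c)]


def fit(data, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Learn the biases by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    mu = sum(r for *_, r in data) / len(data)  # overall rating average
    bias = defaultdict(float)
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)
        for u, i, c, r in rows:
            keys = bias_keys(u, i, c)
            err = r - (mu + sum(bias[k] for k in keys))
            for k in keys:
                # Gradient step on the regularized squared error.
                bias[k] += lr * (err - reg * bias[k])
    return mu, bias


def predict(model, u, i, c):
    mu, bias = model
    return mu + sum(bias.get(k, 0.0) for k in bias_keys(u, i, c))


def rmse(model, data):
    total = sum((r - predict(model, u, i, c)) ** 2 for u, i, c, r in data)
    return (total / len(data)) ** 0.5


model = fit(train)
print(round(rmse(model, train), 3))
```

Each update moves every bias involved in a rating a small step in the direction that reduces that rating's error, while the `reg` term pulls rarely observed biases back toward zero.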




MovieLens Dataset

 Contextual concepts without semantics
   o 3 contextual factors (season, time of day, weekend or not)
 Domain concepts with implicit semantics
   o Set of pre-selected tags + set of genres
   o Semantic relationships among tags acquired from folksonomy
 Original dataset pruned by selecting only items with a
  certain number of pre-selected tags




Offline experiment

 Testing on each user's last ratings (according to timestamp)
   o In this way we simulate future predictions for each user
 5-fold cross validation
 Two recommendation tasks evaluated
   o Rating prediction (RMSE) and Top-10 recommendation (Recall)
 Threshold-based cold-start performance evaluation
   o User profile size < 25 ratings: 10% of users
 Performance comparison of the proposed model with:
   o 3 model variants
   o 5 baseline models
   o 1 model based on matrix factorization
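
The evaluation protocol above can be sketched with three small helpers: a per-user split that holds out the most recent ratings, RMSE for rating prediction, and recall at N for top-N recommendation. The function names, the `n_test` parameter and the toy ratings are assumptions; the slides do not state how many ratings per user were held out.

```python
import math
from collections import defaultdict


def last_ratings_split(ratings, n_test=1):
    """Hold out each user's most recent ratings (by timestamp) for testing."""
    by_user = defaultdict(list)
    for row in ratings:  # row = (user, item, rating, timestamp)
        by_user[row[0]].append(row)
    train, test = [], []
    for rows in by_user.values():
        rows.sort(key=lambda row: row[3])
        cut = max(len(rows) - n_test, 0)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def recall_at_n(ranked_items, relevant_items, n=10):
    """Fraction of the relevant items that appear in the top-n ranking."""
    if not relevant_items:
        return 0.0
    hits = len(set(ranked_items[:n]) & set(relevant_items))
    return hits / len(relevant_items)


# Hypothetical ratings: (user, item, rating, timestamp).
ratings = [
    ("u1", "i1", 4.0, 10), ("u1", "i2", 5.0, 30), ("u1", "i3", 2.0, 20),
    ("u2", "i1", 3.0, 5), ("u2", "i3", 4.0, 15),
]
train, test = last_ratings_split(ratings)
print(sorted(row[:2] for row in test))
```

The cold-start subset would then be the test users whose training profile has fewer than 25 ratings, evaluated separately from the global result.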

Results

 Paired t-test significance among the 4 model variants:
   o Model 1: Static-CB (static bias + traditional Item-User matching)
   o Model 2: Static-SemCB (static bias + All-pairs matching)
   o Model 3: Contextual-CB (contextual bias + traditional matching)
   o Model 4: Contextual-SemCB (contextual bias + All-pairs matching)
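 [Figure: two charts comparing Model 1 to Model 4]
   o Left panel: Global RMSE (axis from 0.83 to 0.851)
   o Right panel: Cold-Start RMSE (axis from 0.916 to 0.919)
   o P-values annotated on the charts (in red in the original), in the order shown:
     Global RMSE: 0.05, 0.001, 0.17; Cold-Start RMSE: 0.05, 0.62, 0.01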
   o E.g. a P-value of 0.05 means that, if there were no real difference, a
     difference at least this large would be expected only 5% of the time
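
As a small illustration of the paired t-test, the snippet below computes the t statistic for two models evaluated on the same folds. The per-fold RMSE values are made up for the example and are not the results reported in the talk. Turning the statistic into a P-value requires the Student t distribution, for which a library routine such as `scipy.stats.ttest_rel` is one option.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


# Hypothetical per-fold RMSE values of two models on the same 5 folds.
model_a = [0.852, 0.849, 0.855, 0.850, 0.848]
model_b = [0.845, 0.843, 0.846, 0.844, 0.842]
t, dof = paired_t_statistic(model_a, model_b)
print(round(t, 2), dof)
```

With 4 degrees of freedom, a two-sided test at the 0.05 level rejects the hypothesis of no difference when |t| exceeds roughly 2.776.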

Conclusions

 Context-awareness improves prediction accuracy for
  users with a certain number of ratings (non cold-start)
   o 25+ ratings: 90% of users
 Semantics slightly improves cold-start performance
 The knowledge acquisition method for the MovieLens
  folksonomy may not be adequate: limited domain
  knowledge
 MovieLens users often rate several movies at once rather than
  right after seeing each movie
   o Rating-session-specific effects have a major influence on
     user ratings: distorted contextual information


Future work

 Extending evaluation of the proposed CB model:
   o Using datasets from other domains (e.g. music, tourism, health)
   o Experimenting with other sources of knowledge (e.g. Amazon
     movie taxonomy)
   o Experimenting with other methods for semantics exploitation
   o Evaluating other properties (e.g. diversity, novelty, coverage)


 Extending CF models with the proposed semantic
  approach:
   o Neighborhood-based
   o Matrix Factorization


Backup slides




Prediction models of all variants

 Model 1 (Static-CB):




 Model 2 (Static-SemCB):




 Model 3 (Contextual-CB):




 Model 4 (Contextual-SemCB):



