A Recommendation System for the
        Semantic Web

        Victor Codina and Luigi Ceccaroni
                 vcodina@lsi.upc.edu
  Departament de Llenguatges i Sistemes Informàtics (LSI)
        Universitat Politècnica de Catalunya (UPC)




         DCAI 2010, September 7-10 2010, Valencia
Introduction   Our semantic approach                 Evaluation    Conclusions



                              Outline


 Introduction & motivations

 Our semantic approach

 Evaluation

 Conclusions & future work






                The general personalization process
 [Diagram: the personalization loop between ITEMS and USERS]
 USER MODELING
    o Implicit feedback and explicit feedback
    o Learning algorithm
    o User profile
 CONTENT ADAPTATION
    o Item representation
    o Recommendation strategy
    o Personalized recommendation
 USERS
    o User behavior
    o User satisfaction






        Potential benefits of using semantics

 Semantics can reduce two limitations of current recommenders

   o Cold-start problem
         Reduced by inferring missing information from the relationships
          in domain ontologies

   o Domain dependency
         Reduced by using standard ontology-based languages to
          represent information uniformly





        Service-oriented architecture design






      Users' interests and item representation

 Ontology-based representation (weighted overlay)
    o Weighted user interests
    o Weighted item annotations
    o Both are expressed over concept taxonomies





    How do we take advantage of semantics?

 We incorporate semantics in both stages of the
  personalization process to reduce the cold-start problem
   o The user-profile learning algorithm employs a domain-based
     inference method
         It expands and enriches the user profiles with interests that cannot
          be directly inferred from the user feedback


   o The content-based recommendation algorithm employs a
     taxonomy-based similarity method
         It uses the user's interests in more general concepts related to the
          item's annotations to refine the matching calculation




       Semantically-enhanced learning algorithm

 START. The user provides some feedback about an item
 (e.g. a purchase or rating of an item)

 Step 1. Interest weights of the concepts related to the item
 are calculated/updated

 Step 2. A domain-based inference method infers new interests
 from the families of concepts with updated interests

 [Diagram: user and item linked through taxonomy concepts labeled learnt, updated, and inferred]





          The domain-based inference method

 Propagation requires a minimum percentage of direct subconcepts
  with a known interest weight
 Two types of propagation
    o Upward-based (propagation to the parent concept)
    o Sideward-based (propagation to the siblings)
 Thresholds
    o Upward-based threshold (UIT) = 0.6
    o Sideward-based threshold (SIT) = 0.9

 Example (interest weights in brackets)

                                  Sport
                                  [0.5]

        Baseball        Basketball        Football        Tennis        Golf
         [-0.5]            [0.5]            [1.0]          [1.0]         [?]

    o Upward-based?   Pct(subconcepts) = 4/5 = 0.8; 0.8 > UIT = 0.6 => Propagation
    o Sideward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 < SIT = 0.9 => No propagation
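
 The threshold test in the Sport example can be sketched in Python as follows. This is only an illustration: the function name `infer_family` is hypothetical, and the slide does not say which value is propagated. The sketch assumes it is the mean of the known subconcept weights, which happens to reproduce the [0.5] shown for Sport.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

UIT = 0.6  # upward-based threshold (value from the slide)
SIT = 0.9  # sideward-based threshold (value from the slide)


def infer_family(
    children: Dict[str, Optional[float]],
    uit: float = UIT,
    sit: float = SIT,
) -> Tuple[float, Optional[float], Dict[str, float]]:
    """Apply upward and sideward inference to one family of concepts.

    `children` maps each direct subconcept to its interest weight,
    or to None when the user's interest is unknown.
    Returns (pct, parent_weight, inferred_sibling_weights).
    """
    if not children:
        return 0.0, None, {}

    known = [w for w in children.values() if w is not None]
    pct = len(known) / len(children)  # share of subconcepts with a known weight

    # ASSUMPTION: the propagated value is the mean of the known weights.
    aggregated = mean(known) if known else None

    # Upward propagation: the parent concept receives the aggregated weight.
    parent = aggregated if known and pct > uit else None

    # Sideward propagation: siblings without a weight receive the aggregated weight.
    siblings: Dict[str, float] = {}
    if known and pct > sit:
        siblings = {c: aggregated for c, w in children.items() if w is None}

    return pct, parent, siblings


sport = {"Baseball": -0.5, "Basketball": 0.5, "Football": 1.0,
         "Tennis": 1.0, "Golf": None}
pct, parent, siblings = infer_family(sport)
print(pct, parent, siblings)
```

 With these inputs the coverage is 0.8, so the parent gets a weight because 0.8 exceeds UIT = 0.6, while Golf stays unknown because 0.8 does not reach SIT = 0.9.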


Semantically-enhanced content-based filtering
 START. The system has to predict whether the user will like or dislike an item

 FOR EACH item annotation DO:
   STEP 1. The conceptScore is calculated based on:
      o The interest degree of the user's interests that match the item's annotation
      o The semantic similarity of the matching (perfect or partial match)
 END FOR

 STEP 2. The itemScore is calculated as the weighted average of the
 conceptScore values according to their relevance

 [Diagram: user interests matched to the item's annotations C1 and C2, with matches labeled perfect or partial]
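
 The two steps above can be sketched in Python. The slide does not give the formula for conceptScore, so the sketch assumes the interest degree and the semantic similarity are multiplied. The names `concept_score` and `item_score` and the tuple layout are hypothetical.

```python
from typing import List, Tuple


def concept_score(interest: float, similarity: float) -> float:
    """Score of one item annotation.

    ASSUMPTION: interest degree and semantic similarity are combined
    by multiplication; the slide only says the score is based on both.
    """
    return interest * similarity


def item_score(matches: List[Tuple[float, float, float]]) -> float:
    """Weighted average of the conceptScore values of an item.

    Each entry of `matches` is (relevance, interest, similarity) for one
    item annotation, where `relevance` is the weight of the annotation.
    """
    total_relevance = sum(relevance for relevance, _, _ in matches)
    if total_relevance == 0:
        return 0.0  # no usable annotation, so no evidence either way
    weighted = sum(
        relevance * concept_score(interest, similarity)
        for relevance, interest, similarity in matches
    )
    return weighted / total_relevance


# One perfect match (similarity 1.0) and one partial match (similarity 0.6).
print(item_score([(1.0, 1.0, 1.0), (1.0, 0.5, 0.6)]))
```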




          The taxonomy-based similarity method

 Based on the distance, in taxonomy levels, between
    o The item's annotation
    o The user's interest (an ancestor of the item's annotation)
 Weighted semantic distance among levels using a K factor

 Example 1 (Source > Sport > Extreme > Climbing)
    o User interest: Extreme (level 3)
    o Item annotation: Climbing (level 4)
    o distance = 1, K4 = 0.3 => SIM = 0.7

 Example 2 (Genre > Romance > Steamy Romance)
    o User interest: Romance (level 2)
    o Item annotation: Steamy Romance (level 3)
    o distance = 1, K3 = 0.4 => SIM = 0.6
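
 Both examples on this slide are consistent with SIM = 1 - K for a distance of one level. The Python sketch below reproduces them. How the K factors combine over longer distances is not stated on the slide, so the sketch assumes that the K of every level crossed is subtracted and that the result is floored at 0. The function name `taxonomy_similarity` is hypothetical.

```python
from typing import Dict


def taxonomy_similarity(
    interest_level: int,
    annotation_level: int,
    k: Dict[int, float],
) -> float:
    """Similarity between a user interest and a descendant item annotation.

    `k` maps a taxonomy level to its K factor (e.g. {3: 0.4, 4: 0.3}).
    ASSUMPTION: the K factor of each level crossed on the way down from
    the interest to the annotation is subtracted from 1.
    """
    if interest_level > annotation_level:
        raise ValueError("the user interest must be an ancestor of the annotation")
    crossed = range(interest_level + 1, annotation_level + 1)
    penalty = sum(k[level] for level in crossed)
    return max(0.0, 1.0 - penalty)


k_factors = {3: 0.4, 4: 0.3}  # K3 and K4 from the slide
print(taxonomy_similarity(3, 4, k_factors))  # Extreme -> Climbing
print(taxonomy_similarity(2, 3, k_factors))  # Romance -> Steamy Romance
```

 A perfect match (interest and annotation on the same concept) crosses no level and therefore scores 1.0 under this assumption.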



                   Experimental dataset

 Netflix Prize movie dataset
    o 480,000 users
    o 17,700 movies
    o 100M user ratings on a scale from 1 to 5
 Movie taxonomy used by Netflix for annotating movies
    o 1 global hierarchy of concepts describing the movies
    o 3 levels of depth
    o 550 nodes (item annotations)
 RMSE (root-mean-square error) metric
    o Measures the error of rating predictions for a set of users
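
 For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal Python sketch (the function name `rmse` is chosen here for illustration):

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two predictions that are each off by one star give an RMSE of 1.0.
print(rmse([1.0, 5.0], [2.0, 4.0]))
```

 Lower values mean more accurate predictions.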




                  Experimental evaluation

 Exp. 1: Traditional vs semantic approach
    o GOAL. To evaluate the improvement in accuracy when the
      semantics-based methods are employed
          Is the cold-start problem reduced?


 Exp. 2: Semantic approach on two different taxonomies
    o GOAL. To analyze whether the hierarchical structure of the taxonomy
      affects the effectiveness of the semantics-based methods
          How does the taxonomy structure affect their performance?






     Exp.1: Traditional vs Semantic approach

 Experiment setup
    o The error of two algorithm configurations is compared
          CB configuration (traditional CB approach)
          SEM-CB configuration (semantically-enhanced CB approach)

   Config.   User profile             Interest-prediction               Item-user matching
             representation           method
   CB        Keyword-based profile    Rating-based                      Perfect matches
   SEM-CB    Ontology-based profile   Rating-based + domain inference   Perfect + partial matches
                                                                        (semantic similarity)





      Exp.1: Traditional vs Semantic approach

 Overall prediction results:

 [Bar chart: RMSE of the CB and SEM-CB configurations; y-axis from 1.025 to 1.065]



      Exp.1: Traditional vs Semantic approach

 Prediction results grouped by user-profile size (number of ratings)
    o Each interval contains roughly 2% of the predictions in the Netflix test set






     Exp.1: Traditional vs Semantic approach

 Comparison of RMSE based on user-profile size
    o The improvement is larger for users with a small profile
      (the cold-start users)
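
 A comparison like this one needs the RMSE computed separately per profile-size interval. The Python sketch below shows one way to do that. It is not the authors' evaluation code. It assumes each prediction comes with the number of ratings in the user's profile, and it builds intervals holding an equal share of the predictions (50 bins would give about 2% each). The name `rmse_by_profile_size` and the record layout are hypothetical.

```python
import math
from typing import List, Tuple


def rmse_by_profile_size(
    records: List[Tuple[int, float, float]],
    n_bins: int,
) -> List[Tuple[int, int, float]]:
    """RMSE per interval of user-profile size.

    Each record is (profile_size, predicted_rating, actual_rating).
    Records are sorted by profile size and split into `n_bins` intervals
    with (almost) the same number of predictions.
    Returns one (min_size, max_size, rmse) tuple per interval.
    """
    if n_bins < 1 or len(records) < n_bins:
        raise ValueError("need at least one record per bin")
    ordered = sorted(records, key=lambda r: r[0])
    results = []
    for b in range(n_bins):
        start = b * len(ordered) // n_bins
        end = (b + 1) * len(ordered) // n_bins
        chunk = ordered[start:end]
        mse = sum((p - a) ** 2 for _, p, a in chunk) / len(chunk)
        results.append((chunk[0][0], chunk[-1][0], math.sqrt(mse)))
    return results


# Toy data: two small-profile users and two large-profile users.
toy = [(1, 3.0, 5.0), (2, 4.0, 2.0), (50, 4.0, 4.0), (80, 3.0, 4.0)]
for low, high, error in rmse_by_profile_size(toy, n_bins=2):
    print(f"profile size {low}-{high}: RMSE = {error:.3f}")
```

 Running the same function on the predictions of two configurations and subtracting the per-interval errors gives the kind of comparison this slide describes.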






Exp.2: Semantic approach on different taxonomies

  Experiment setup
     o Two semantics-based configurations are compared on
       different versions of the movie taxonomy:
           SEM-CB configuration (employs the original taxonomy)
           SEM-CB+ configuration (employs an alternative version)

     Taxonomy properties

     Config.    No. of nodes   No. of levels   No. of hierarchies   Avg. family size (nodes)
     SEM-CB     550            3               1                    14
     SEM-CB+    550            4               4                    7





Exp.2: Semantic approach on different taxonomies

  Results: parameter settings of the semantics-based algorithms

  [Table not recovered; surviving annotations: "Optimal execution", "Same accuracy"]



                 Conclusions and Future work

 Main conclusions
  o The cold-start problem is reduced by exploiting semantics
  o The incorporation of semantics in a traditional CB approach
   o The recommender is made domain-independent by combining
          A service-oriented architecture design
          Standard ontology-based languages (FOAF, OWL)
 Future work
   o Further experimentation
          In richer domains and with other semantic methods
   o The incorporation of semantics into other approaches
          e.g. collaborative filtering and hybrid systems



  Exp.1: Traditional vs Semantically-enhanced

 Comparison of overall accuracy results:

 [Chart: RMSE; y-axis from 0.88 to 1.08]
