
A Recommendation System for the
Semantic Web

Victor Codina and Luigi Ceccaroni
vcodina@lsi.upc.edu
Departament de Llenguatges i Sistemes Informàtics (LSI)
Universitat Politècnica de Catalunya (UPC)

DCAI 2010, September 7-10 2010, Valencia


Outline

■ Introduction & motivations

■ Our semantic approach

■ Evaluation

■ Conclusions & future work



The general personalization process
[Figure: diagram of the personalization process. Components: ITEMS; CONTENT ADAPTATION (item representation); USER MODELING (implicit feedback, explicit feedback, learning algorithm, user profile); recommendation strategy; personalized recommendation; user satisfaction; user behavior; USERS]



Potential benefits of using semantics

■ Using semantics can help to overcome some limitations of current recommenders:

o Cold-start problem
• By inferring missing information from the relationships defined in domain ontologies

o Domain dependency
• By employing standard ontology-based languages to represent information uniformly



Service-oriented architecture design



User’s interests and item representation

■ Ontology-based representation (weighted overlay)
[Figure: weighted user’s interests and weighted item annotations overlaid on the concept taxonomies]
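A weighted overlay can be pictured as two concept-to-weight maps laid over the same taxonomy. The sketch below is a minimal illustration under assumptions: the class names, the child-to-parent encoding of the taxonomy, the weight ranges and the toy concepts are all hypothetical, not the system's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    # Hypothetical encoding: child concept -> parent concept (roots have no entry)
    parent: dict[str, str]

    def children(self, concept: str) -> list[str]:
        """Direct subconcepts of `concept`, i.e. its family."""
        return [c for c, p in self.parent.items() if p == concept]

    def ancestors(self, concept: str) -> list[str]:
        """Ancestors ordered from the parent up to the root."""
        result = []
        while concept in self.parent:
            concept = self.parent[concept]
            result.append(concept)
        return result


@dataclass
class UserProfile:
    # Weighted overlay: concept -> interest weight (assumed range [-1, 1])
    interests: dict[str, float] = field(default_factory=dict)


@dataclass
class Item:
    # Weighted overlay: concept -> annotation weight (assumed relevance in [0, 1])
    annotations: dict[str, float] = field(default_factory=dict)


# Toy example with made-up concepts and weights
taxonomy = Taxonomy(parent={"Romance": "Genre", "Steamy Romance": "Romance"})
user = UserProfile(interests={"Romance": 0.8})
movie = Item(annotations={"Steamy Romance": 1.0})
```

Keeping both overlays keyed by concept name lets the matching step look up an annotation, or any of its ancestors, directly in the user's interests.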



How do we take advantage of semantics?

■ We incorporate semantics in both stages of the personalization process to reduce the cold-start problem
o The user-profile learning algorithm employs a domain-based inference method
• It expands and enriches the user profiles with interests that cannot be directly inferred from the user feedback

o The content-based recommendation algorithm employs a taxonomy-based similarity method
• It uses the user’s interests in more general concepts related to the item’s annotations in order to refine the matching calculation



Semantically-enhanced learning algorithm

START. The user provides feedback about an item (e.g., a purchase or a rating of the item)

Step 1. The interest weights of the concepts related to the item are calculated or updated

Step 2. A domain-based inference method infers new interests from the families of concepts whose interests were updated

[Figure: the user’s profile, containing learnt, updated and inferred interests, linked to the rated item]
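The slides do not give the Step 1 update formula, so the sketch below is only one plausible reading. It assumes that the concepts related to an item are its annotations, that a 1-5 rating is mapped onto [-1, 1] with the hypothetical rule (r - 3) / 2 (chosen so that values fall in the same range as the weights in the Sport example that follows), and that each interest weight is the running mean of the feedback received for that concept.

```python
def rating_to_feedback(rating: int) -> float:
    """Assumed mapping of a 1-5 rating onto [-1, 1]: 1 -> -1, 3 -> 0, 5 -> 1."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return (rating - 3) / 2


def update_interests(interests: dict[str, float], counts: dict[str, int],
                     annotations: list[str], rating: int) -> dict[str, float]:
    """Step 1 sketch (hypothetical): set the interest weight of every concept
    annotating the rated item to the running mean of its feedback so far."""
    feedback = rating_to_feedback(rating)
    for concept in annotations:
        n = counts.get(concept, 0)
        old = interests.get(concept, 0.0)
        interests[concept] = (old * n + feedback) / (n + 1)
        counts[concept] = n + 1
    return interests


# Usage: two ratings of items annotated with "Football"
interests: dict[str, float] = {}
counts: dict[str, int] = {}
update_interests(interests, counts, ["Football"], 5)
update_interests(interests, counts, ["Football"], 4)
```

After these two calls the weight of Football is the mean of 1.0 and 0.5. A decaying or annotation-weighted update would be an equally valid assumption.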



The domain-based inference method

■ Based on the minimum percentage of direct subconcepts of a concept that have a known interest
■ Two types of propagation
o Upward-based (propagation to the parent concept)
o Sideward-based (propagation to the siblings)

Example: the family of Sport

Sport [0.5] (interest inferred by upward propagation)
Baseball [-0.5]   Basketball [0.5]   Football [1.0]   Tennis [1.0]   Golf [?]

Upward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 > UIT = 0.6 => propagation
Sideward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 < SIT = 0.9 => no propagation

Upward-based threshold (UIT) = 0.6
Sideward-based threshold (SIT) = 0.9
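The threshold test above can be sketched for a single concept family. The percentage and threshold comparisons follow the slide. The propagated value is an assumption: the mean interest of the known subconcepts, which matches the [0.5] inferred for Sport ((-0.5 + 0.5 + 1.0 + 1.0) / 4 = 0.5). Skipping concepts that already have a learnt interest is also assumed, and the function name `infer_family` is hypothetical.

```python
def infer_family(parent: str, children: list[str], interests: dict[str, float],
                 uit: float = 0.6, sit: float = 0.9) -> dict[str, float]:
    """Domain-based inference sketch over one family (a parent and its direct
    subconcepts). Returns only the newly inferred interests."""
    known = [c for c in children if c in interests]
    if not children or not known:
        return {}
    pct = len(known) / len(children)  # Pct(subconcepts)
    # Assumed propagated value: mean interest of the known subconcepts
    mean = sum(interests[c] for c in known) / len(known)
    inferred: dict[str, float] = {}
    # Upward-based propagation to the parent concept
    if pct > uit and parent not in interests:
        inferred[parent] = mean
    # Sideward-based propagation to siblings that have no interest yet
    if pct > sit:
        for c in children:
            if c not in interests:
                inferred[c] = mean
    return inferred


sport = ["Baseball", "Basketball", "Football", "Tennis", "Golf"]
learnt = {"Baseball": -0.5, "Basketball": 0.5, "Football": 1.0, "Tennis": 1.0}
print(infer_family("Sport", sport, learnt))  # {'Sport': 0.5}
```

With SIT lowered below 0.8, the same call would also assign an inferred interest to Golf.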


Semantically-enhanced content-based filtering
START. The system has to predict whether the user will like or dislike an item

FOR EACH of the item’s annotations DO:
STEP 1. The conceptScore is calculated based on:
• The interest degree of the user’s interests that match the item’s annotation
• The semantic similarity of the matches (perfect or partial match)
END FOR

STEP 2. The itemScore is calculated as the average of the conceptScore values, weighted according to their relevance

[Figure: the item’s annotations (C1, C2) matched against the user’s interests through perfect and partial matches]
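The two steps can be sketched as follows, under assumptions. A perfect match uses the interest in the annotation itself. A partial match uses the closest ancestor carrying an interest, multiplied by its semantic similarity (this multiplicative combination is assumed). The itemScore is the relevance-weighted average of the conceptScores. The `ancestors` and `similarity` callables are hypothetical hooks supplied by the caller. Setting `use_partial=False` keeps only perfect matches, which loosely resembles the matching of the CB configuration, although that configuration also uses a keyword-based profile.

```python
from typing import Callable, Optional


def concept_score(annotation: str, interests: dict[str, float],
                  ancestors: Callable[[str], list[str]],
                  similarity: Callable[[str, str], float],
                  use_partial: bool = True) -> Optional[float]:
    """STEP 1 sketch: score one item annotation against the user's interests."""
    if annotation in interests:                  # perfect match (similarity 1)
        return interests[annotation]
    if use_partial:
        for ancestor in ancestors(annotation):   # closest ancestor first
            if ancestor in interests:            # partial match
                return interests[ancestor] * similarity(ancestor, annotation)
    return None                                  # no match: annotation ignored


def item_score(annotations: dict[str, float], interests: dict[str, float],
               ancestors: Callable[[str], list[str]],
               similarity: Callable[[str, str], float],
               use_partial: bool = True) -> Optional[float]:
    """STEP 2 sketch: relevance-weighted average of the conceptScores."""
    total, weight_sum = 0.0, 0.0
    for annotation, relevance in annotations.items():
        score = concept_score(annotation, interests, ancestors, similarity, use_partial)
        if score is not None:
            total += relevance * score
            weight_sum += relevance
    return total / weight_sum if weight_sum > 0 else None
```

Returning `None` when nothing matches leaves the caller free to fall back to a default prediction, such as the user's mean rating.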



The taxonomy-based similarity method

■ Based on the distance, in taxonomy levels, between
o The item’s annotation
o The user’s interest (an ancestor of the item’s annotation)
■ The semantic distance between levels is weighted using a K factor

Example taxonomy excerpt: Level 1: Source, Genre; Level 2: Sport, Romance; Level 3: Extreme, Steamy Romance; Level 4: Climbing
• User interest Romance (level 2), item annotation Steamy Romance (level 3): distance = 1, K3 = 0.4 => SIM = 0.6
• User interest Extreme (level 3), item annotation Climbing (level 4): distance = 1, K4 = 0.3 => SIM = 0.7
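Both examples above give SIM = 1 - K of the annotation's level for a one-level distance. The sketch reproduces them. How K factors combine over distances greater than one is not stated, so summing the K factor of every level crossed (floored at 0) is an assumption. The function name and the dictionary of K values are hypothetical.

```python
def taxonomy_similarity(interest_level: int, annotation_level: int,
                        k: dict[int, float]) -> float:
    """SIM between a user interest and a descendant item annotation.
    Assumption: each level crossed going down adds its K factor to the
    weighted distance, and SIM = 1 - weighted distance (floored at 0)."""
    if annotation_level < interest_level:
        raise ValueError("the user interest must be an ancestor of the annotation")
    distance = sum(k[level] for level in range(interest_level + 1, annotation_level + 1))
    return max(0.0, 1.0 - distance)


K = {3: 0.4, 4: 0.3}  # K factors from the example above
print(taxonomy_similarity(2, 3, K))  # Romance -> Steamy Romance
print(taxonomy_similarity(3, 4, K))  # Extreme -> Climbing
```

A smaller K at deeper levels means that a step between two specific concepts costs less similarity than a step between two general ones.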



Experimental dataset

■ Netflix Prize movie dataset
o 480,000 users
o 17,700 movies
o 100M user ratings, ranging from 1 to 5
■ Movie taxonomy used by Netflix to annotate movies
o 1 global hierarchy of concepts describing the movies
o 3 levels of depth
o 550 nodes (the concepts used as item annotations)
■ RMSE (root-mean-square error) metric
o Measures the rating-prediction error over a set of users
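RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal sketch of the standard definition:

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-square error between predicted and actual ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


print(rmse([1.0, 5.0], [2.0, 4.0]))  # 1.0
```

Squaring penalizes large errors more heavily than small ones, so a lower RMSE on the 1-5 scale means fewer badly wrong predictions.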



Experimental evaluation

■ Exp. 1: Traditional vs semantic approach
o GOAL. To evaluate the improvement in accuracy when the semantics-based methods are employed
• Is the cold-start problem reduced?

■ Exp. 2: Semantic approach on two different taxonomies
o GOAL. To analyze whether the hierarchical structure of the taxonomy affects the effectiveness of the semantics-based methods
• How does the taxonomy structure affect their performance?



Exp.1: Traditional vs Semantic approach

■ Experiment setup
o The error of two algorithm configurations is compared
• CB configuration (traditional CB approach)
• SEM-CB configuration (semantically-enhanced CB approach)

Config.   User profile representation   Interest-prediction method        Item-user matching
CB        Keyword-based profile         Rating-based                      Perfect matches
SEM-CB    Ontology-based profile        Rating-based + domain inference   Perfect + partial matches (semantic similarity)




■ Overall prediction results:
[Figure: bar chart comparing the overall RMSE of the CB and SEM-CB configurations; y-axis: RMSE, from 1.025 to 1.065]




■ Prediction results grouped by user-profile size (number of ratings)

Each interval contains approximately 2% of the predictions in the Netflix test set
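Grouping by profile size so that each interval holds roughly the same share of predictions amounts to equal-frequency binning. The sketch below assumes each prediction is stored as a hypothetical `(profile_size, predicted, actual)` tuple and uses 50 bins (about 2% each). Predictions that share a profile size may fall on either side of a bin boundary.

```python
import math


def rmse_by_profile_size(records: list[tuple[int, float, float]],
                         n_bins: int = 50) -> list[tuple[int, int, float]]:
    """Sort predictions by the user's profile size, split them into n_bins
    groups of (almost) equal size, and return (min size, max size, RMSE)
    for each non-empty group."""
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    results = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        err = math.sqrt(sum((p - a) ** 2 for _, p, a in chunk) / len(chunk))
        results.append((chunk[0][0], chunk[-1][0], err))
    return results
```

Plotting the RMSE of each group against its profile-size range gives the kind of comparison shown for CB and SEM-CB.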




■ Comparison of RMSE by user-profile size

The improvement is larger for users with small profiles (the cold-start users)



Exp.2: Semantic approach on different taxonomies

■ Experiment setup
o Two semantics-based configurations are compared on different versions of the movie taxonomy:
• SEM-CB configuration (employs the original taxonomy)
• SEM-CB+ configuration (employs an alternative version)

Taxonomy properties
Config.   Nº nodes   Nº levels   Nº hierarchies   Avg. nodes per family
SEM-CB    550        3           1                14
SEM-CB+   550        4           4                7
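The four properties in the table can be computed from a child-to-parent map. The sketch rests on assumptions: the map encoding and function name are hypothetical, roots count as nodes, each root starts one hierarchy, levels are counted from the root (level 1), and a family is a parent together with its direct subconcepts, so "avg. nodes per family" is read as the mean number of direct subconcepts per parent.

```python
from collections import Counter


def taxonomy_properties(parent: dict[str, str]) -> dict[str, float]:
    """Nº nodes, Nº levels, Nº hierarchies and average family size of a
    taxonomy given as a child -> parent map (roots appear only as parents)."""
    if not parent:
        raise ValueError("empty taxonomy")
    nodes = set(parent) | set(parent.values())
    roots = nodes - set(parent)

    def depth(concept: str) -> int:
        level = 1
        while concept in parent:
            concept = parent[concept]
            level += 1
        return level

    family_sizes = Counter(parent.values())  # parent -> number of direct subconcepts
    return {
        "nodes": len(nodes),
        "levels": max(depth(c) for c in nodes),
        "hierarchies": len(roots),
        "avg_nodes_per_family": sum(family_sizes.values()) / len(family_sizes),
    }


# Toy taxonomy with hypothetical concept names
toy = {"A": "Root", "B": "Root", "A1": "A", "A2": "A", "A1x": "A1"}
print(taxonomy_properties(toy))
```

Under these definitions, SEM-CB+ trades larger families for more levels and hierarchies while keeping the same 550 nodes.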



Exp.2: Semantic approach on different taxonomies

■ Results: parameter settings of the semantics-based algorithms
[Figure: parameter settings of each configuration at its optimal execution; annotation: "Same accuracy"]



Conclusions and Future work

■ Main conclusions
o The cold-start problem is reduced by exploiting semantics
o The incorporation of semantics in a traditional CB approach
o The recommender is domain-independent by combining
• A service-oriented architecture design
• Standard ontology-based languages (FOAF, OWL)
■ Future work
o Further experimentation
• In richer domains and with other semantic methods
o The incorporation of semantics into other approaches
• e.g. Collaborative Filtering and Hybrid systems



Exp.1: Traditional vs Semantically-enhanced

■ Comparison of overall accuracy results:
[Figure: overall RMSE comparison; y-axis: RMSE, from 0.88 to 1.08]

