The content of this presentation is based on:
Chapters 1, 2, and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
The chapter "Discussion of Similarity Metrics" of the following publication: Shanley, Philip. Data Mining Portfolio.
What are product recommendations, and how do they work?
1. CC 2.0 by Horia Varlan | http://flic.kr/p/7vjmof
2. September 1, 2012
▪ What are Product Recommenders
▪ Introducing Recommenders
▪ A Simple Example
▪ Recommender Evaluation
▪ How do they work?
▪ Machine learning tool Apache Mahout
Namics Conference 2012
Agenda
3.
▪ Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland
▪ Big Data expert, focused on Hadoop, HBase and Solr
▪ Objective: Transforming data into insights
Intro
About Sentric
5.
▪ Each day we form opinions about things we like, don't like, and don't even care about.
▪ People tend to like things
▪ that similar people like
▪ that are similar to other things they like
▪ These patterns can be used to predict such likes and dislikes.
Introducing Recommenders
The Patterns
6. Septem
ber 1,
2012
user-based Look to what people with 6
similar tastes seem to like
Example:
Introducing Recommenders
Strategies for Discovering New Things
7.
item-based: Figure out what items are like the ones you already like (again by looking to others' apparent preferences).
Example:
Introducing Recommenders
Strategies for Discovering New Things
8.
content-based: Suggest items based on a particular attribute (again by looking to others' apparent preferences).
Example:
Introducing Recommenders
Strategies for Discovering New Things
9.
Recommenders fall into two groups: Collaborative Filtering (user-based and item-based) and Content-based.
Collaborative Filtering: producing recommendations based on, and only based on, knowledge of users' relationships to items.
Recommendation is all about predicting patterns of taste, and using them to discover new and desirable things you didn't already know about.
Introducing Recommenders
The Definition of Recommendation
10. CC 2.0 by Will Scullin | http://flic.kr/p/6K9jb8
11.
▪ Let's start with a simple example
Workflow: Create Input Data → Create a Recommender → Analyse the Output
A Simple user-based Example
The Workflow
12.
▪ Recommendations are based on input data
▪ Data takes the form of preferences: associations from users to items
Example input (user,item,value) — e.g. user 1 has a preference of 3.0 for item 102:
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
These values might be ratings on a scale of 1 to 5, where 1 indicates items the user can't stand, and 5 indicates favorites.
A Simple user-based Example
Input Data
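The input format above can be sketched as a small parser; this is an illustrative stand-in for what a data model does, not Mahout's own `FileDataModel` implementation, and the class and method names are invented for this example:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreferenceData {
    // Parse "user,item,value" lines into a map: user -> (item -> preference)
    public static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            long user = Long.parseLong(parts[0]);
            long item = Long.parseLong(parts[1]);
            double value = Double.parseDouble(parts[2]);
            prefs.computeIfAbsent(user, u -> new HashMap<>()).put(item, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,101,5.0", "1,102,3.0", "1,103,2.5");
        Map<Long, Map<Long, Double>> prefs = parse(lines);
        // User 1 has a preference of 3.0 for item 102
        System.out.println(prefs.get(1L).get(102L)); // prints 3.0
    }
}
```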
13.
▪ Trend visualization for positive user preferences (in petrol)
▪ All other preferences are recognized as negative: the user doesn't seem to like the item that much (red, dotted)
[Chart: users 1–5 plotted against items 101–107, showing positive and negative preference trends]
A Simple user-based Example
Trend Visualization
14.
Users 1 and 5 seem to have similar tastes. Both like 101, like 102 a little less, and like 103 less still.
Users 1 and 4 seem to have similar tastes. Both seem to like 101 and 103 identically.
Users 1 and 2 have tastes that seem to run counter to each other.
[Chart: users 1–5 plotted against items 101–107]
A Simple user-based Example
Trend Visualization
15.
So what product might be recommended to user 1?
Obviously not 101, 102 or 103. User 1 already knows about these.
[Chart: users 1–5 plotted against items 101–107]
A Simple user-based Example
Analyzing the Output
16.
The output could be: [item:104, value:4.257081]
The recommender engine did so because it estimated user 1's preference for 104 to be about 4.3, and that was the highest among all the items eligible for recommendation.
Questions:
▪ Is this the best recommendation for user 1?
▪ What exactly is a good recommendation?
A Simple user-based Example
Analyzing the Output
18.
Goal:
Evaluate how closely the estimated preferences match the actual preferences.
How?
Prepare data set → Split (70 % for training, 30 % for test) → Run recommender with training data → Produce estimated preferences → Compare estimates with test data → Calculate a score → Analyse → Experiment with other recommenders
A Simple user-based Example
Evaluating a Recommender
19.
Example evaluation output for a particular recommender engine:

            Item 1   Item 2   Item 3
Actual       3.0      5.0      4.0
Estimate     3.5      2.0      5.0
Difference   0.5      3.0      1.0

Average distance = (0.5 + 3.0 + 1.0) / 3 = 1.5
Root-mean-square = √((0.5² + 3.0² + 1.0²) / 3) ≈ 1.8484
Note: A score of 0.0 would mean perfect estimation
A Simple user-based Example
Evaluating a Recommender
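The two scores from the table above can be computed directly; this sketch reproduces the slide's arithmetic (the class and method names are invented for the example):

```java
public class EvalScores {
    // Mean absolute difference between actual and estimated preferences
    static double averageDistance(double[] actual, double[] estimate) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]);
        }
        return sum / actual.length;
    }

    // Root-mean-square error: penalizes large misses more heavily
    static double rmse(double[] actual, double[] estimate) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - estimate[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] actual = {3.0, 5.0, 4.0};
        double[] estimate = {3.5, 2.0, 5.0};
        System.out.printf("avg=%.4f rmse=%.4f%n",
            averageDistance(actual, estimate), rmse(actual, estimate));
        // prints: avg=1.5000 rmse=1.8484
    }
}
```

RMSE exceeds the average distance here because the miss on Item 2 (3.0) is squared before averaging, which is exactly why it is often preferred: it punishes a recommender that is occasionally very wrong.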
20. CC 2.0 by amtrak_russ | http://flic.kr/p/6fAPej
21.
▪ Mahout
▪ Open-source machine learning library from Apache (Java)
▪ Can be used for large data collections: it's scalable, built upon Apache Hadoop
▪ Implements algorithms such as Classification, Recommenders, Clustering
▪ Incubates a number of techniques and algorithms
▪ ML is hyped! But …
In a Nutshell
Apache Mahout
22.
A Simple Recommender

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

class RecommenderExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("example.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    // Recommend 1 item for user 1
    List<RecommendedItem> recommendations = recommender.recommend(1, 1);
    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }
  }
}
A Simple user-based Example
Create a Recommender
23.
Component interaction: the Application talks to the Recommender interface, which in turn draws on three further interfaces: DataModel, UserSimilarity, and UserNeighborhood.
A user-based Recommender
Component Interaction
24. Septem
ber 1,
2012
NearestNUserNeighborhood ThresholdUserNeighborhood 24
2
2
1
1
5
5
3
3
4
4
A neighborhood around user 1
is chosen to consist of the De鍖ning a neighborhood of
three most similar users: 5, 4, most-similar users with a
and 2 similarity threshold
Algorithms
UserNeighborhood
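The difference between the two strategies can be sketched without Mahout; the class and method names below are invented for this example, and the similarity values of users 2–5 to user 1 are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NeighborhoodSketch {
    // Top-N strategy: keep the n users most similar to the target
    static List<Long> nearestN(Map<Long, Double> sims, int n) {
        return sims.entrySet().stream()
            .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    // Threshold strategy: keep every user whose similarity reaches the cutoff
    static List<Long> threshold(Map<Long, Double> sims, double cutoff) {
        return sims.entrySet().stream()
            .filter(e -> e.getValue() >= cutoff)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2..5 to user 1
        Map<Long, Double> sims = Map.of(2L, 0.6, 3L, 0.1, 4L, 0.9, 5L, 0.95);
        System.out.println(nearestN(sims, 3));    // prints [5, 4, 2]
        System.out.println(threshold(sims, 0.7)); // prints [4, 5]
    }
}
```

Note that the two strategies can disagree: top-N always returns n neighbors, even ones that are barely similar, while a threshold neighborhood may shrink or grow with data quality.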
25. Septem
ber 1,
2012
Implementations of this interface de鍖ne a 25
notion of similarity between two users.
Implementations should return values in the
range -1.0 to 1.0, with 1.0 representing perfect
similarity.
<<interface>>
UserSimilarity"
EuclideanDistance PearsonCorrelation UncenteredCosine
Similarity" Similarity" Similarity"
LogLikelihood TanimotoCoefficient
..."
Similarity" Similarity"
Algorithms
User Similarity
26.
Similarity between data objects can be represented in a variety of ways:
▪ Distance between data objects is the sum of the distances of each attribute of the data objects (e.g. Euclidean Distance)
▪ Measuring how the attributes of both data objects change with respect to the variation of the mean value for the attributes (Pearson Correlation coefficient)
▪ Using the word frequencies for each document, the normalized dot product of the frequencies can be used as a measure of similarity (cosine similarity)
▪ And a few more …
Algorithms
User Similarity
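The first two measures can be worked through on the sample data, comparing users 1 and 5 over their co-rated items (101, 102, 103). This is an illustrative sketch, not Mahout's implementation; the class and method names are invented, and the 1/(1+d) mapping from distance to a similarity score is one common convention:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SimilaritySketch {
    // Euclidean distance over the items both users have rated
    static double euclidean(Map<Long, Double> a, Map<Long, Double> b) {
        double sum = 0;
        for (long item : a.keySet()) {
            if (b.containsKey(item)) {
                double d = a.get(item) - b.get(item);
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }

    // Pearson correlation over co-rated items: how the two users' ratings
    // vary together around their respective means
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>(a.keySet());
        common.retainAll(b.keySet());
        double meanA = common.stream().mapToDouble(a::get).average().orElse(0);
        double meanB = common.stream().mapToDouble(b::get).average().orElse(0);
        double num = 0, varA = 0, varB = 0;
        for (long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        return num / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5);
        Map<Long, Double> user5 = Map.of(101L, 4.0, 102L, 3.0, 103L, 2.0,
                                         104L, 4.0, 105L, 3.5, 106L, 4.0);
        double d = euclidean(user1, user5);
        System.out.printf("distance=%.4f similarity=%.4f pearson=%.4f%n",
            d, 1.0 / (1.0 + d), pearson(user1, user5));
        // prints: distance=1.1180 similarity=0.4721 pearson=0.9449
    }
}
```

The Pearson value of about 0.94 backs up the earlier observation that users 1 and 5 have very similar tastes.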
27.
Similarity between two data objects:
[Plot: users 1–5 positioned by their ratings for item 101 (x-axis, 0–5) and item 102 (y-axis, 0–5); users who rate the items alike lie close together]
Mathematically & Plot
Euclidean Distance
28.
Similarity between two data objects:
[Plot: items 101–104 positioned by user 1's rating (x-axis, 0–5) and user 5's rating (y-axis, 0–5); a strong linear trend indicates high correlation]
Mathematically & Plot
Pearson Correlation
29.
Questions?
Jean-Pierre König, jean-pierre.koenig@sentric.ch
Namics Conference 2012
Thank you!
30.
▪ References
The content of this presentation is based on:
▪ Chapters 1, 2, and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
▪ The chapter "Discussion of Similarity Metrics" of the following publication: Shanley, Philip. Data Mining Portfolio.
▪ Links
http://bitly.com/bundles/jpkoenig/1
Literature & Links