The content of this presentation is based on:
Chapters 1, 2, and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
The chapter "Discussion of Similarity Metrics" of the following publication: Shanley, Philip. Data Mining Portfolio.
What are product recommendations, and how do they work?
1. CC 2.0 by Horia Varlan | http://flic.kr/p/7vjmof
2. September 1, 2012
▪ What are Product Recommenders
▪ Introducing Recommenders
▪ A Simple Example
▪ Recommender Evaluation
▪ How do they work?
▪ Machine learning tool Apache Mahout
Namics Conference 2012
Agenda
3.
▪ Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland
▪ Big Data expert, focused on Hadoop, HBase and Solr
▪ Objective: Transforming data into insights
Intro
About Sentric
5.
▪ Each day we form opinions about things we like, don't like, and don't even care about.
▪ People tend to like things
▪ that similar people like
▪ that are similar to other things they like
▪ These patterns can be used to predict such likes and dislikes.
Introducing Recommenders
The Patterns
6. Septem
ber 1,
2012
user-based Look to what people with 6
similar tastes seem to like
Example:
Introducing Recommenders
Strategies for Discovering New Things
7.
item-based: Figure out what items are like the ones you already like (again by looking to others' apparent preferences).
Example:
Introducing Recommenders
Strategies for Discovering New Things
8.
content-based: Suggest items based on a particular attribute (again by looking to others' apparent preferences).
Example:
Introducing Recommenders
Strategies for Discovering New Things
9.
Recommenders fall into two groups: Collaborative Filtering (user-based and item-based) and Content-based.
Collaborative Filtering: producing recommendations based on, and only based on, knowledge of users' relationships to items.
Recommendation is all about predicting patterns of taste, and using them to discover new and desirable things you didn't already know about.
Introducing Recommenders
The Definition of Recommendation
10. CC 2.0 by Will Scullin | http://flic.kr/p/6K9jb8
11.
▪ Let's start with a simple example
Workflow: Create Input Data → Create a Recommender → Analyse the Output
A Simple user-based Example
The Workflow
12.
▪ Recommendations are based on input data
▪ Data takes the form of preferences: associations from users to items
Example input (user,item,value) — e.g. user 1 has a preference of 3.0 for item 102:
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
These values might be ratings on a scale of 1 to 5, where 1 indicates items the user can't stand, and 5 indicates favorites.
A Simple user-based Example
Input Data
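The input format above can be sketched as a small parser; this is an illustrative stand-in for what a data model does, not Mahout's own `FileDataModel` implementation, and the class and method names are invented for this example:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreferenceData {
    // Parse "user,item,value" lines into a map: user -> (item -> preference)
    public static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            long user = Long.parseLong(parts[0]);
            long item = Long.parseLong(parts[1]);
            double value = Double.parseDouble(parts[2]);
            prefs.computeIfAbsent(user, u -> new HashMap<>()).put(item, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,101,5.0", "1,102,3.0", "1,103,2.5");
        Map<Long, Map<Long, Double>> prefs = parse(lines);
        // User 1 has a preference of 3.0 for item 102
        System.out.println(prefs.get(1L).get(102L)); // prints 3.0
    }
}
```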
13.
▪ Trend visualization for positive user preferences (in petrol)
▪ All other preferences are recognized as negative: the user doesn't seem to like the item that much (red, dotted)
[Chart: users 1–5 plotted against items 101–107, showing positive and negative preference trends]
A Simple user-based Example
Trend Visualization
14.
Users 1 and 5 seem to have similar tastes. Both like 101, like 102 a little less, and like 103 less still.
Users 1 and 4 seem to have similar tastes. Both seem to like 101 and 103 identically.
Users 1 and 2 have tastes that seem to run counter to each other.
[Chart: users 1–5 plotted against items 101–107]
A Simple user-based Example
Trend Visualization
15.
So what product might be recommended to user 1?
Obviously not 101, 102 or 103. User 1 already knows about these.
[Chart: users 1–5 plotted against items 101–107]
A Simple user-based Example
Analyzing the Output
16.
The output could be: [item:104, value:4.257081]
The recommender engine did so because it estimated user 1's preference for 104 to be about 4.3, and that was the highest among all the items eligible for recommendation.
Questions:
▪ Is this the best recommendation for user 1?
▪ What exactly is a good recommendation?
A Simple user-based Example
Analyzing the Output
18.
Goal:
Evaluate how closely the estimated preferences match the actual preferences.
How?
Prepare data set → Split (70 % for training, 30 % for test) → Run recommender with training data → Produce estimated preferences → Compare estimates with test data → Calculate a score → Analyse → Experiment with other recommenders
A Simple user-based Example
Evaluating a Recommender
19.
Example evaluation output for a particular recommender engine:

            Item 1   Item 2   Item 3
Actual       3.0      5.0      4.0
Estimate     3.5      2.0      5.0
Difference   0.5      3.0      1.0

Average distance = (0.5 + 3.0 + 1.0) / 3 = 1.5
Root-mean-square = √((0.5² + 3.0² + 1.0²) / 3) ≈ 1.8484
Note: A score of 0.0 would mean perfect estimation
A Simple user-based Example
Evaluating a Recommender
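The two scores from the table above can be computed directly; this sketch reproduces the slide's arithmetic (the class and method names are invented for the example):

```java
public class EvalScores {
    // Mean absolute difference between actual and estimated preferences
    static double averageDistance(double[] actual, double[] estimate) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]);
        }
        return sum / actual.length;
    }

    // Root-mean-square error: penalizes large misses more heavily
    static double rmse(double[] actual, double[] estimate) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - estimate[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] actual = {3.0, 5.0, 4.0};
        double[] estimate = {3.5, 2.0, 5.0};
        System.out.printf("avg=%.4f rmse=%.4f%n",
            averageDistance(actual, estimate), rmse(actual, estimate));
        // prints: avg=1.5000 rmse=1.8484
    }
}
```

RMSE exceeds the average distance here because the miss on Item 2 (3.0) is squared before averaging, which is exactly why it is often preferred: it punishes a recommender that is occasionally very wrong.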
20. CC 2.0 by amtrak_russ | http://flic.kr/p/6fAPej
21.
▪ Mahout
▪ Open-source machine learning library from Apache (Java)
▪ Can be used for large data collections: it's scalable, built upon Apache Hadoop
▪ Implements algorithms such as Classification, Recommenders, Clustering
▪ Incubates a number of techniques and algorithms
▪ ML is hyped! But …
In a Nutshell
Apache Mahout
22.
A Simple Recommender

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

class RecommenderExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("example.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    // Recommend 1 item for user 1
    List<RecommendedItem> recommendations = recommender.recommend(1, 1);
    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }
  }
}
A Simple user-based Example
Create a Recommender
23.
Component interaction: the Application talks to the Recommender interface, which in turn draws on three further interfaces: DataModel, UserSimilarity, and UserNeighborhood.
A user-based Recommender
Component Interaction
24. Septem
ber 1,
2012
NearestNUserNeighborhood ThresholdUserNeighborhood 24
2
2
1
1
5
5
3
3
4
4
A neighborhood around user 1
is chosen to consist of the De鍖ning a neighborhood of
three most similar users: 5, 4, most-similar users with a
and 2 similarity threshold
Algorithms
UserNeighborhood
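The difference between the two strategies can be sketched without Mahout; the class and method names below are invented for this example, and the similarity values of users 2–5 to user 1 are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NeighborhoodSketch {
    // Top-N strategy: keep the n users most similar to the target
    static List<Long> nearestN(Map<Long, Double> sims, int n) {
        return sims.entrySet().stream()
            .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    // Threshold strategy: keep every user whose similarity reaches the cutoff
    static List<Long> threshold(Map<Long, Double> sims, double cutoff) {
        return sims.entrySet().stream()
            .filter(e -> e.getValue() >= cutoff)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2..5 to user 1
        Map<Long, Double> sims = Map.of(2L, 0.6, 3L, 0.1, 4L, 0.9, 5L, 0.95);
        System.out.println(nearestN(sims, 3));    // prints [5, 4, 2]
        System.out.println(threshold(sims, 0.7)); // prints [4, 5]
    }
}
```

Note that the two strategies can disagree: top-N always returns n neighbors, even ones that are barely similar, while a threshold neighborhood may shrink or grow with data quality.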
25. Septem
ber 1,
2012
Implementations of this interface de鍖ne a 25
notion of similarity between two users.
Implementations should return values in the
range -1.0 to 1.0, with 1.0 representing perfect
similarity.
<<interface>>
UserSimilarity"
EuclideanDistance PearsonCorrelation UncenteredCosine
Similarity" Similarity" Similarity"
LogLikelihood TanimotoCoefficient
..."
Similarity" Similarity"
Algorithms
User Similarity
26.
Similarity between data objects can be represented in a variety of ways:
▪ Distance between data objects is the sum of the distances of each attribute of the data objects (e.g. Euclidean Distance)
▪ Measuring how the attributes of both data objects change with respect to the variation of the mean value for the attributes (Pearson Correlation coefficient)
▪ Using the word frequencies for each document, the normalized dot product of the frequencies can be used as a measure of similarity (cosine similarity)
▪ And a few more …
Algorithms
User Similarity
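The first two measures can be worked through on the sample data, comparing users 1 and 5 over their co-rated items (101, 102, 103). This is an illustrative sketch, not Mahout's implementation; the class and method names are invented, and the 1/(1+d) mapping from distance to a similarity score is one common convention:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SimilaritySketch {
    // Euclidean distance over the items both users have rated
    static double euclidean(Map<Long, Double> a, Map<Long, Double> b) {
        double sum = 0;
        for (long item : a.keySet()) {
            if (b.containsKey(item)) {
                double d = a.get(item) - b.get(item);
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }

    // Pearson correlation over co-rated items: how the two users' ratings
    // vary together around their respective means
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>(a.keySet());
        common.retainAll(b.keySet());
        double meanA = common.stream().mapToDouble(a::get).average().orElse(0);
        double meanB = common.stream().mapToDouble(b::get).average().orElse(0);
        double num = 0, varA = 0, varB = 0;
        for (long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        return num / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5);
        Map<Long, Double> user5 = Map.of(101L, 4.0, 102L, 3.0, 103L, 2.0,
                                         104L, 4.0, 105L, 3.5, 106L, 4.0);
        double d = euclidean(user1, user5);
        System.out.printf("distance=%.4f similarity=%.4f pearson=%.4f%n",
            d, 1.0 / (1.0 + d), pearson(user1, user5));
        // prints: distance=1.1180 similarity=0.4721 pearson=0.9449
    }
}
```

The Pearson value of about 0.94 backs up the earlier observation that users 1 and 5 have very similar tastes.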
27.
Similarity between two data objects:
[Plot: users 1–5 positioned by their ratings for item 101 (x-axis, 0–5) and item 102 (y-axis, 0–5); users who rate the items alike lie close together]
Mathematically & Plot
Euclidean Distance
28.
Similarity between two data objects:
[Plot: items 101–104 positioned by user 1's rating (x-axis, 0–5) and user 5's rating (y-axis, 0–5); a strong linear trend indicates high correlation]
Mathematically & Plot
Pearson Correlation
29.
Questions?
Jean-Pierre König, jean-pierre.koenig@sentric.ch
Namics Conference 2012
Thank you!
30.
▪ References
The content of this presentation is based on:
▪ Chapters 1, 2, and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
▪ The chapter "Discussion of Similarity Metrics" of the following publication: Shanley, Philip. Data Mining Portfolio.
▪ Links
http://bitly.com/bundles/jpkoenig/1
Literature & Links