CC 2.0 by Horia Varlan | http://flic.kr/p/7vjmof
September 1, 2012

• What are Product Recommenders?
   • Introducing Recommenders
   • A Simple Example
   • Recommender Evaluation
• How do they work?
   • Machine learning tool: Apache Mahout



Namics Conference 2012

Agenda

• Spin-off of MeMo News AG, the leading provider for Social Media
   Monitoring & Analytics in Switzerland
• Big Data expert, focused on Hadoop, HBase and Solr
• Objective: Transforming data into insights



Intro

About Sentric
CC 2.0 by Dennis Wong | http://flic.kr/p/6C3RuV

• Each day we form opinions about things we like, don't like, and don't
   even care about.
• People tend to like things
     • that similar people like
     • that are similar to other things they like
• These patterns can be used to predict such likes and dislikes.


Introducing Recommenders

The Patterns

User-based: Look to what people with similar tastes seem to like.

Example: [figure]




Introducing Recommenders

Strategies for Discovering New Things

Item-based: Figure out what items are like the ones you already like
(again by looking to others' apparent preferences).

Example: [figure]




Introducing Recommenders

Strategies for Discovering New Things

Content-based: Suggest items based on a particular attribute of the items
themselves, rather than on others' apparent preferences.

Example: [figure]




Introducing Recommenders

Strategies for Discovering New Things

Collaborative Filtering: Producing recommendations based on, and only
based on, knowledge of users' relationships to items.

[Figure: Recommenders, divided into User-based, Item-based and Content-based]

Recommendation is all about predicting patterns of taste, and using them to
discover new and desirable things you didn't already know about.

Introducing Recommenders

The Definition of Recommendation
CC 2.0 by Will Scullin | http://flic.kr/p/6K9jb8

• Let's start with a simple example

Create Input Data → Create a Recommender → Analyse the Output





A Simple user-based Example

The Workflow

• Recommendations are based on input data
• Data takes the form of preferences: associations from users to items

Example (user ID, item ID, preference value):

   1,101,5.0
   1,102,3.0
   1,103,2.5
   2,101,2.0
   2,102,2.5
   2,103,5.0
   2,104,2.0
   3,101,2.5
   3,104,4.0
   3,105,4.5
   3,107,5.0
   4,101,5.0
   4,103,3.0
   4,104,4.5
   4,106,4.0
   5,101,4.0
   5,102,3.0
   5,103,2.0
   5,104,4.0
   5,105,3.5
   5,106,4.0

For instance, the line 1,102,3.0 means that user 1 has a preference of 3.0
for item 102.

These values might be ratings on a scale of 1 to 5, where 1 indicates items
the user can't stand, and 5 indicates favorites.


A Simple user-based Example

Input Data
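
A minimal sketch of how such "user,item,preference" lines could be read into memory, in plain Java without Mahout. The class name PreferenceParser and the nested-map layout are assumptions made for illustration; they are not taken from the slides or from Mahout's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PreferenceParser {
    /** Parses "userID,itemID,preference" lines into user -> (item -> preference). */
    static Map<Long, Map<Long, Double>> parse(String csv) {
        Map<Long, Map<Long, Double>> prefs = new LinkedHashMap<>();
        for (String line : csv.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            long user = Long.parseLong(fields[0].trim());
            long item = Long.parseLong(fields[1].trim());
            double value = Double.parseDouble(fields[2].trim());
            // Create the per-user map on first sight of the user, then store the preference
            prefs.computeIfAbsent(user, u -> new LinkedHashMap<>()).put(item, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        String csv = "1,101,5.0\n1,102,3.0\n1,103,2.5\n2,101,2.0\n";
        Map<Long, Map<Long, Double>> prefs = parse(csv);
        // Look up user 1's preference for item 102
        System.out.println(prefs.get(1L).get(102L));
    }
}
```

A nested map keeps the lookup "which items has this user rated?" cheap, which is the access pattern a user-based recommender needs most.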

• Trend visualization for positive user preferences (in petrol)
• All other preferences are recognized as negative: the user doesn't seem
   to like the item that much (red, dotted)

[Figure: users 1 to 5 connected to items 101 to 107]



A Simple user-based Example

Trend Visualization

[Figure: users 1, 2, 4 and 5 connected to items 101 to 107]

Users 1 and 5 seem to have similar tastes. Both like 101, like 102 a little
less, and like 103 less still.

Users 1 and 4 seem to have similar tastes. Both seem to like 101 and 103
identically.

Users 1 and 2 have tastes that seem to run counter to each other.


A Simple user-based Example

Trend Visualization
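
These visual impressions can be checked numerically. The sketch below computes a Pearson correlation over the items two users have both rated, using the preference values from the input data. It is plain Java written for illustration; the class and method names are assumptions, and Mahout's own PearsonCorrelationSimilarity may differ in details.

```java
import java.util.HashMap;
import java.util.Map;

class PearsonSketch {
    /** Builds an item -> preference map from alternating (item, value) arguments. */
    static Map<Integer, Double> prefs(double... itemThenValue) {
        Map<Integer, Double> m = new HashMap<>();
        for (int i = 0; i < itemThenValue.length; i += 2) {
            m.put((int) itemThenValue[i], itemThenValue[i + 1]);
        }
        return m;
    }

    /** Pearson correlation over the items both users rated; NaN when undefined. */
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sumA = 0, sumB = 0;
        int n = 0;
        for (Integer item : a.keySet()) {
            if (b.containsKey(item)) {
                sumA += a.get(item);
                sumB += b.get(item);
                n++;
            }
        }
        if (n == 0) {
            return Double.NaN; // no shared items
        }
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0, varA = 0, varB = 0;
        for (Integer item : a.keySet()) {
            if (!b.containsKey(item)) {
                continue;
            }
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB); // 0/0 gives NaN for a single shared item
    }

    public static void main(String[] args) {
        Map<Integer, Double> u1 = prefs(101, 5.0, 102, 3.0, 103, 2.5);
        Map<Integer, Double> u2 = prefs(101, 2.0, 102, 2.5, 103, 5.0, 104, 2.0);
        Map<Integer, Double> u4 = prefs(101, 5.0, 103, 3.0, 104, 4.5, 106, 4.0);
        Map<Integer, Double> u5 = prefs(101, 4.0, 102, 3.0, 103, 2.0, 104, 4.0, 105, 3.5, 106, 4.0);
        System.out.println("1 vs 5: " + pearson(u1, u5)); // strongly positive
        System.out.println("1 vs 4: " + pearson(u1, u4)); // positive
        System.out.println("1 vs 2: " + pearson(u1, u2)); // negative
    }
}
```

With this data the correlation is roughly 0.94 for users 1 and 5, 1.0 for users 1 and 4 (only two shared items, so the value is not very informative), and roughly -0.76 for users 1 and 2.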

So what product might be recommended to user 1?

[Figure: users 1 to 5 connected to items 101 to 107]

Obviously not 101, 102 or 103. User 1 already knows about these.


A Simple user-based Example

Analyzing the Output

The output could be: [item:104, value:4.257081]

The recommender engine recommended item 104 because it estimated user 1's
preference for 104 to be about 4.3, and that was the highest among all
the items eligible for recommendation.

Questions:
• Is this the best recommendation for user 1?
• What exactly is a good recommendation?


A Simple user-based Example

Analyzing the Output
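
One way to arrive at such an estimate is a similarity-weighted average of the neighbours' preferences. The sketch below is plain Java under stated assumptions: users 4 and 5 are taken as user 1's neighbourhood, Pearson correlation over shared items is the weight, and all class and method names are made up for illustration. It is not Mahout's implementation, although for this data it yields about 4.2571 for item 104, in line with the output quoted above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EstimateSketch {
    /** Builds an item -> preference map from alternating (item, value) arguments. */
    static Map<Integer, Double> prefs(double... itemThenValue) {
        Map<Integer, Double> m = new HashMap<>();
        for (int i = 0; i < itemThenValue.length; i += 2) {
            m.put((int) itemThenValue[i], itemThenValue[i + 1]);
        }
        return m;
    }

    /** Pearson correlation over the items both users rated; NaN when undefined. */
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sumA = 0, sumB = 0;
        int n = 0;
        for (Integer item : a.keySet()) {
            if (b.containsKey(item)) { sumA += a.get(item); sumB += b.get(item); n++; }
        }
        if (n == 0) return Double.NaN;
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0, varA = 0, varB = 0;
        for (Integer item : a.keySet()) {
            if (!b.containsKey(item)) continue;
            double da = a.get(item) - meanA, db = b.get(item) - meanB;
            cov += da * db; varA += da * da; varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Similarity-weighted average of the neighbours' preferences for one item. */
    static double estimate(Map<Integer, Double> user,
                           List<Map<Integer, Double>> neighbours, int item) {
        double weighted = 0, totalWeight = 0;
        for (Map<Integer, Double> neighbour : neighbours) {
            Double pref = neighbour.get(item);
            double sim = pearson(user, neighbour);
            if (pref == null || Double.isNaN(sim)) continue; // neighbour cannot contribute
            weighted += sim * pref;
            totalWeight += sim;
        }
        return weighted / totalWeight; // NaN if no neighbour rated the item
    }

    /** Estimate for user 1, with users 4 and 5 assumed to be the neighbourhood. */
    static double estimateForUser1(int item) {
        Map<Integer, Double> u1 = prefs(101, 5.0, 102, 3.0, 103, 2.5);
        Map<Integer, Double> u4 = prefs(101, 5.0, 103, 3.0, 104, 4.5, 106, 4.0);
        Map<Integer, Double> u5 = prefs(101, 4.0, 102, 3.0, 103, 2.0, 104, 4.0, 105, 3.5, 106, 4.0);
        return estimate(u1, Arrays.asList(u4, u5), item);
    }

    public static void main(String[] args) {
        for (int item : new int[] {104, 105, 106}) {
            System.out.println(item + " -> " + estimateForUser1(item));
        }
    }
}
```

Items 105 and 106 come out at about 3.5 and 4.0, so 104 is the highest-scoring candidate that user 1 has not rated yet.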
CC 2.0 by larsaaboe | http://flic.kr/p/7nJpV8

Goal: Evaluate how closely the estimated preferences match the actual
preferences.

How?

Prepare (a reasonable data set)
→ Split (70% for training, 30% for test)
→ Run (produce estimated preferences with the training data)
→ Analyse (compare the estimates with the test data, calculate a score)
→ Experiment with other recommenders



A Simple user-based Example

Evaluating a Recommender

Example evaluation output for a particular recommender engine

                      Item 1    Item 2    Item 3
  Actual              3.0       5.0       4.0
  Estimate            3.5       2.0       5.0
  Difference          0.5       3.0       1.0

  Average distance:  $(0.5 + 3.0 + 1.0) / 3 = 1.5$
  Root-mean-square:  $\sqrt{(0.5^2 + 3.0^2 + 1.0^2) / 3} \approx 1.8484$

Note: A score of 0.0 would mean perfect estimation.



A Simple user-based Example

Evaluating a Recommender
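
Both scores from the table can be reproduced in a few lines of plain Java. This is a sketch for illustration only; the class and method names are assumptions and do not correspond to Mahout's evaluator classes.

```java
class EvaluationSketch {
    /** Mean of the absolute differences between actual and estimated preferences. */
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    /** Square root of the mean of the squared differences. */
    static double rootMeanSquare(double[] actual, double[] estimated) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - estimated[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] actual = {3.0, 5.0, 4.0};
        double[] estimated = {3.5, 2.0, 5.0};
        System.out.println(averageAbsoluteDifference(actual, estimated)); // 1.5
        System.out.println(rootMeanSquare(actual, estimated));            // about 1.8484
    }
}
```

The root-mean-square score penalizes the single large miss on item 2 more heavily than the average does, which is why it comes out higher.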
CC 2.0 by amtrak_russ | http://flic.kr/p/6fAPej

• Mahout
      • Open-source machine learning library from Apache (Java)
      • Can be used for large data collections: it's scalable, built upon
         Apache Hadoop
      • Implements algorithms such as Classification, Recommenders,
         Clustering
      • Incubates a number of techniques and algorithms
• ML: it's a hype! But ...

In a Nutshell

Apache Mahout
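
A Simple Recommender

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    class RecommenderExample {
      public static void main(String[] args) throws Exception {
        // Load the user,item,preference triples from the input file
        DataModel model = new FileDataModel(new File("example.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood: the 2 users most similar to the target user
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Ask for one recommendation for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
          System.out.println(recommendation);
        }
      }
    }
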




A Simple user-based Example

Create a Recommender

[Figure: component diagram with the Application and the interfaces
Recommender, UserSimilarity, UserNeighborhood and DataModel]




A user-based Recommender

Component Interaction

NearestNUserNeighborhood: A neighborhood around user 1 is chosen to consist
of the three most similar users: 5, 4, and 2.

ThresholdUserNeighborhood: Defining a neighborhood of most-similar users
with a similarity threshold.

[Figure: users 1 to 5 drawn once for each neighborhood type]


Algorithms

UserNeighborhood
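
The two selection strategies can be sketched in plain Java as follows. The similarity values are hypothetical numbers chosen so that the nearest-3 result matches the slide (5, 4, 2); the class and method names are likewise assumptions and not Mahout's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NeighborhoodSketch {
    /** Hypothetical similarities of users 2 to 5 to user 1 (made-up values). */
    static Map<Integer, Double> hypotheticalSimilarities() {
        Map<Integer, Double> sims = new LinkedHashMap<>();
        sims.put(2, 0.2);
        sims.put(3, 0.1);
        sims.put(4, 0.7);
        sims.put(5, 0.9);
        return sims;
    }

    /** Fixed-size neighborhood: the n most similar users, most similar first. */
    static List<Integer> nearestN(Map<Integer, Double> sims, int n) {
        return sims.entrySet().stream()
                .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Threshold neighborhood: every user whose similarity reaches the threshold. */
    static List<Integer> aboveThreshold(Map<Integer, Double> sims, double threshold) {
        return sims.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, Double> sims = hypotheticalSimilarities();
        System.out.println(nearestN(sims, 3));         // [5, 4, 2]
        System.out.println(aboveThreshold(sims, 0.5)); // [5, 4]
    }
}
```

A fixed-size neighborhood always returns n users, even weakly similar ones, whereas a threshold neighborhood may return few or none but never admits a user below the chosen similarity.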

Implementations of this interface define a notion of similarity between two
users. Implementations should return values in the range -1.0 to 1.0, with
1.0 representing perfect similarity.

<<interface>> UserSimilarity, implemented by:
• EuclideanDistanceSimilarity
• PearsonCorrelationSimilarity
• UncenteredCosineSimilarity
• LogLikelihoodSimilarity
• TanimotoCoefficientSimilarity
• ...



Algorithms

User Similarity

Similarity between data objects can be represented in a variety of ways:

• Distance between data objects, computed as the square root of the summed
   squared differences of each attribute of the data objects (e.g.
   Euclidean distance)
• Measuring how the attributes of both data objects change together
   relative to their mean values (Pearson correlation coefficient)
• Using the word frequencies for each document, the normalized dot product
   of the frequencies can be used as a measure of similarity (cosine
   similarity)
• And a few more ...


Algorithms

User Similarity
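
The Euclidean and cosine measures listed above can be sketched in plain Java. The vectors are the preferences of users 1 and 5 for items 101, 102 and 103 from the input data; the class and method names are assumptions for illustration, not Mahout's similarity classes.

```java
class DistanceSketch {
    /** Euclidean distance: square root of the summed squared differences. */
    static double euclideanDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity: dot product divided by the product of the vector lengths. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] user1 = {5.0, 3.0, 2.5}; // preferences for items 101, 102, 103
        double[] user5 = {4.0, 3.0, 2.0};
        System.out.println(euclideanDistance(user1, user5)); // about 1.118
        System.out.println(cosineSimilarity(user1, user5));  // about 0.995
    }
}
```

Note that a distance grows as users become less alike, so it has to be converted (for example inverted) before it can serve as a similarity score.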

Similarity between two data objects:

[Figure: users 1 to 5 plotted by their preference for item 101 (x-axis,
0 to 5) and item 102 (y-axis, 0 to 5)]





Mathematically & Plot

Euclidean Distance

Similarity between two data objects:

[Figure: items 101 to 104 plotted by user 1's preference (x-axis, 0 to 5)
against user 5's preference (y-axis, 0 to 5)]





Mathematically & Plot

Pearson Correlation

Questions?
Jean-Pierre König, jean-pierre.koenig@sentric.ch




Namics Conference 2012

Thank you!

• References
     The content of this presentation is based on:
     • Chapters 1, 2 and 4 of the following book: Owen, Anil, Dunning,
        Friedman. Mahout in Action. Shelter Island, NY: Manning
        Publications Co., 2012.
     • Chapter "Discussion of Similarity Metrics" of the following
        publication: Shanley Philip. Data Mining Portfolio.
• Links
   http://bitly.com/bundles/jpkoenig/1

A Simple user-based Example

Literature & Links

What are product recommendations, and how do they work?
