This document provides a high-level overview of how recommendation engines work in 3 steps:
1. It collects ratings and preferences from users on various items like movies or games to build a dataset.
2. It compares the preferences of different users to calculate similarity scores between users based on how closely their preferences align.
3. It uses the ratings of similar users to recommend new items to a given user, with ratings from more similar users weighted more heavily. Additional techniques like finding similar items are also discussed.
3. ARTIFICIAL INTELLIGENCE
"the design of systems that
perceive their environment and
take actions to maximize its
chances of success"
-John McCarthy (1956)
4. ARTIFICIAL INTELLIGENCE
End goal: achieve superhuman
intelligence in a machine
The "golden dream" of the 80's
The holy grail of the Singularitarians
7. BUT WHAT SHOULD I USE AI FOR...?
Pick the most cost-effective combination of flights
Genetic algorithms
between NYC and Tashkent*
Natural Language
"Siri, call me sexy!" (iPhone)
Processing
Machine Vision Unlock your Android phone just by looking at it
Neural networks "automatically detect and flag NSFW pictures" (Flickr)
"I got a high fever and my knees hurt. Am I
Classifiers
dying?" (WebMD)
* thats in uzbekistan, if youre asking
8. But thats all SO COMPLICATED
Isnt there anything a bit more
practical?
- some pragmatic programmer
9. ARTIFICIAL INTELLIGENCE
Basic techniques
Far from intelligent, still very useful
classify blogs in categories based
Data clustering
on their content (Flipboard)
Recommendation people who watched The X-Files
engines also liked...
11. IN A NUTSHELL...
Collect ratings/preferences from users (eg.:
reviews, likes)
Compare people to other people based on what
they like
Offer stuff similar people already liked
12. STEP 1: BUILD A DATASET
Boil preferences down to numbers*
User liked a game = 0.5 point
User purchased a game = 1 point
User played a game = 2 points
User reviewed a game = 3 points
User liked, purchased, played and
reviewed a game = 6.5 points
* Resist the urge to use 1-5 star ratings
13. STEP 1: BUILD A DATASET
[
Entity.new('Lisa', {
'Prince of Persia' => 2.5, 'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Rise of the Triad' => 3.5, 'Commander Keen' =>
2.5, 'Duke Nukem' => 3.0
}),
Entity.new('Larry', {
'Prince of Persia' => 3.0, 'Doom' => 3.5, 'Castle Wolfenstein' => 1.5, 'Rise of the Triad' => 5.0, 'Duke Nukem' => 3.0,
'Commander Keen' => 3.5
}),
Entity.new('Robert', {
'Prince of Persia' => 2.5, 'Doom' => 3.0, 'Rise of the Triad' => 3.5, 'Duke Nukem' => 4.0
}),
Entity.new('Claudia', {
'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Duke Nukem' => 4.5, 'Rise of the Triad' => 4.0, 'Commander Keen' => 2.5
}),
Entity.new('Mark', {
'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Castle Wolfenstein' => 2.0, 'Rise of the Triad' => 3.0, 'Duke Nukem' => 3.0,
'Commander Keen' => 2.0
}),
Entity.new('Jane', {
'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Duke Nukem' => 3.0, 'Rise of the Triad' => 5.0, 'Commander Keen' => 3.5
}),
Entity.new('John', {
'Doom' => 4.5, 'Commander Keen' => 1.0, 'Rise of the Triad' => 4.0
})
]
14. STEP 2: COMPARE PEOPLE
Compare each person to one another, generating a
similarity score
Euclidian distance between each pair of ratings:
Many other distance calculation algorithms exist: linear distance, jaccard,
manhattan, tanimoto, etc
Better algorithm = more interesting results
15. STEP 2: COMPARE PEOPLE
# Returns the euclidian distance between person1 and person2
def distance(person1, person2)
rated_by_both = person1.ratings.select { |game| person2.ratings[game] }
# if they have no ratings in common, return 0
return 0.0 if rated_by_both.empty?
# add up the squares of all the differences
sum_of_squares = 0.0
person1.ratings.collect do |game, score|
person2_score = person2.ratings[game]
next if !person2_score
sum_of_squares += ((score - person2_score) ** 2)
end
1.0 / (1.0 + sum_of_squares)
end
16. STEP 3: SIMILAR USERS
Grab the n users with the
highest level of similarity
(in other words, the people closest to you,
according to the distance algorithm from step 2)
17. STEP 3: SIMILAR USERS
# Returns the 5 best matching people (most similar preferences)
def top_matches(person, all_ratings)
other_people = all_ratings.select { |person2| person2.name != person.name }
other_people.collect do |other_person|
[
other_person,
distance(person, other_person) # change this to use other algorithms
]
end.sort_by { |sim| sim[1] }.reverse[0..5]
end
# People similar to John:
# Mark (30% match)
# Robert (28% match)
# Claudia (23% match)
# Lisa (22% match)
# Jane (11% match)
# Larry (10% match)
18. STEP 3: SIMILAR USERS
# Returns the 5 best matching people (most similar preferences)
def top_matches(person, all_ratings)
other_people = all_ratings.select { |person2| person2.name != person.name }
other_people.collect do |other_person|
[
other_person,
distance(person, other_person) # change this to use other algorithms
]
end.sort_by { |sim| sim[1] }.reverse[0..5]
end
# People similar to John:
# Mark (30% match)
# Robert (28% match)
!
# Claudia (23% match)
Achievement unlocked
# Lisa (22% match)
# Jane (11% match)
# Larry (10% match)
people you should follow: John, Mary
19. STEP 4: RECOMMENDING A GAME
Grab each users ratings to games you
havent rated
Multiply that by how similar the other
user is to you(opinions from people similar to you weight more)
Grab the highest numbers
20. STEP 4: RECOMMENDING A GAME
# Gets recommendations for a person by using a weighted average of every other user's ratings
def recommendations(person, other_people)
similarities = {}
other_people.each do |other_person|
similarity = distance(person, other_person)
# ignore scores of zero or lower
next if similarity <= 0
other_person.ratings.each do |other_person_game, other_person_score|
# only score what I haven't rated yet
next if person.ratings[other_person_game]
similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 }
# sum of weighted rating times similarity and total similarity
similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity
similarity_for_game[:sum] += similarity
end
end
# normalize list and sort by highest scores first # Recommended games for John:
similarities.collect do |game_name, score| # Duke Nukem
[ game_name, (score[:weighted] / score[:sum]) ]
end.sort_by { |sim| sim[1] }.reverse # Prince of Persia
end # Castle Wolfenstein
21. STEP 4: RECOMMENDING A GAME
# Gets recommendations for a person by using a weighted average of every other user's ratings
def recommendations(person, other_people)
similarities = {}
other_people.each do |other_person|
similarity = distance(person, other_person)
# ignore scores of zero or lower
next if similarity <= 0
other_person.ratings.each do |other_person_game, other_person_score|
# only score what I haven't rated yet
next if person.ratings[other_person_game]
similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 }
# sum of weighted rating times similarity and total similarity
similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity
similarity_for_game[:sum] += similarity
end
end
! Achievement unlocked # Recommended games for John:
# normalize list and sort by highest scores first
similarities.collect do |game_name, score| # Duke Nukem
[ game_name, (score[:weighted] / score[:sum]) ]
end.sort_by { |sim| sim[1] }.reverse games recommended to John: #Pac Man,Persia 3
Prince of Doom
end # Castle Wolfenstein
22. STEP 5: SIMILAR GAMES
Invert users x preferences, then use the exact same algorithm as
step 1 to find similar games based solely on peoples
interactions (item-based filtering).
# User has many ratings
"Larry" => { "Prince of Persia" => 3.0, "Doom" => 3.5, "Castle Wolfenstein" => 1.5 },
"Robert" => { "Prince of Persia" => 2.5, "Doom" => 3.0 },
"Jane" => { "Prince of Persia" => 3.0, "Doom" => 4.0 },
"John" => { "Doom" => 4.5 }
# Game rated by many users
"Prince of Persia" => { "Larry"=>3.0, "Robert"=>2.5, "Jane"=>3.0 },
"Doom" => { "Larry"=>3.5, "Robert"=>3.0, "Jane"=>4.0, "John"=>4.5 },
"Castle Wolfenstein" => { "Larry"=>1.5, "Mark"=>2.0 }
Cross-compare everything. This might take a very long time for a
large number of games
Hint: save this data on a persistent storage will lead to very fast recommendation lookups
(thats what most recommendation engines save, in fact)
23. STEP 5: SIMILAR GAMES
# Create a dictionary of games showing which other games they
# are most similar to. This should be run often and cached for reuse
def calculate_similar_games(game_ratings)
Hash[game_ratings.collect do |game|
[
game.name,
top_matches(game, game_ratings)
]
end]
end
# Similar games:
# Prince of Persia: Commander Keen (40%), Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%)
# Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%)
# Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%)
# Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%)
# Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%)
# Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%)
24. STEP 5: SIMILAR GAMES
# Create a dictionary of games showing which other games they
# are most similar to. This should be run often and cached for reuse
def calculate_similar_games(game_ratings)
Hash[game_ratings.collect do |game|
[
game.name,
top_matches(game, game_ratings)
]
end]
end
!
# Similar games:
Achievement unlocked
# Prince of Persia: Commander Keen (40%), Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%)
# Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%)
# Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%)
# Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%)
# Doom is similar to Daikatana and Quake
Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%)
# Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%)
25. BONUS STAGE: FASTER RECOMMENDATIONS
(if youre still with me)
A slightly tweaked version of the algorithm
on step 2: just use the pre-calculated
similarities instead of doing distances in
the loop
Up to 10x faster in a pure Ruby
implementation
26. BONUS STAGE: FASTER RECOMMENDATIONS
(if youre still with me)
# this is very similar to the recommendations() algorithm,
# except we use a pre-calculated similar_games_matrix instead of
# calculating distances here
def recommended_games(similar_games_matrix, user)
similarities = {}
user.ratings.each do |game_name, user_rating|
# Loop over pre-cached game similarities to the current game
similar_games_matrix[game_name].each do |game, similarity|
# Ignore if this user has already rated this similar game
next if user.ratings[game.name]
score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 }
# Weighted sum of rating times similarity and sum of similarities
score_for_game[:weighted] += similarity * user_rating
score_for_game[:sum] += similarity
end
end
# Divide each total score by total weighting to get an average
# Return the rankings from highest to lowest
similarities.collect do |game_name, score|
[ game_name, (score[:weighted] / score[:sum]) ]
end.sort_by { |sim| sim[1] }.reverse
end
27. BONUS STAGE: FASTER RECOMMENDATIONS
(if youre still with me)
# this is very similar to the recommendations() algorithm,
# except we use a pre-calculated similar_games_matrix instead of
# calculating distances here
def recommended_games(similar_games_matrix, user)
similarities = {}
user.ratings.each do |game_name, user_rating|
# Loop over pre-cached game similarities to the current game
similar_games_matrix[game_name].each do |game, similarity|
# Ignore if this user has already rated this similar game
next if user.ratings[game.name]
score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 }
# Weighted sum of rating times similarity and sum of similarities
score_for_game[:weighted] += similarity * user_rating
score_for_game[:sum] += similarity
end
end
# Divide each total score by total weighting to get an average
!
# Return the rankings from highest to lowest
similarities.collect do |game_name, score|
end.sort_by { |sim| sim[1] }.reverse
end
Achievement unlocked
[ game_name, (score[:weighted] / score[:sum]) ]
EPIC WIN!
29. Thats awesome and stuff, but...
do these come in little boxes?
- that pragmatic programmer
30. Yes, we have RubyGems
top choice according to
recommendify
RubyToolbox
recommendable Rails-compatible
doesnt require Redis (not
acts_as_recommended
actively maintained)
31. RECOMMENDIFY EXAMPLE
class GameRecommender < Recommendify::Base
# store only the top 10 neighbors per item
max_neighbors 10
# define an input data set "game_ratings". we'll add "user_id->game_id"
# pairs to this input and use the jaccard coefficient to retrieve a
# "users that liked game i1 also liked game i2" list
input_matrix :game_ratings,
:similarity_func => :jaccard,
:weight => 5.0
end
recommender = GameRecommender.new
# add `order_id->product_id` interactions to the order_item_sim input
# you can add data incrementally and call RecommendedItem.process! to update
# the similarity matrix at any time.
recommender.game_ratings.add_set("John", ["Duke Nukem", "Doom", "Quake"])
recommender.game_ratings.add_set("Mary", ["Prince of Persia", "Doom"])
# Calculate all elements of the similarity matrix
recommender.process!
# retrieve similar games to "Doom"
recommender.for("Doom")
=> [ <Recommendify::Neighbor item_id:"Duke Nukem" similarity:0.23>, (...) ]
32. RECOMMENDABLE EXAMPLE
class User < ActiveRecord::Base
recommends :books, :movies, :games
end
>> friend.like(Movie.where(:name => "2001: A Space Odyssey").first)
>> friend.like(Book.where(:title => "A Clockwork Orange").first)
>> friend.like(Book.where(:title => "Brave New World").first)
>> friend.like(Book.where(:title => "One Flew Over the Cuckoo's Next").first)
>> user.like(Book.where(:title => "A Clockwork Orange").first)
>> user.recommended_books
=> [#<Book title: "Brave New World">, #<Book title: "One Flew Over the Cuckoo's
Nest">]
>> user.recommended_movies
=> [#<Movie name: "A Clockwork Orange">]
33. CLOSING REMARKS
Gems are cool, but youll have to dive into
the code for better results
Crossing social filtering with other AI
techniques (e.g.: content classification)
produces dramatically better results
34. ZOMG I NEED TO KNOW MORE
Code from this presentation: https://gist.github.com/herval/4992503
Stuff to read:
AI Application Programming: http://ai-app-prog.rubyforge.org
Programming Collective Intelligence: http://amzn.to/XtANMl
(great book for noobs)
Ready-to-use Gems
https://github.com/paulasmuth/recommendify
https://github.com/davidcelis/recommendable
https://github.com/Draiken/acts_as_recommended
Serious AI algorithms in Ruby
https://github.com/SergioFierens/ai4r
http://web.media.mit.edu/~dustin/papers/ai_ruby_plugins/
https://github.com/kanwei/algorithms