Generate personalized location recommendations for users using KNN and collaborative filtering. We used the "Foursquare NYC Check-in Dataset".
Link: https://sites.google.com/site/yangdingqi/home/foursquare-dataset
1. Presented By
Md. Farhan Tanvir(2014-2-60-124)
Kevin Stephen Bishwas (2014-2-60-091)
Nazmul Hasan(2014-2-60-063)
Supervised By
Dr. Mohammad Rezwanul Huq
Assistant Professor
Department Of Computer Science And Engineering
East West University
Clustering-based Location
Recommendation System
6. Can Google Help?
Yes, but only when we know exactly what we are looking for.
What if I just want an interesting place to visit?
By the way, what does "interesting" even mean?
7. Can Facebook Help?
Yes, I tend to find my friends' stuff interesting.
What if I have only a few friends, and the places they visit do not always appeal to me?
8. Can experts help?
Yes, but it won't scale well.
Everyone receives exactly the same advice!
It reflects what they like, not what I like!
For example, a restaurant that gets expert approval is not guaranteed to attract the masses.
9. OK, Here Is the Idea: a Recommendation System
A recommendation system is an information filtering technique that provides users with information they are likely to be interested in.
It is based on:
- Past Behavior
- Relations to the user
- Item Similarity
- Context
10. Existing Work
- Ling Li, Ya Zhou, Han Xiong, Cailin Hu, "Collaborative filtering based on user attributes and user ratings for restaurant recommendation," 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC).
- Zhiyang Jia, Wei Gao, Yuting Yang, Xu Chen, "User-based Collaborative Filtering for Tourist Attraction Recommendations," 2015 IEEE International Conference on Computational Intelligence & Communication Technology.
- Lakshmi Tharun Ponnam, Sreenivasa Deepak Punyasamudram, Siva Nagaraju Nallagulla, Srikanth Yellamati, "Movie Recommender System Using Item Based Collaborative Filtering Technique," 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS).
13. Attributes of our Dataset
1 User ID
2 Venue ID
3 Venue Category ID
4 Venue Category
5 Latitude
6 Longitude
7 Time zone offset
8 UTC time
After data cleaning and feature engineering, we obtained some additional attributes.
What data cleaning and feature engineering did we do?
14. Task 1: Data Cleaning
Removing Home Check-Ins:
- The dataset did not contain home check-ins for all users, so we removed the home check-ins during data cleaning.
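A minimal sketch of this filtering step, assuming the check-ins are held in memory as a list of dicts and that home check-ins can be recognized by their venue category label. The label "Home (private)" and the field name `venue_category` are assumptions, not something the slides specify.

```python
# Hypothetical label for home check-ins; adjust to whatever your copy of
# the dataset actually uses.
HOME_CATEGORY = "Home (private)"

def remove_home_checkins(rows):
    """Drop every check-in whose venue category is the home category.

    rows: list of dicts with at least a "venue_category" key
    (a hypothetical in-memory layout, not the dataset's file format).
    """
    return [r for r in rows if r["venue_category"] != HOME_CATEGORY]

checkins = [
    {"user_id": 1, "venue_id": "V-1", "venue_category": "Bar"},
    {"user_id": 1, "venue_id": "V-9", "venue_category": "Home (private)"},
    {"user_id": 2, "venue_id": "V-2", "venue_category": "Park"},
]
cleaned = remove_home_checkins(checkins)
```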
15. Task 1: Data Cleaning(Cont)
Replacing multiple categories of a venue (so that each venue has a single category):
User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C001               Bar
1        V-1       C002               Bar
1        V-1       C001               Bar
1        V-1       C002               Bar
1        V-1       C002               Park
Figure: Before replacing

User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C002               Bar
1        V-1       C002               Bar
1        V-1       C002               Bar
1        V-1       C002               Bar
1        V-1       C002               Bar
Figure: After replacing
16. Task 1: Data Cleaning(Cont)
Replacing sub-category IDs in the Venue Category Id column (so that each category name has a single ID):
User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C001               Bar
1        V-2       C002               Bar
1        V-3       C001               Bar
1        V-4       C002               Bar
1        V-5       C002               Bar
Figure: Before replacing

User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C002               Bar
1        V-2       C002               Bar
1        V-3       C002               Bar
1        V-4       C002               Bar
1        V-5       C002               Bar
Figure: After replacing
17. Task 1: Data Cleaning(Cont)
Replacing different latitude and longitude values of a venue (so that each venue has a single location):
Venue Id  Latitude  Longitude
V-1       40        -73
V-1       43        -70
V-1       43        -70
V-1       40        -73
V-1       40        -73
Figure: Before replacing

Venue Id  Latitude  Longitude
V-1       40        -73
V-1       40        -73
V-1       40        -73
V-1       40        -73
V-1       40        -73
Figure: After replacing
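The three replacement steps above (categories per venue, category IDs per category name, coordinates per venue) can be sketched with one helper. The slides do not say how the surviving value is chosen; the example tables are consistent with keeping the most frequent value, which is what this sketch assumes. Field names such as `category_id` and `lat` are hypothetical, and the sample rows merge the Before tables of slides 15 and 17.

```python
from collections import Counter

def majority_replace(rows, group_key, fields):
    """Give every row in a group the most frequent combination of `fields`.

    group_key: column that defines the group, e.g. "venue_id".
    fields: tuple of columns replaced together, e.g. ("lat", "lon").
    Ties go to the combination seen first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    counts = {}
    for r in rows:
        combo = tuple(r[f] for f in fields)
        counts.setdefault(r[group_key], Counter())[combo] += 1
    winner = {g: c.most_common(1)[0][0] for g, c in counts.items()}
    return [{**r, **dict(zip(fields, winner[r[group_key]]))} for r in rows]

def clean_venues(rows):
    # Slide 15: one category ID and one category name per venue.
    rows = majority_replace(rows, "venue_id", ("category_id",))
    rows = majority_replace(rows, "venue_id", ("category",))
    # Slide 16: one category ID per category name.
    rows = majority_replace(rows, "category", ("category_id",))
    # Slide 17: one (latitude, longitude) pair per venue, voted on together.
    rows = majority_replace(rows, "venue_id", ("lat", "lon"))
    return rows

before = [
    {"venue_id": "V-1", "category_id": "C001", "category": "Bar", "lat": 40, "lon": -73},
    {"venue_id": "V-1", "category_id": "C002", "category": "Bar", "lat": 43, "lon": -70},
    {"venue_id": "V-1", "category_id": "C001", "category": "Bar", "lat": 43, "lon": -70},
    {"venue_id": "V-1", "category_id": "C002", "category": "Bar", "lat": 40, "lon": -73},
    {"venue_id": "V-1", "category_id": "C002", "category": "Park", "lat": 40, "lon": -73},
]
after = clean_venues(before)
```

Voting on latitude and longitude as one pair means a cleaned venue always keeps a location that actually occurs in the data, rather than mixing the latitude of one record with the longitude of another.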
18. Task 2: Feature Engineering
Check-In Counts (the number of times a user checked in at a venue):
User Id  Venue Id  Check-In Count
1083     V-1       3
1083     V-2       1
1083     V-3       1
1083     V-4       2
1083     V-5       1
Figure: After adding the Check-In Count attribute
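Counting check-ins per user and venue is a single grouping step. A sketch in plain Python, assuming each check-in is a dict with hypothetical `user_id` and `venue_id` keys; the sample rows are made up so that they reproduce the table above.

```python
from collections import Counter

def checkin_counts(rows):
    """Number of check-ins per (user_id, venue_id) pair."""
    return Counter((r["user_id"], r["venue_id"]) for r in rows)

# Hypothetical raw check-ins that reproduce the table for user 1083.
visits = ["V-1", "V-1", "V-1", "V-2", "V-3", "V-4", "V-4", "V-5"]
rows = [{"user_id": 1083, "venue_id": v} for v in visits]
counts = checkin_counts(rows)
```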
19. Task 2: Feature Engineering(Cont)
Venue Distance from User's Center:
- First, we find each user's center point by averaging the latitudes and longitudes of the places where the user has previously checked in.
- Then, using this center point, we calculate the distance of each venue from it with the Haversine formula:
$$ d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2-\varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2-\lambda_1}{2}\right)}\right) $$
Where,
$d$ is the distance between the two points,
$r$ is the radius of the sphere,
$\varphi_1, \varphi_2$: latitude of point 1 and latitude of point 2, in radians
$\lambda_1, \lambda_2$: longitude of point 1 and longitude of point 2, in radians
Reference: https://www.movable-type.co.uk/scripts/latlong.html
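A sketch of the center-point and distance computation. The Earth radius of 6371 km is an assumed value for r, and `user_center` uses a plain arithmetic mean of degrees as described above, which is fine for a city-sized area such as NYC but would break for points straddling the ±180° meridian. The sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius used as r

def user_center(points):
    """Mean latitude and mean longitude (degrees) of a user's check-ins."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)

def haversine_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))

checkins = [(40.0, -74.0), (41.0, -74.0)]  # hypothetical (lat, lon) pairs
center = user_center(checkins)
distances = [haversine_km(lat, lon, *center) for lat, lon in checkins]
```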
20. Our Dataset After Feature Engineering
1 User ID
2 Venue ID
3 Venue Category ID
4 Venue Category
5 Latitude
6 Longitude
7 Distance From Center
8 Check-In Count
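21. Task 2: Clustering
We used KNN (k-nearest neighbors) as our clustering algorithm.
First, we find the similarity between users using the Pearson correlation. We also tried cosine similarity, but the Pearson correlation gave us better results.
$$ \mathrm{sim}(u,v) = \frac{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)(R_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (R_{v,i} - \bar{R}_v)^2}} $$
Where:
$R_{u,i}$, $R_{v,i}$ represent the check-in count of the i-th item by user u and user v respectively.
$\bar{R}_u$, $\bar{R}_v$ represent the average check-in count of user u and user v respectively.
$I_{uv}$ denotes the set of items checked in at by both user u and user v.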
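A sketch of the Pearson similarity as written above. It assumes each user's check-ins are a dict from venue ID to check-in count, takes $\bar{R}_u$ over all of a user's venues (as the definition says) while summing only over $I_{uv}$, and returns 0 when the users share no venue or a denominator vanishes; that fallback is an assumption. The sample users are made up.

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation between two users' check-in counts.

    ratings_u, ratings_v: dict venue_id -> check-in count.
    Means are taken over each user's full history; sums run over the
    venues both users checked in at.
    """
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumed fallback: undefined correlation counts as 0
    return num / (den_u * den_v)

a = {"V-1": 2, "V-2": 4, "V-3": 3}
b = {"V-1": 1, "V-2": 5, "V-4": 2}
sim_ab = pearson(a, b)
```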
Reference: Collaborative filtering based on user attributes and user ratings for restaurant recommendation
22. Task 2: Clustering (Cont.)
After finding the similarities, we take the top n nearest neighbors.
Then we use their check-in counts to predict the user's check-in count for every place the user has not checked in at, using a weighted average of the neighbors' check-in counts.
Finally, we recommend the places with the highest predicted check-in counts.
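The slides do not give the exact prediction formula, so this sketch assumes the common similarity-weighted average $P_{u,i} = \sum_v \mathrm{sim}(u,v) R_{v,i} / \sum_v \mathrm{sim}(u,v)$ over the k most similar users who checked in at venue i, keeping only neighbours with positive similarity. Here k plays the role of the slide's n, and `n` in `recommend` is the number of places returned. The users and similarity values in the usage example are made up rather than computed.

```python
def predict(target, item, users, sims, k=2):
    """Similarity-weighted average check-in count of `item` for `target`.

    users: dict user -> {venue_id: check-in count}
    sims:  dict user -> similarity to `target`
    Returns None when no positively similar user checked in at `item`.
    """
    neighbours = sorted(
        (v for v in users
         if v != target and item in users[v] and sims.get(v, 0) > 0),
        key=lambda v: sims[v], reverse=True)[:k]
    if not neighbours:
        return None
    num = sum(sims[v] * users[v][item] for v in neighbours)
    den = sum(sims[v] for v in neighbours)
    return num / den

def recommend(target, users, sims, k=2, n=3):
    """Top-n unvisited venues for `target`, ranked by predicted count."""
    candidates = {i for v in users for i in users[v]} - set(users[target])
    scored = []
    for item in candidates:
        p = predict(target, item, users, sims, k)
        if p is not None:
            scored.append((p, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:n]]

users = {
    "me":  {"Place1": 3, "Place3": 5},
    "you": {"Place1": 3, "Place3": 5, "Place4": 6},
    "guy": {"Place1": 4, "Place2": 2, "Place4": 1},
}
sims = {"you": 0.8, "guy": 0.4}  # hypothetical similarities to "me"
top = recommend("me", users, sims, k=2, n=1)
```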
23. Task 3: Find User Preference
Using the distance of each of a user's check-ins from their center point, we compute the user's mean check-in distance. If most of the user's check-ins are farther away than this mean distance, we say the user likes to travel long distances; otherwise, the user likes to travel short distances. We then sort the recommendations according to this preference.
Example:
User's mean check-in distance = 50 km
The user has 50 check-ins.
30 of them are more than 50 km away.
Result: The user loves to travel long distances.
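A sketch of the preference rule and the re-ranking step. Reading "most" as a strict majority, and ordering farthest-first for long-distance users and nearest-first otherwise, are assumptions; the slide only says the recommendations are sorted on user preference. The distances and venue IDs are hypothetical, chosen to mirror the example above.

```python
def travel_preference(distances_km):
    """Return 'long' if most check-ins lie beyond the user's mean distance.

    distances_km: distance of each check-in from the user's center point.
    Assumption: 'most' means strictly more than half.
    """
    mean_d = sum(distances_km) / len(distances_km)
    far = sum(1 for d in distances_km if d > mean_d)
    return "long" if far > len(distances_km) / 2 else "short"

def sort_by_preference(recs, preference):
    """Re-rank (venue_id, distance_km) pairs by the user's preference.

    Assumed rule: farthest first for 'long' travellers, nearest first otherwise.
    """
    return sorted(recs, key=lambda rec: rec[1], reverse=(preference == "long"))

# 50 check-ins with a mean of 50 km, 30 of them farther than the mean.
distances = [60.0] * 30 + [35.0] * 20
pref = travel_preference(distances)
ranked = sort_by_preference([("V-1", 2.0), ("V-2", 12.5), ("V-3", 7.0)], pref)
```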
24. Example
             Place1  Place2  Place3  Place4
Me           3       -       5       ?
My Friend    4       6       -       -
You          3       -       5       6
Another guy  4       2       -       1
Your Friend  8       -       -       3
(Values are check-in counts; "-" means no check-in.)
What will be the probable check-in count of Place4 for Me?
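To make the question concrete, the Pearson similarity between Me and You can be worked out by hand with the formula from the clustering slide, taking each user's mean over all of their check-ins. This is an illustration under that assumption, not the authors' own answer.

```latex
\begin{aligned}
\bar{R}_{\text{Me}} &= \tfrac{3+5}{2} = 4, \qquad
\bar{R}_{\text{You}} = \tfrac{3+5+6}{3} = \tfrac{14}{3}, \qquad
I_{\text{Me,You}} = \{\text{Place1}, \text{Place3}\} \\
\text{numerator} &= (3-4)\left(3-\tfrac{14}{3}\right) + (5-4)\left(5-\tfrac{14}{3}\right)
  = \tfrac{5}{3} + \tfrac{1}{3} = 2 \\
\text{denominator} &= \sqrt{(-1)^2 + 1^2}\,\sqrt{\left(-\tfrac{5}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2}
  = \sqrt{2}\cdot\tfrac{\sqrt{26}}{3} = \tfrac{2\sqrt{13}}{3} \\
\mathrm{sim}(\text{Me},\text{You}) &= \frac{2}{2\sqrt{13}/3} = \frac{3}{\sqrt{13}} \approx 0.832
\end{aligned}
```

Another guy and Your Friend share only Place1 with Me, and with these means each gets a similarity of -1. If only positively similar neighbours who checked in at Place4 are used (an assumption), You is the only neighbour left, and the weighted average predicts a check-in count of 6 for Place4.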
27. Evaluation
We used sampling and RMSE to evaluate our recommendations.
In the sampling step, 10% of the entire dataset was selected randomly without replacement to make a sample dataset.
RMSE was used to evaluate the algorithm. It measures the error between the predicted and the actual check-in count of a venue for a specific user in the test dataset.
RMSE formula:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_{u,i} - R_{u,i}\right)^2} $$
Here:
$P_{u,i}$ is the predicted check-in count for user u on venue i
$R_{u,i}$ is the actual check-in count for user u on venue i
$N$ is the total number of venues where the user checked in
Reference: Collaborative filtering based on user attributes and user ratings for restaurant recommendation
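A sketch of the two evaluation pieces. It assumes the 10% random sample serves as the test set and the remaining 90% as training data (the slides only say that a 10% sample was drawn), and that predicted and actual counts are already paired up. The seed value is arbitrary.

```python
import math
import random

def split_sample(records, fraction=0.10, seed=42):
    """Draw `fraction` of the records at random without replacement.

    Returns (train, test); using the sample as the test set is an assumption.
    """
    rng = random.Random(seed)
    k = int(len(records) * fraction)
    test_idx = set(rng.sample(range(len(records)), k))
    test = [r for i, r in enumerate(records) if i in test_idx]
    train = [r for i, r in enumerate(records) if i not in test_idx]
    return train, test

def rmse(predicted, actual):
    """Root-mean-square error between paired predicted and actual counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

train, test = split_sample(list(range(100)))
error = rmse([3, 2], [1, 2])
```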
29. Demo
We created a simple demo where a user can enter their ID and our system recommends places for that user.
Figure: Input user ID
Figure: Output recommendation
30. Future Work
Try Model-Based Recommendation Systems
Add More Domains
Try a Triangulation Technique to Find Each User's Center Point