Clustering-based Location Recommendation System

Presented By
Md. Farhan Tanvir (2014-2-60-124)
Kevin Stephen Bishwas (2014-2-60-091)
Nazmul Hasan (2014-2-60-063)
Supervised By
Dr. Mohammad Rezwanul Huq
Assistant Professor
Department of Computer Science and Engineering
East West University

The world is an over-crowded place

They all want to get our attention

We are overloaded
- Thousands of new places to visit
- Millions of restaurants, hotels, and parks to visit

Can Google Help?
- Yes, but only when we really know what we are looking for
- What if I just want some interesting place to visit?
- And what does "interesting" even mean?

Can Facebook Help?
- Yes, I tend to find my friends' stuff interesting
- What if I have only a few friends, and the places they visit do not always appeal to me?

Can Experts Help?
- Yes, but it won't scale well
- Everyone receives exactly the same advice!
- It is what they like, not what I like!
- As with restaurants, expert approval does not guarantee mass appeal.

OK, here is the idea: a Recommendation System
- A recommendation system is an information filtering technique that provides users with information they may be interested in.
- Based on
  - Past behavior
  - Relations to the user
  - Item similarity
  - Context

Existing Work
- Ling Li, Ya Zhou, Han Xiong, Cailin Hu, "Collaborative filtering based on user attributes and user ratings for restaurant recommendation", 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC).
- Zhiyang Jia, Wei Gao, Yuting Yang, Xu Chen, "User-based Collaborative Filtering for Tourist Attraction Recommendations", 2015 IEEE International Conference on Computational Intelligence & Communication Technology.
- Lakshmi Tharun Ponnam, Sreenivasa Deepak Punyasamudram, Siva Nagaraju Nallagulla, Srikanth Yellamati, "Movie Recommender System Using Item Based Collaborative Filtering Technique", 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS).

Our Proposal
Input Dataset -> Data Cleaning -> Feature Engineering -> Clustering -> Find User Preference -> Result

Our Dataset
- Foursquare NYC Check-in Dataset
- https://sites.google.com/site/yangdingqi/home/foursquare-dataset

Attributes of our Dataset
1. User ID
2. Venue ID
3. Venue Category ID
4. Venue Category
5. Latitude
6. Longitude
7. Time zone offset
8. UTC time

After data cleaning and feature engineering, we obtained some additional attributes.
What do data cleaning and feature engineering involve?

Task 1: Data Cleaning
- Removing home check-ins:
  - The dataset did not contain home check-ins for every user, so we removed the home check-ins during cleaning.

Task 1: Data Cleaning (Cont.)
- Replacing multiple categories of a venue:

User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C001               Bar
1        V-1       C002               Bar
1        V-1       C001               Bar
1        V-1       C002               Bar
1        V-1       C002               Park
Figure: Before replacing

User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C002               Bar
1        V-1       C002               Bar
1        V-1       C002               Bar
1        V-1       C002               Bar
1        V-1       C002               Bar
Figure: After replacing

Task 1: Data Cleaning (Cont.)
- Replacing sub-category IDs in the category ID column:

User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C001               Bar
1        V-2       C002               Bar
1        V-3       C001               Bar
1        V-4       C002               Bar
1        V-5       C002               Bar
Figure: Before replacing

User Id  Venue Id  Venue Category Id  Venue Category
1        V-1       C002               Bar
1        V-2       C002               Bar
1        V-3       C002               Bar
1        V-4       C002               Bar
1        V-5       C002               Bar
Figure: After replacing

Task 1: Data Cleaning (Cont.)
- Replacing differing latitude and longitude values of a venue:

Venue Id  Latitude  Longitude
V-1       40        -73
V-1       43        -70
V-1       43        -70
V-1       40        -73
V-1       40        -73
Figure: Before replacing

Venue Id  Latitude  Longitude
V-1       40        -73
V-1       40        -73
V-1       40        -73
V-1       40        -73
V-1       40        -73
Figure: After replacing

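Each of the three replacement steps above collapses conflicting values for the same key into one value. The slides do not state the rule that was used. The before/after tables are consistent with keeping the most frequent value, so the sketch below assumes that rule. The helper name `replace_with_mode` and the dictionary row layout are hypothetical.

```python
from collections import Counter, defaultdict


def replace_with_mode(rows, key, field):
    """Return copies of rows where `field` is set to the most frequent
    value seen among rows sharing the same `key` (assumed rule)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key]][row[field]] += 1
    # most_common(1) yields [(value, count)] for the most frequent value
    mode = {k: c.most_common(1)[0][0] for k, c in counts.items()}
    return [{**row, field: mode[row[key]]} for row in rows]


# Rows mirroring the "Before replacing" table for venue V-1
rows = [
    {"venue_id": "V-1", "category_id": "C001", "category": "Bar"},
    {"venue_id": "V-1", "category_id": "C002", "category": "Bar"},
    {"venue_id": "V-1", "category_id": "C001", "category": "Bar"},
    {"venue_id": "V-1", "category_id": "C002", "category": "Bar"},
    {"venue_id": "V-1", "category_id": "C002", "category": "Park"},
]
cleaned = replace_with_mode(rows, "venue_id", "category_id")
cleaned = replace_with_mode(cleaned, "venue_id", "category")
```

Under the same assumption, the helper could be keyed by category name to unify sub-category IDs, or applied to a combined (latitude, longitude) field keyed by venue.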
Task 2: Feature Engineering
- Check-in counts:

User Id  Venue Id  Check-In Count
1083     V-1       3
1083     V-2       1
1083     V-3       1
1083     V-4       2
1083     V-5       1
Figure: After adding the Check-In Count attribute

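A minimal sketch of how such a count column could be derived, assuming each raw check-in is available as a `(user_id, venue_id)` pair. The function name and the sample visits are illustrative and were chosen to reproduce the counts in the table above.

```python
from collections import Counter


def checkin_counts(checkins):
    """Count how many times each (user_id, venue_id) pair appears."""
    return Counter((user_id, venue_id) for user_id, venue_id in checkins)


# One tuple per visit for user 1083 (illustrative data)
checkins = [
    (1083, "V-1"), (1083, "V-2"), (1083, "V-1"), (1083, "V-3"),
    (1083, "V-4"), (1083, "V-1"), (1083, "V-4"), (1083, "V-5"),
]
counts = checkin_counts(checkins)
for (user_id, venue_id), n in sorted(counts.items()):
    print(user_id, venue_id, n)
```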
Task 2: Feature Engineering (Cont.)
- Venue distance from the user's center:
  - First, we find the user's center point by averaging the latitude and longitude of the venues where the user has previously checked in.
  - Then, using this center point, we calculate the distance to each venue with the Haversine formula:

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

Where,
- $d$ is the distance between the two points,
- $r$ is the radius of the sphere,
- $\varphi_1, \varphi_2$: latitude of point 1 and latitude of point 2, in radians
- $\lambda_1, \lambda_2$: longitude of point 1 and longitude of point 2, in radians
Reference: https://www.movable-type.co.uk/scripts/latlong.html

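The two steps above can be sketched as follows. This is an illustration rather than the authors' code: the function names, the sample coordinates, and the Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value for r


def user_center(points):
    """Average the latitudes and longitudes of a user's check-ins (degrees)."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def haversine_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


# Illustrative check-in coordinates for one user
checkins = [(40.70, -74.00), (40.80, -73.90), (40.75, -73.95)]
center = user_center(checkins)
distances = [haversine_km(center[0], center[1], lat, lon) for lat, lon in checkins]
```

Averaging raw latitude and longitude is a reasonable approximation when all check-ins lie within one city, as in the NYC dataset.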
Our Dataset After Feature Engineering
1. User ID
2. Venue ID
3. Venue Category ID
4. Venue Category
5. Latitude
6. Longitude
7. Distance From Center
8. Check-In Count

Task 2: Clustering
- We used KNN (k-nearest neighbors) as the clustering algorithm.
- First, we find the similarity between users using Pearson correlation. We also tried cosine similarity, but Pearson correlation gave better results.

$$sim(u, v) = \frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)(R_{vi} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)^2} \, \sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R}_v)^2}}$$

Where:
- $R_{ui}$, $R_{vi}$ represent the check-in counts for the $i$-th item by users $u$ and $v$, respectively.
- $\bar{R}_u$, $\bar{R}_v$ represent the average check-in counts of users $u$ and $v$, respectively.
- $I_{uv}$ denotes the set of items checked in by both users $u$ and $v$.
Reference: Collaborative filtering based on user attributes and user ratings for restaurant recommendation

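A minimal sketch of the Pearson similarity between two users, where each user is a dictionary mapping venue ID to check-in count. The function name is hypothetical. The slides do not say which venues the user averages cover, so this sketch assumes each average is taken over all of that user's venues, and it returns 0.0 when the correlation is undefined.

```python
import math


def pearson_similarity(counts_u, counts_v):
    """Pearson correlation between two users over co-visited venues.

    counts_u and counts_v map venue id -> check-in count.
    """
    common = set(counts_u) & set(counts_v)
    if not common:
        return 0.0  # no venue visited by both users
    # Assumption: averages are taken over each user's own venues
    mean_u = sum(counts_u.values()) / len(counts_u)
    mean_v = sum(counts_v.values()) / len(counts_v)
    num = sum((counts_u[i] - mean_u) * (counts_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((counts_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((counts_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # correlation undefined; treated as no similarity
    return num / (den_u * den_v)


a = {"P1": 1, "P2": 2, "P3": 3}
b = {"P1": 2, "P2": 4, "P3": 6}
c = {"P1": 3, "P2": 2, "P3": 1}
sim_ab = pearson_similarity(a, b)  # same visiting pattern as a
sim_ac = pearson_similarity(a, c)  # opposite visiting pattern
```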
Task 2: Clustering (Cont.)
- After computing the similarities, we take the top n nearest neighbors.
- We then use the neighbors' check-in counts to predict the check-in count for every place the user has not yet checked in to. The prediction is a weighted average of the neighbors' check-in counts.
- Finally, we take the places with the highest predicted check-in counts.

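The slides do not give the exact weighting, so the sketch below assumes the common choice of weighting each neighbor's check-in count by that neighbor's similarity to the user. Skipping neighbors with non-positive similarity, the function name, and the sample values are also assumptions.

```python
def predict_count(similarities, neighbor_counts, venue, top_n=2):
    """Predict a user's check-in count for `venue`.

    similarities: neighbor id -> similarity to the target user
    neighbor_counts: neighbor id -> {venue id -> check-in count}
    """
    # Keep the top_n most similar neighbors
    nearest = sorted(similarities, key=similarities.get, reverse=True)[:top_n]
    num = den = 0.0
    for v in nearest:
        sim = similarities[v]
        count = neighbor_counts.get(v, {}).get(venue)
        if sim <= 0 or count is None:
            continue  # neighbor is dissimilar or never visited the venue
        num += sim * count
        den += sim
    return num / den if den else None


# Hypothetical similarities and neighbor check-in counts
similarities = {"A": 0.9, "B": 0.6, "C": 0.1}
neighbor_counts = {
    "A": {"Place4": 6},
    "B": {"Place4": 1},
    "C": {"Place4": 3},
}
predicted = predict_count(similarities, neighbor_counts, "Place4", top_n=2)
```

Sorting the unvisited venues by predicted count in descending order would then give the top recommendations.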
Task 3: Find User Preference
- We take the distance of each of the user's check-ins from the center point and compute the mean distance. If most of the user's check-ins are farther away than the mean distance, we say the user likes to travel long distances; otherwise, the user likes to travel short distances. We then sort the recommendations according to this preference.
- Example:
  User's mean check-in distance = 50 km
  User has 50 check-ins.
  30 of them are farther than 50 km.
  Result: the user loves to travel long distances

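A small sketch of this preference rule. It assumes "most" means more than half of the check-ins, and that recommendations are ordered farthest first for long-distance travelers and nearest first otherwise. Both choices and the function names are assumptions, since the slides do not spell them out.

```python
def travel_preference(distances_km):
    """Return 'long' if more than half of the check-ins lie farther
    from the center than the user's mean distance, else 'short'."""
    mean = sum(distances_km) / len(distances_km)
    far = sum(1 for d in distances_km if d > mean)
    return "long" if far > len(distances_km) / 2 else "short"


def sort_recommendations(recs, preference):
    """recs: list of (venue_id, distance_km) pairs.
    Farthest first for 'long', nearest first for 'short' (assumed order)."""
    return sorted(recs, key=lambda r: r[1], reverse=(preference == "long"))


# Mirrors the example: 50 check-ins, mean 50 km, 30 of them beyond 50 km
distances = [60.0] * 30 + [35.0] * 20
preference = travel_preference(distances)
ranked = sort_recommendations([("V-1", 5.0), ("V-2", 20.0)], preference)
```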
Example

             Place1  Place2  Place3  Place4
Me           3       -       5       ?
My Friend    4       6       -       -
You          3       -       5       6
Another guy  4       2       -       1
Your Friend  8       -       -       3

What will be the probable check-in count for Place4?

Example (Cont.)

             Place1  Place2  Place3  Place4
Me           3       -       5       6
My Friend    4       6       -       -
You          3       -       5       6
Another guy  4       2       -       1
Your Friend  8       -       -       3

Evaluation
- We used sampling and RMSE to evaluate our recommendations.
- For sampling, 10% of the entire dataset was selected randomly without replacement to form a sample dataset.
- RMSE was used to evaluate the algorithm. It measures the error between the predicted check-in count and the actual check-in count of a venue by a specific user in the test dataset.
RMSE formula:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (P_{u,i} - R_{u,i})^2}{N}}$$

Here:
- $P_{u,i}$ is the predicted check-in count for user $u$ on venue $i$
- $R_{u,i}$ is the actual check-in count for user $u$ on venue $i$
- $N$ is the total number of venues where the user checked in
Reference: Collaborative filtering based on user attributes and user ratings for restaurant recommendation

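The evaluation steps can be sketched as below. The function names, the fixed seed, and the sample values are illustrative assumptions; only the 10% fraction and the RMSE definition come from the slides.

```python
import math
import random


def sample_without_replacement(records, fraction=0.10, seed=0):
    """Pick `fraction` of the records at random, without replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = int(len(records) * fraction)
    return rng.sample(records, k)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


sample = sample_without_replacement(list(range(100)))
error = rmse([2, 4], [4, 4])  # one prediction is off by 2, the other is exact
```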
RMSE Graph
Figure: RMSE graph

Demo
- We created a simple demo in which a user enters their ID and our system recommends places for that user.
Figure: Input user ID
Figure: Output recommendation

Future Work
- Try a model-based recommendation system
- Add more domains
- Try a triangulation technique to find the user's center point
