Machine Learning and Data at Meetup
Evan Estola
Meetup.com
evan@meetup.com
@estola
My Background
• Software Engineer/Data Scientist
• Machine learning team
• At Meetup since May 2012
• BS Computer Science
• Information Retrieval
• Data Mining
• Math
• Linear Algebra
• Graph Theory
You
• Data Scientists?
• Engineers?
• Statisticians?
• Students?
• Non-technical?
What this talk is
• Super secret peek into Meetup!
• Meetup recommendations examples
• How we do recommendations (model/features)
• Lessons learned/what's next
What this talk isn't
• What is a data scientist?
• What is big data?
• How does matrix factorization or gradient boosted decision trees or MapReduce or this framework I hope you'll use work?
Why Meetup data is cool
• Real people meeting up
• Every meetup could change someone's life
• No ads, just do the best thing
• Oh, and 114 million RSVPs by >14 million members
• 2.7 million RSVPs in the last 30 days
  • ~1/second
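The ~1/second figure follows from the 30-day count (30 days is 2,592,000 seconds):

$$
\frac{2.7 \times 10^{6}\ \text{RSVPs}}{30 \times 86{,}400\ \text{s}}
= \frac{2.7 \times 10^{6}}{2.592 \times 10^{6}}\ \text{RSVPs/s}
\approx 1.04\ \text{RSVPs/s}
$$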
Data at Meetup
• User data
• Site monitoring/performance
• A/B testing
• Recommendations*
Everything is a recommendation
• Not my phrase
• Not actually true yet
• Working on it
Recommendation
Topic Recommendations
• New registrant
  • Don't know anything about you yet!
  • Most popular is boring/repetitive
• Algorithm:
  • Group local meetups by topic
  • Select topic with most groups
  • Remove those groups
  • Repeat
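The loop above is a greedy cover: each pick is the topic that reaches the most local groups not already covered, so later picks go to different groups instead of repeating variations of the most popular topic. A minimal Python sketch, assuming local groups arrive as a hypothetical mapping from group id to topic set and that we stop after `k` topics (the tie-breaking rule is also an assumption):

```python
from collections import Counter


def pick_topics(groups: dict[str, set[str]], k: int) -> list[str]:
    """Greedily pick up to k topics for a new registrant.

    groups maps a (hypothetical) group id to that group's topic set.
    Each round picks the topic carried by the most remaining groups,
    then removes those groups so the next pick covers different ones.
    """
    remaining = dict(groups)
    chosen: list[str] = []
    while remaining and len(chosen) < k:
        # Count how many remaining groups carry each topic.
        counts = Counter(t for topics in remaining.values() for t in topics)
        if not counts:
            break  # the remaining groups have no topics at all
        # Most groups wins; ties broken alphabetically for determinism.
        topic = min(counts, key=lambda t: (-counts[t], t))
        chosen.append(topic)
        # Remove every group the chosen topic already covers.
        remaining = {g: ts for g, ts in remaining.items() if topic not in ts}
    return chosen


local = {
    "g1": {"hiking", "outdoors"},
    "g2": {"hiking"},
    "g3": {"python", "tech"},
    "g4": {"outdoors", "photography"},
}
print(pick_topics(local, k=3))  # ['hiking', 'outdoors', 'python']
```

Because g1 is removed once "hiking" is picked, "outdoors" only gets credit for g4 in the second round, which is exactly the de-duplication the slide's "Remove those groups" step is after.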
Group/Event Recommendations
• Replaced a topic-only system
• Inputs:
  • Member, location, topics, Facebook friends? Demographics?
• Outputs:
  • Ranking
Collaborative Filtering
• Classic recommendations approach
• "Users who like this also like that"
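"Users who like this also like that" can be sketched as item-to-item co-occurrence counting. This is a generic illustration of the classic approach with made-up member and group names, not a description of Meetup's code; the talk's own model is the logistic regression described later.

```python
from collections import Counter, defaultdict


def co_membership(memberships: dict[str, set[str]]) -> dict[str, Counter]:
    """For each group, count how many members also belong to each other group."""
    co: dict[str, Counter] = defaultdict(Counter)
    for groups in memberships.values():
        for a in groups:
            for b in groups:
                if a != b:
                    co[a][b] += 1
    return co


def recommend(member: str, memberships: dict[str, set[str]], n: int = 3) -> list[str]:
    """Score groups the member hasn't joined by co-membership with groups they have."""
    co = co_membership(memberships)
    mine = memberships.get(member, set())
    scores: Counter = Counter()
    for g in mine:
        for other, count in co[g].items():
            if other not in mine:
                scores[other] += count
    # Highest score first; ties broken by name so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:n]]


memberships = {
    "ann": {"hiking", "photography"},
    "bob": {"hiking", "photography", "python"},
    "cat": {"hiking", "board-games"},
    "dan": {"python", "data-science"},
}
print(recommend("ann", memberships))  # ['python', 'board-games']
```

A member with no groups gets an empty list back, which is the cold-start problem listed on the next slide.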
Why Recs at Meetup are hard
• Incomplete data (topics)
• Cold start
• Asking user for data is hard
• Going to meetups is scary
• Sparsity
  • Location
  • Groups/person
  • Membership: 0.001% (compare to Netflix: 1%)
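Reading the slide's two percentages as the share of the member-by-group matrix (or, for Netflix, the user-by-title matrix) that is filled in, the gap is three orders of magnitude:

$$
\text{density} = \frac{\text{observed memberships}}{\text{members} \times \text{groups}},
\qquad
\frac{\text{density}_{\text{Netflix}}}{\text{density}_{\text{Meetup}}}
= \frac{1\%}{0.001\%} = \frac{10^{-2}}{10^{-5}} = 10^{3}
$$

That is roughly 1 filled cell in 100,000 versus 1 in 100, so by this measure the Meetup matrix is about a thousand times sparser.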
Supervised Learning/Classification
• Inferring a function from labeled training data
• Joined Meetup/Didn't join Meetup
• Features
Topic Match
State Match
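Framed as classification, each training example is a (member, group) pair labeled by whether the member joined. The sketch below assumes hypothetical `Member` and `Group` records and reads the two features named above as simple 0/1 indicators; the talk does not spell out how Topic Match or State Match were actually computed.

```python
from dataclasses import dataclass


@dataclass
class Member:  # hypothetical record
    topics: set[str]
    state: str


@dataclass
class Group:  # hypothetical record
    topics: set[str]
    state: str


def features(m: Member, g: Group) -> list[float]:
    """Feature vector for one (member, group) pair: [topic_match, state_match]."""
    topic_match = 1.0 if m.topics & g.topics else 0.0  # any shared topic
    state_match = 1.0 if m.state == g.state else 0.0   # same state
    return [topic_match, state_match]


def training_set(pairs: list[tuple[Member, Group, bool]]) -> tuple[list[list[float]], list[int]]:
    """Turn (member, group, joined?) observations into features X and labels y."""
    X = [features(m, g) for m, g, _ in pairs]
    y = [1 if joined else 0 for _, _, joined in pairs]  # 1 = joined, 0 = didn't join
    return X, y


alice = Member({"hiking", "python"}, "NY")
pairs = [
    (alice, Group({"hiking"}, "NY"), True),
    (alice, Group({"knitting"}, "CA"), False),
]
X, y = training_set(pairs)
print(X, y)  # [[1.0, 1.0], [0.0, 0.0]] [1, 0]
```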
Logistic Regression
• Score
• Probability
• Ranking
• Fast + Easy
• Weights!
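Logistic regression turns a weighted sum of features (the score) into a probability, and sorting candidates by that probability gives the ranking. A dependency-free sketch trained with plain batch gradient descent; the learning rate, epoch count, and toy data are arbitrary choices for illustration, and the talk doesn't say how Meetup fit its model.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps a score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Score = w . x + b; probability = sigmoid(score)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Batch gradient descent on the average log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # derivative of log loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank(candidates: dict[str, list[float]], w: list[float], b: float) -> list[tuple[str, float]]:
    """Sort candidate groups by predicted join probability, highest first."""
    return sorted(((name, predict(w, b, x)) for name, x in candidates.items()),
                  key=lambda pair: pair[1], reverse=True)


# Toy data: features are [topic_match, state_match]; label 1 = joined.
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
w, b = fit(X, y)
print(rank({"A": [1, 1], "B": [1, 0], "C": [0, 0]}, w, b))
```

The learned `w` is the "Weights!" bullet: one number per feature that can be read directly, which is part of why the model is fast and easy to work with.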
Group recommendation weights
• TopicMatch: 1.21
• TopicMatchExtended: 0.17
• FacebookFriends: 0.15
• SecondDegreeFacebook: 0.79
• AgeUnmatch: -2.20
• GenderUnmatch: -2.6
• StateMatchFeature: 0.44
• CityMatch: 0.02
• DistanceBucket <2: 1.39
• DistanceBucket 2-5: 0.83
• DistanceBucket 5-10: 0.60
• DistanceBucket >10: n/a
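With these weights, a candidate group's score is the sum of the weights of the features that fire, and the logistic function turns it into a probability. A sketch using the slide's numbers, assuming binary features, treating DistanceBucket >10 as the zero-weight reference bucket (consistent with its n/a entry, but still an assumption), and using a hypothetical bias of 0 because the intercept isn't shown:

```python
import math

# Weights copied from the slide. "DistanceBucket >10" is n/a there, so this
# sketch leaves it out, i.e. treats it as a zero-weight baseline (assumption).
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket <2": 1.39,
    "DistanceBucket 2-5": 0.83,
    "DistanceBucket 5-10": 0.60,
}

BIAS = 0.0  # hypothetical: the model's intercept is not given in the talk


def score(active_features: set[str]) -> float:
    """Linear score: bias plus the weight of every feature that is 'on'."""
    return BIAS + sum(WEIGHTS.get(f, 0.0) for f in active_features)


def join_probability(active_features: set[str]) -> float:
    """Logistic link from score to probability."""
    return 1.0 / (1.0 + math.exp(-score(active_features)))


nearby_match = {"TopicMatch", "StateMatchFeature", "DistanceBucket <2"}
print(round(score(nearby_match), 2))                      # 3.04
print(round(score(nearby_match | {"GenderUnmatch"}), 2))  # 0.44
```

Two things fall out of the numbers: each demographic mismatch penalty (-2.20, -2.6) outweighs a topic match (1.21), and because a constant bias shifts every candidate's score equally, the missing intercept changes the probabilities but not the ranking order.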
Making up features
• Zipscore
• All topics are not created equal
• Facebook likes
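The talk doesn't define Zipscore or say how topics were weighted. One common way to make "all topics are not created equal" concrete is inverse document frequency (IDF): down-weight topics that appear on many groups, so sharing a niche topic counts for more than sharing a ubiquitous one. This is a swapped-in technique for illustration, not necessarily Meetup's feature, and the group data is made up.

```python
import math
from collections import Counter


def topic_idf(group_topics: list[set[str]]) -> dict[str, float]:
    """IDF per topic: log(number of groups / number of groups tagged with it)."""
    n = len(group_topics)
    df = Counter(t for topics in group_topics for t in topics)
    return {t: math.log(n / c) for t, c in df.items()}


def weighted_topic_match(member_topics: set[str], group_topics: set[str],
                         idf: dict[str, float]) -> float:
    """Sum of IDF weights over the topics the member and group share."""
    return sum(idf.get(t, 0.0) for t in member_topics & group_topics)


groups = [
    {"socializing", "hiking"},
    {"socializing", "beer"},
    {"socializing", "python"},
    {"socializing", "mountaineering"},
]
idf = topic_idf(groups)
print(weighted_topic_match({"socializing"}, groups[0], idf))      # 0.0
print(round(weighted_topic_match({"hiking"}, groups[0], idf), 3))  # 1.386
```

A topic on every group ("socializing" here) gets weight 0 and contributes nothing, while a topic on one group in four gets log 4.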
Real data is gross
• Preprocessing is critical!
  • Missing data
  • Outliers
  • Log scale
  • Bucketing
  • Selection/sampling (without introducing bias)
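A minimal sketch of the first four steps. The bucket edges mirror the DistanceBucket features on the weights slide (<2, 2-5, 5-10, >10; the talk gives no units); median imputation, the outlier cap, and treating each lower edge as inclusive are assumptions chosen for illustration.

```python
import math
import statistics
from typing import Optional

# Edges from the DistanceBucket features; lower edge inclusive is an assumption.
DISTANCE_EDGES = (2, 5, 10)
BUCKET_NAMES = ("<2", "2-5", "5-10", ">10")


def impute_median(values: list[Optional[float]]) -> list[float]:
    """Missing data: replace None with the median of the observed values."""
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]


def cap_outliers(values: list[float], cap: float) -> list[float]:
    """Outliers: clip anything above a fixed ceiling."""
    return [min(v, cap) for v in values]


def log_scale(count: int) -> float:
    """Log scale for heavy-tailed counts; log1p keeps a count of 0 at 0.0."""
    return math.log1p(count)


def distance_bucket(d: float) -> str:
    """Bucketing: map a distance to one of the slide's four buckets."""
    for edge, name in zip(DISTANCE_EDGES, BUCKET_NAMES):
        if d < edge:
            return name
    return BUCKET_NAMES[-1]


raw = [0.5, None, 3.0, 250.0, 7.5]
dists = cap_outliers(impute_median(raw), cap=50.0)
print(dists)                                # [0.5, 5.25, 3.0, 50.0, 7.5]
print([distance_bucket(d) for d in dists])  # ['<2', '5-10', '2-5', '>10', '5-10']
```

Bucketing also lets a logistic regression learn a non-linear effect of distance (one weight per bucket) without any change to the model itself.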
Cleaning data
• Schenectady
• Beverly Hills
• Astronaut
• Fake RSVP boosts (+100 guests!)
• RSVP hogs
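The last two items lend themselves to simple filters before training. A sketch, assuming a hypothetical RSVP record with a guest count and reading "RSVP hogs" as members who RSVP to far more events than anyone could attend; both thresholds are arbitrary, and the talk doesn't say what rules Meetup used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rsvp:  # hypothetical record
    member_id: str
    event_id: str
    guests: int


MAX_GUESTS = 10            # assumption: more looks like a padded "+100 guests" boost
MAX_RSVPS_PER_MEMBER = 50  # assumption: more than this in the window looks like a hog


def clean(rsvps: list[Rsvp]) -> list[Rsvp]:
    """Drop boosted RSVPs, then drop all RSVPs from members who RSVP to too many events."""
    plausible = [r for r in rsvps if r.guests <= MAX_GUESTS]
    per_member = Counter(r.member_id for r in plausible)
    return [r for r in plausible if per_member[r.member_id] <= MAX_RSVPS_PER_MEMBER]


rsvps = [Rsvp("a", "e1", 0), Rsvp("b", "e1", 100)]
rsvps += [Rsvp("hog", f"e{i}", 0) for i in range(60)]
kept = clean(rsvps)
print(len(kept))  # 1
```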
TO THE FUTURE!
• Hadoop
• Clicks
• Impressions
• People-to-people recommendations?
• Recommending people to groups?
Thanks!
Smart people: come work with me.
http://www.meetup.com/jobs/
Special thanks:
• Chris Halpert
• Victor J Wang
