Machine Learning and
Data at Meetup
Evan Estola
Meetup.com
evan@meetup.com
@estola

My Background
● Software Engineer/Data Scientist
● Machine learning team
● At Meetup since May 2012
● BS Computer Science
○ Information Retrieval
○ Data Mining
○ Math
■ Linear Algebra
■ Graph Theory

You
● Data Scientists?
● Engineers?
● Statisticians?
● Students?
● Non-technical?

What this talk is
● Super secret peek into Meetup!
● Meetup recommendations examples
● How we do recommendations
(model/features)
● Lessons learned/what’s next

What this talk isn’t
● What is a data scientist?
● What is big data?
● How does matrix factorization or gradient
boosted decision trees or map reduce or this
framework I hope you’ll use work?

Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and 114 million RSVPs by more than
14 million members
● 2.7 million RSVPs in the last 30 days
○ ~1/second

Machine learning and data at Meetup

Data at Meetup
● User data
● Site monitoring/performance
● A/B testing
● Recommendations*

“Everything is a recommendation”
● Not my phrase
● Not actually true yet
● Working on it

Topic Recommendations
● New registrant
● Don’t know anything about you yet!
● Most popular is boring/repetitive
Algorithm:
○ Group local meetups by topic
○ Select topic with most groups
○ Remove those groups
○ Repeat
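The four steps above can be sketched as a small greedy loop. This is only an illustration under assumptions: the `pick_topics` function, the `max_topics` cutoff, the alphabetical tie-break, and the sample groups are all hypothetical and not taken from Meetup's code.

```python
from collections import defaultdict

def pick_topics(group_topics, max_topics=5):
    """Greedy selection: repeatedly take the topic covering the most remaining groups."""
    # Step 1: start from every local group and its topics.
    remaining = {group: set(topics) for group, topics in group_topics.items()}
    picked = []
    while remaining and len(picked) < max_topics:
        counts = defaultdict(int)
        for topics in remaining.values():
            for topic in topics:
                counts[topic] += 1
        if not counts:  # only topic-less groups are left
            break
        # Step 2: topic with the most groups (ties broken alphabetically, an assumption).
        best = min(counts, key=lambda t: (-counts[t], t))
        picked.append(best)
        # Step 3: remove every group that carries the chosen topic, then repeat.
        remaining = {g: ts for g, ts in remaining.items() if best not in ts}
    return picked

# Hypothetical local groups for a new registrant.
local_groups = {
    "NYC Hikers": ["hiking", "outdoors"],
    "Trail Runners": ["running", "outdoors"],
    "Peak Baggers": ["hiking", "outdoors"],
    "Python NYC": ["python", "tech"],
    "Data Science NYC": ["tech", "data"],
    "Book Club": ["books"],
}
topics = pick_topics(local_groups, max_topics=3)
print(topics)
```

Removing the covered groups before each new pick is what keeps the list from being repetitive: once "outdoors" is chosen, "hiking" no longer scores, because its groups are already gone.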

Group/Event Recommendations
● Replaced a topic-only system
● Inputs:
○ Member, location, topics, Facebook friends?
Demographics?
● Outputs:
○ Ranking

Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this
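"Users who like this also like this" can be illustrated with simple co-membership counts. This is a minimal sketch of item-based co-occurrence, not a description of what Meetup runs; the function names and the sample memberships are made up for the example.

```python
from collections import Counter
from itertools import combinations

def co_membership_counts(memberships):
    """Count how many members each pair of groups has in common."""
    counts = {}
    for groups in memberships.values():
        for a, b in combinations(sorted(set(groups)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_joined(group, counts, k=3):
    """Top-k groups most often joined together with `group`."""
    ranked = sorted(counts.get(group, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]

# Hypothetical member -> groups data.
memberships = {
    "ann": ["Hiking", "Photography"],
    "bob": ["Hiking", "Photography", "Python"],
    "cat": ["Hiking", "Python"],
    "dan": ["Photography"],
}
counts = co_membership_counts(memberships)
similar = also_joined("Hiking", counts)
print(similar)
```

A member with no memberships contributes nothing to the counts, which is the cold-start problem in miniature.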

Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Groups/person
○ Membership: 0.001%
○ Compare to Netflix: 1%

Supervised Learning/Classification
● “Inferring a function from labeled training
data”
● Joined Meetup/Didn’t join Meetup
● “Features”

Logistic Regression
● Score
○ “Probability”
○ Ranking
● Fast + Easy
● Weights!

Group recommendation weights
● TopicMatch 1.21
● TopicMatchExtended 0.17
● FacebookFriends 0.15
● SecondDegreeFacebook 0.79
● AgeUnmatch -2.20
● GenderUnmatch -2.6
● StateMatchFeature 0.44
● CityMatch 0.02
● DistanceBucket <2 1.39
● DistanceBucket 2-5 0.83
● DistanceBucket 5-10 0.60
● DistanceBucket >10 n/a
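To show how weights like these turn into a ranking score, here is a rough sketch of logistic regression scoring. Several things are assumptions: the features are treated as 0/1 indicators, the intercept (not shown on the slide) is set to 0, and the "n/a" bucket for distances over 10 is treated as the reference bucket with weight 0. The `score` function and the two example candidates are hypothetical.

```python
import math

# Weights copied from the slide above.
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket<2": 1.39,
    "DistanceBucket2-5": 0.83,
    "DistanceBucket5-10": 0.60,
    # "DistanceBucket>10" is n/a on the slide; assumed here to be the reference bucket (0).
}
INTERCEPT = 0.0  # assumption: the slide does not give an intercept

def score(features):
    """Logistic score: sigmoid of the weighted feature sum."""
    z = INTERCEPT + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Two hypothetical candidate groups for the same member.
nearby = score({"TopicMatch": 1, "CityMatch": 1, "DistanceBucket<2": 1})
age_mismatch = score({"TopicMatch": 1, "AgeUnmatch": 1})
ranked = sorted([("nearby", nearby), ("age_mismatch", age_mismatch)],
                key=lambda pair: pair[1], reverse=True)
print(ranked)
```

Because the score is only used for ranking, the missing intercept shifts every candidate equally and does not change the order.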

Making up features
● “Zipscore”
● Not all topics are created equal
● Facebook likes

Real data is gross
● Preprocessing is critical!
○ missing data
○ outliers
○ log scale
○ bucketing
○ selection/sampling (not introducing bias)
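A few of these preprocessing steps can be sketched in plain Python. Everything here is illustrative: the helper names, the guest cap of 10, the "unknown" bucket for missing distances, and the exact bucket edges are assumptions, chosen to mirror the distance buckets shown with the weights.

```python
import math

def clean_guest_count(raw, cap=10):
    """Missing data and outliers: treat None as 0 and clamp to [0, cap]."""
    if raw is None:
        return 0
    return max(0, min(raw, cap))

def log_scale(count):
    """Log scale: compress heavy-tailed counts; log1p keeps 0 at 0."""
    return math.log1p(count)

def distance_bucket(miles):
    """Bucketing: turn a raw distance into a categorical feature."""
    if miles is None:
        return "unknown"
    if miles < 2:
        return "<2"
    if miles < 5:
        return "2-5"
    if miles <= 10:
        return "5-10"
    return ">10"

# Hypothetical raw rows: (guest count, distance in miles).
rows = [(2, 0.5), (None, 3.5), (100, 12.0)]
cleaned = [(clean_guest_count(g), distance_bucket(d)) for g, d in rows]
print(cleaned)
```

Clamping rather than dropping outlier rows keeps the sample intact, which matters for the last bullet: throwing rows away is an easy way to introduce bias.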

Cleaning data
● Schenectady
● Beverly Hills
● Astronaut
● Fake RSVP boosts (+100 guests!)
● RSVP hogs

TO THE FUTURE!
● Hadoop
● Clicks
● Impressions
● People to people recommendations?
● Recommending people to groups?

Thanks!
Smart people, come work with me.
http://www.meetup.com/jobs/
Special thanks:
● Chris Halpert
● Victor J Wang
