This document discusses a model for predicting attendance at events based on multiple data sources and factors. It considers using absolute numbers from check-ins, weights from context like weather and paid events, user past behavior patterns, and classifications based on age, gender and other user attributes. The goal is to separate true signals from noise and provide a probability distribution of potential attendance outcomes.
2. Priorities
• Separate signal from noise
• Can we at least predict better than others?
• Ideally, a probability distribution model that gives us an idea of best-case, expected-value, and worst-case behavior
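One way to read "best-case, expected-value, and worst-case" is as quantiles and the mean of a predicted attendance distribution. The sketch below assumes, purely for illustration, that each person who RSVP'd attends independently with a known probability (a Poisson binomial distribution). The function names and the 5% / 95% cut-offs are hypothetical and not part of the original slides.

```python
from typing import Dict, List


def attendance_distribution(probs: List[float]) -> List[float]:
    """Return P(exactly k people attend) for k = 0..len(probs).

    Assumption: each person attends independently with their own probability.
    """
    dist = [1.0]  # with nobody invited, attendance is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this person stays home
            nxt[k + 1] += mass * p      # this person shows up
        dist = nxt
    return dist


def quantile(dist: List[float], q: float) -> int:
    """Smallest head count k whose cumulative probability reaches q."""
    cumulative = 0.0
    for k, mass in enumerate(dist):
        cumulative += mass
        if cumulative >= q:
            return k
    return len(dist) - 1


def summarize(probs: List[float]) -> Dict[str, float]:
    """Worst case (5th percentile), expected value, best case (95th percentile)."""
    dist = attendance_distribution(probs)
    return {
        "worst_case": quantile(dist, 0.05),
        "expected": sum(k * mass for k, mass in enumerate(dist)),
        "best_case": quantile(dist, 0.95),
    }
```

Calling `summarize` with one attendance probability per RSVP returns the three summary numbers. The independence assumption is a simplification, since real attendance is correlated through shared factors such as weather.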
4. Absolute numbers
• Foursquare check-ins are fairly reliable, as are MTA and TSA swipes
• These counts are added directly to the prediction; no weighting is applied currently, but this may be modified if a trend of fake data emerges
5. Context based Prediction
• Context based: use decision tree learning to generate weights to apply to event RSVP counts from Eventbrite, Meetup, and Facebook
• E.g. a Meetup event RSVP gets a higher weight if it is a paid event, has free giveaways, and the weather is nice
• Weights are in the range 0-1; we multiply each source's RSVP count by its weight and divide by 3 (the number of sources) to get the weighted average RSVP count
• Events of a similar nature within a nearby radius will downgrade the potential attendance
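The weighted-average step can be sketched as follows. The weights here are passed in by hand and are hypothetical; in the approach described above they would come from a learned decision tree. The function name and the `SOURCES` tuple are illustrative, and the downgrade for nearby similar events is not modelled because the slides do not say how it is computed.

```python
from typing import Dict

# The three RSVP sources named on the slide.
SOURCES = ("eventbrite", "meetup", "facebook")


def weighted_rsvp_count(rsvps: Dict[str, int], weights: Dict[str, float]) -> float:
    """Multiply each source's RSVP count by its weight, then average over sources."""
    total = 0.0
    for source in SOURCES:
        w = weights[source]
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"weight for {source} must be in [0, 1], got {w}")
        total += rsvps.get(source, 0) * w  # a missing source counts as 0 RSVPs
    return total / len(SOURCES)


# Hypothetical inputs: a paid Meetup event might earn a higher weight
# than a free Facebook event.
example = weighted_rsvp_count(
    {"eventbrite": 120, "meetup": 60, "facebook": 300},
    {"eventbrite": 0.8, "meetup": 0.5, "facebook": 0.2},
)
```

With these made-up numbers the weighted counts are 96, 30, and 60, so the average is about 62. Note that dividing by 3 treats the sources as equally important regardless of their weights.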
6. User Learning based prediction
• A person's likelihood of attending an event can be modelled in a Bayesian manner
• Inputs: past event attendance/RSVP ratio, and history of attending a series of events of a particular nature
• Item based classification is another factor, e.g. if persons a, b, c, and d attend events X and Y, and we know that b, c, and d are attending event Z, there is a higher chance that a will attend event Z
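Both ideas can be sketched in a few lines. The first function is one possible Bayesian treatment of the attendance/RSVP ratio: a Beta prior updated with past outcomes, returning the posterior mean. The second scores the a, b, c, d example using Jaccard similarity between attendance histories, which is a neighbour-weighted reading of the co-attendance idea rather than a confirmed description of the original method. The function names, the uniform Beta(1, 1) prior, and the `history` structure are all assumptions made for illustration.

```python
from typing import Dict, Set


def attendance_probability(attended: int, rsvped: int,
                           prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Posterior mean of P(attend | RSVP) under a Beta(prior_a, prior_b) prior."""
    if attended < 0 or attended > rsvped:
        raise ValueError("attended must be between 0 and rsvped")
    return (attended + prior_a) / (rsvped + prior_a + prior_b)


def co_attendance_score(person: str, target_event: str,
                        history: Dict[str, Set[str]]) -> float:
    """Similarity-weighted share of other people who are going to target_event.

    history maps each person to the events they attended or committed to.
    Similarity is the Jaccard overlap of histories, ignoring the target event.
    """
    mine = history.get(person, set()) - {target_event}
    weighted_yes = 0.0
    weight_total = 0.0
    for other, events in history.items():
        if other == person:
            continue
        theirs = events - {target_event}
        union = mine | theirs
        if not union:
            continue  # no history to compare
        similarity = len(mine & theirs) / len(union)
        weight_total += similarity
        if target_event in events:
            weighted_yes += similarity
    return weighted_yes / weight_total if weight_total else 0.0
```

For someone who attended 3 of 4 RSVP'd events, `attendance_probability(3, 4)` gives 4/6 under the uniform prior, pulling the raw 0.75 ratio toward 0.5 because the history is short. How the two scores should be combined into a single prediction is left open here.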
7. Age based classification
• Sharing peaks in the teenage years and early adulthood, then falls off
• The influence of social data therefore needs to be weighted inversely to infer the total count of people at an event
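A minimal sketch of the inverse weighting, assuming we know (or can estimate) the fraction of attendees in each age group who share on social media. Each group's observed sharers are divided by that fraction to scale up to a head count. The age buckets, the sharing rates, and the function name are hypothetical and not taken from the slides.

```python
from typing import Dict


def estimate_total_attendance(sharers_by_age: Dict[str, int],
                              sharing_rate_by_age: Dict[str, float]) -> float:
    """Scale observed sharers in each age group by 1 / that group's sharing rate."""
    total = 0.0
    for group, sharers in sharers_by_age.items():
        rate = sharing_rate_by_age[group]
        if not 0.0 < rate <= 1.0:
            raise ValueError(f"sharing rate for {group} must be in (0, 1]")
        total += sharers / rate  # low-sharing groups get scaled up the most
    return total


# Made-up numbers: 10 posts from a group that rarely shares imply
# more attendees than 40 posts from a group that shares often.
estimate = estimate_total_attendance(
    {"13-19": 40, "20-29": 30, "30+": 10},
    {"13-19": 0.5, "20-29": 0.4, "30+": 0.1},
)
```

With these illustrative inputs the groups contribute roughly 80, 75, and 100 people, for about 255 in total. The estimate becomes unstable when a sharing rate is close to zero, so very small rates would need a floor or a wider uncertainty band.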