Kaggle: the global community of Data Science professionals
Anastasiia Kornilova
Who am I?
- MS in Applied Mathematics, 
- 3 years as a Data Scientist
Kaggle - global Data Science community
What is Data Science?
- Scientific Method
- Math
- Statistics
- Data Engineering
- Domain Expertise
- Advanced Computing
- Visualization
- Hacker Mindset
What matters?
What is Kaggle?
2010 - founded in Melbourne, Australia
by Anthony Goldbloom
What problem do they solve?
Data problems 
Data solvers
In fact, a McKinsey Global Institute report 
estimates that by 2018, the United States 
alone could face a shortage of 140,000 to 
190,000 people with deep analytical skills as 
well as 1.5 million managers and analysts 
with the know-how to use the analysis of 
big data to make effective decisions. 
Between 2010 and 2020, the data
scientist career path is projected to
increase by 18.7 percent, beaten only by
video game designers. The big data
industry is expected to be a $53.4 billion
industry by 2016.
Anyone with "data science" in his or 
her job title on a LinkedIn page is 
going to get "100 recruiter emails a 
day," said Josh Sullivan, who leads a 
500-person data-science group at the 
consulting firm Booz Allen Hamilton 
Holding
Are you good enough?
First competition:
Forecast Eurovision Song Contest Voting
- $1,000 prize
- 22 teams
Outperformed prediction markets:
predicted 7 of the Top 10 countries, against only 5
for prediction markets.
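The Top 10 comparison can be read as an overlap count: how many entries of a forecast's top 10 also appear in the actual top 10. A minimal sketch, assuming that reading of the slide; the function name `top_k_hits` and the placeholder labels (`C01`, `C02`, ...) are hypothetical and are not real Eurovision results:

```python
def top_k_hits(predicted, actual, k=10):
    """Count how many of the predicted top-k entries are in the actual top-k."""
    return len(set(predicted[:k]) & set(actual[:k]))

# Placeholder ranking (assumption for illustration only): C01 is first, C12 is last.
actual = [f"C{i:02d}" for i in range(1, 13)]

# Two made-up forecasts, constructed to mirror the 7-versus-5 comparison.
forecast = ["C02", "C01", "C03", "C11", "C05", "C04", "C12", "C07", "C13", "C06"]
market = ["C01", "C11", "C02", "C12", "C13", "C03", "C14", "C04", "C15", "C05"]

forecast_hits = top_k_hits(forecast, actual)
market_hits = top_k_hits(market, actual)
print(forecast_hits, market_hits)
```

This measure ignores the order inside the top 10; a stricter score would also compare positions.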
Short success story
- 2011 - relocated to San Francisco
- November 2011 - raised $11M in funding
- July 2013 - 100,000 data scientists involved
- February 2014 - more than 140,000 data
scientists
How can you use Kaggle?
Types of reward
- Knowledge 
- Money 
- Job interview
Competitions for knowledge 
(always open) 
- Digit Recognizer, CIFAR-10, First Steps With Julia
- Titanic: Machine Learning from Disaster
- Bike Sharing Demand 
- Learning Social Circles in Networks
Competitions with prizes:
Open:
- American Epilepsy Society Seizure Prediction
Challenge: $25,000 prize
- Africa Soil Property Prediction Challenge: $8,000 prize
- Tradeshift Text Classification: $5,000 prize
Completed competitions (170+)
- Heritage Health Prize: $500,000
- GE Flight Quest: $250,000
- GE Hospital Quest: $100,000
- Higgs Boson ML Challenge: $13,000 + invitation to
CERN
- Galaxy Zoo: $16,000
- KDD Author Paper Identification Challenge 
- Job Recommendation Challenge
Job competitions (completed): 
Facebook: 
- recommend missing links in social graph (who to follow) 
- optimal graph path 
- predict text tags 
Yelp: 
- estimate the number of useful votes a review will receive 
Walmart:
- predict store sales 
+ Job Board
How to win?
Dig into the data
Stay on track
Kaggle competition == Data science?
1. Understand
2. Collect
3. Data exploration
4. Clean and transform
5. Model
6. Validate
7. Communicating results
Deploy
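The middle of this workflow (clean, model, validate) can be sketched in a few lines. This is a minimal illustration on synthetic data, not the method of any particular competition; the record fields `x` and `label`, the helper names, and the threshold "model" are all assumptions chosen to keep the example dependency-free:

```python
import random

def clean(rows):
    """Step 4: drop records with a missing feature or label."""
    return [r for r in rows if r["x"] is not None and r["label"] is not None]

def split(rows, holdout=0.25, seed=0):
    """Hold out part of the data so the model is scored on unseen rows."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout))
    return rows[:cut], rows[cut:]

def fit_threshold(train):
    """Step 5: a deliberately simple model, the midpoint between class means."""
    pos = [r["x"] for r in train if r["label"] == 1]
    neg = [r["x"] for r in train if r["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Step 6: fraction of held-out rows classified correctly."""
    hits = sum((r["x"] > threshold) == bool(r["label"]) for r in rows)
    return hits / len(rows)

# Synthetic data (assumption): label is 1 when x >= 10; two records are incomplete.
raw = [{"x": float(i), "label": int(i >= 10)} for i in range(20)]
raw += [{"x": None, "label": 1}, {"x": 3.0, "label": None}]

data = clean(raw)
train, valid = split(data)
threshold = fit_threshold(train)
score = accuracy(threshold, valid)
print(f"threshold={threshold:.2f} validation accuracy={score:.2f}")
```

In a competition the held-out score plays the role of the leaderboard: it is the only honest estimate of how the model behaves on data it has not seen.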
?
