Introduction to Data Science
Sean Byrnes
http://seanbyrnes.com
@sbyrnes
Who Am I? 
ATTENDED 
FOUNDED 
CURRENTLY 
from Yahoo!
Introduction to Data Science
• What is Data Science?
• Example 1: Basic Math
• Example 2: Regression Modeling
• Example 3: Recommender Systems
• Getting started in data science
What is Data Science? 
Software Engineering 
+ 
Statistical Analysis
What is Data Science? 
1. Question 
2. Data Gathering 
3. Exploration 
4. Modeling 
5. Answer 
6. Production
Example 1: Basic Math
What is my customer churn rate?
Churn rate: the percentage of subscribers to a service who discontinue
their subscription in a given time period (also known as attrition rate).
Example 1: Basic Math
$$\text{Churn}(\text{month}) = \frac{\#\ \text{customers lost}}{\#\ \text{customers at start}}$$
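As a rough illustration of the churn formula, here is a minimal Python sketch. The function name `churn_rate` and the sample counts are hypothetical and are not taken from the talk.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn for one period as a percentage of the starting customer base."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical month: 2,000 subscribers at the start, 75 of them cancel.
print(f"{churn_rate(2000, 75):.2f}%")  # prints 3.75%
```

Dividing by the count at the start of the period, rather than at the end, keeps newly acquired customers out of the denominator.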
Example 1: Basic Math 
Month Churn 
Dec '13 3.75% 
Nov '13 1.87% 
Oct '13 3.82% 
Sep '13 2.76% 
Aug '13 2.43% 
Jul '13 2.04% 
Jun '13 1.60%
Example 1: Basic Math
For all customers acquired in a given month:
$$\text{Retention}(C_{\text{month}}) = \frac{\text{Active}(C_{\text{month}})}{\text{Total}(C_{\text{month}})}$$
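One way to turn the cohort retention ratio into a table is sketched below. The record layout (a cohort label paired with the number of months the customer stayed active) and the name `cohort_retention` are assumptions made for illustration, as is the sample data.

```python
from collections import defaultdict


def cohort_retention(customers, max_offset):
    """Map each cohort to its retention percentages at offsets 0..max_offset.

    `customers` is an iterable of (cohort, months_active) pairs, where
    months_active counts the months after acquisition in which the
    customer was still active (0 means they left within the first month).
    """
    by_cohort = defaultdict(list)
    for cohort, months_active in customers:
        by_cohort[cohort].append(months_active)

    table = {}
    for cohort, lifetimes in by_cohort.items():
        total = len(lifetimes)
        # Retention at an offset = customers still active then / cohort size.
        table[cohort] = [
            100.0 * sum(1 for m in lifetimes if m >= offset) / total
            for offset in range(max_offset + 1)
        ]
    return table


# Hypothetical cohort of four customers acquired in the same month.
sample = [("Jun '13", 0), ("Jun '13", 0), ("Jun '13", 1), ("Jun '13", 3)]
print(cohort_retention(sample, 3))  # {"Jun '13": [100.0, 50.0, 25.0, 25.0]}
```

Each row starts at 100% because every customer in a cohort is active at offset 0.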
Example 1: Basic Math
Retention by acquisition cohort (columns: months since acquisition)
Cohort   0        1        2        3        4        5        6
Dec '13  100%     12.82%   8.04%    6.34%    4.91%    3.95%    3.14%
Nov '13  100%     15.66%   9.97%    6.96%    5.46%    3.88%    2.77%
Oct '13  100%     16%      10.86%   8.62%    6.22%    5.06%    3.98%
Sep '13  100%     13.28%   9.52%    7.28%    5.28%    4.48%    4%
Aug '13  100%     12.96%   9.18%    6.55%    4.73%    3.86%    3.13%
Jul '13  100%     15.84%   10.85%   8.27%    6.67%    5.60%    4.63%
Jun '13  100%     16.08%   11.36%   8.36%    7.07%    6%       5.25%
Example 2: Regression Modeling 
How many users will we have next month?
Example 2: Regression Modeling
[Chart: y-axis from 0 to 160,000; x-axis monthly from 1/1/13 to 12/1/13]
Example 2: Regression Modeling
For data set $X(n)$, find $f(n)$ such that
$$f(n_i) \approx X(n_i)$$
Example 2: Regression Modeling
Assume $X(n_i) = [x_1, x_2, \dots, x_k]$
$$f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + \dots + c_k x_k$$
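For a single feature, such as a month index, the coefficients of a linear model can be found with ordinary least squares. The sketch below is only an illustration and is not the method used by the Lyric library referenced in the talk. It adds an intercept term, which the formula above does not show (it is equivalent to a constant feature equal to 1), and the monthly user counts are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ intercept + slope * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical, exactly linear data: users grow by 2,000 per month.
months = [1, 2, 3, 4, 5, 6]
users = [10_000, 12_000, 14_000, 16_000, 18_000, 20_000]

intercept, slope = fit_line(months, users)
print(round(intercept + slope * 7))  # forecast for month 7: 22000
```

Real usage data is rarely this clean, so the fitted line will usually approximate the points rather than pass through them.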
Example 2: Regression Modeling
[Chart: Linear Model; y-axis from 0 to 160,000; x-axis monthly from 1/1/13 to 12/1/13]
Example 2: Regression Modeling
Assume $X(n_i) = [x_1, x_2, \dots, x_k]$
$$f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + \dots + c_k x_k$$
Or, maybe
$$f(n) = c_1 x_1 + c_2 x_1^2 + c_3 x_2 + c_4 x_2^2 + \dots + c_m x_k^2$$
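Adding squared (or higher-power) terms keeps the model linear in its coefficients, so the same least-squares idea still applies once the features are expanded. The sketch below fits a polynomial in one variable by solving the normal equations in plain Python. The helper names (`solve`, `fit_polynomial`, `predict`) and the sample data are hypothetical, and this is not how the Lyric library is implemented.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    size = degree + 1
    rows = [[x ** p for p in range(size)] for x in xs]  # expanded features
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(ata, aty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** p for p, c in enumerate(coeffs))


# Made-up data generated from a known quadratic: y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coeffs = fit_polynomial(xs, ys, 2)
print([round(c, 6) for c in coeffs])  # close to [1, 2, 3]
```

A higher degree will always fit the training points at least as well, but it can extrapolate badly, so a closer fit on past months does not by itself mean a better forecast. The normal equations also become ill-conditioned as the degree grows, and this sketch does not guard against a singular system.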
Example 2: Regression Modeling
[Chart: 2nd Degree Polynomial Model; y-axis from 0 to 160,000; x-axis monthly from 1/1/13 to 12/1/13]
Example 2: Regression Modeling
[Chart: 4th Degree Polynomial Model; y-axis from 0 to 160,000; x-axis monthly from 1/1/13 to 12/1/13]
Example 2: Regression Modeling 
https://github.com/sbyrnes/Lyric
Example 3: Recommender Systems 
What other products might this 
customer buy?
Example 3: Recommender Systems
             Product 1   Product 2   Product 3   …   Product N
Customer 1   3.5         4.0                         3.0
Customer 2   2.0                     3.5
Customer 3               3.0         2.5
…
Customer N   4.5                                     4.5
Example 3: Recommender Systems
Given customer preference matrix $M$, find $P$ and $Q$ such that
$$P \times Q \approx M$$
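One common way to find such factors is stochastic gradient descent over the known ratings only, which then lets the product fill in the missing cells. The sketch below assumes that approach. The function names, hyperparameters (`rank`, `steps`, `lr`, `reg`) and ratings are all hypothetical, and it should not be read as a description of how likely.js works.

```python
import random


def factorize(ratings, n_customers, n_products, rank=2,
              steps=5000, lr=0.01, reg=0.02, seed=0):
    """Fit P (customers x rank) and Q (rank x products) to the known ratings.

    `ratings` maps (customer, product) index pairs to observed scores.
    Missing pairs are simply absent and do not contribute to the error.
    """
    rng = random.Random(seed)
    p = [[rng.random() for _ in range(rank)] for _ in range(n_customers)]
    q = [[rng.random() for _ in range(n_products)] for _ in range(rank)]
    for _ in range(steps):
        for (i, j), r in ratings.items():
            err = r - sum(p[i][k] * q[k][j] for k in range(rank))
            for k in range(rank):
                p_ik, q_kj = p[i][k], q[k][j]
                # Gradient step on squared error with L2 regularization.
                p[i][k] += lr * (2 * err * q_kj - reg * p_ik)
                q[k][j] += lr * (2 * err * p_ik - reg * q_kj)
    return p, q


def predict(p, q, i, j):
    """Predicted rating of product j by customer i: row i of P times column j of Q."""
    return sum(p[i][k] * q[k][j] for k in range(len(q)))


# Hypothetical 3 x 3 preference matrix with three unknown cells.
known = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
         (1, 2): 1.0, (2, 1): 1.0, (2, 2): 5.0}

p, q = factorize(known, n_customers=3, n_products=3)
print(round(predict(p, q, 0, 2), 2))  # estimate for a cell with no rating
```

The rank controls how many latent "taste" dimensions the model can use. Too high a rank fits the known cells perfectly but generalizes poorly to the unknown ones.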
Example 3: Recommender Systems
             Product 1   Product 2   Product 3   …   Product N
Customer 1   3.5         4.0         2.5             3.0
Customer 2   2.0         1.5         3.5             3.0
Customer 3   1.5         3.0         2.5             4.0
…
Customer N   4.5         3.5         4.0             4.5
Example 3: Recommender Systems
Given customer preferences $c[p_1, p_2, \dots, p_n]$
and overall rating average $r_{\text{overall}}$:
$$c_{\text{bias}} = \operatorname{mean}(c[p_1], c[p_2], \dots, c[p_n]) - r_{\text{overall}}$$
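A small sketch of the bias calculation follows, assuming the bias is the customer's mean rating minus the overall average. The ratings are the known values for the first three customers in the preference table, and the function and variable names are hypothetical.

```python
from statistics import mean


def customer_bias(customer_ratings, overall_average):
    """How far a customer's average rating sits above or below the overall average."""
    return mean(customer_ratings) - overall_average


# Known ratings per customer (unrated products are simply left out).
ratings = {
    "customer_1": [3.5, 4.0, 3.0],
    "customer_2": [2.0, 3.5],
    "customer_3": [3.0, 2.5],
}

# Overall average across every known rating from every customer.
overall = mean(r for rs in ratings.values() for r in rs)
biases = {name: customer_bias(rs, overall) for name, rs in ratings.items()}

for name, bias in biases.items():
    print(f"{name}: {bias:+.2f}")
```

A positive bias marks a customer who rates generously overall. Subtracting it before comparing customers helps separate real product preferences from rating habits.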
Example 3: Recommender Systems 
https://github.com/sbyrnes/likely.js
Getting Started in Data Science
• Programming
• Statistics
• Machine learning
• Toolkit
  • R
  • Hadoop
  • D3
Sean Byrnes 
seanbyrnes.com 
@sbyrnes 
github.com/sbyrnes
