Please tweet! #ESOMAR @LoveStats
Big Data Myths
Presented by Annie Pettit
Chief Research Officer at Peanut Labs,
a Research Now Group Company.
What is Big Data?
Volume
• Research panel data
• Shopper / loyalty / transactional data
• Web tracking data
Variety
• Text, video, audio, date, time, $ and other currencies, coupons, loyalty card, SKU
• URL, click, save, download, lat/long
• Eye motion, brain wave, electrical pulse
Velocity
• Every picosecond
http://giphy.com/search/gotta-go-fast
Strike Down The Myths!
Myth: Big Data is New
2015 Supercomputer
"Take on the biggest jobs, tasks other computer systems simply can't handle"
Speed: 173 petaflops
http://www.forbes.com/sites/sungardas/2015/04/14/the-amazing-super-powers-of-a-supercomputer/
Is 1985 New?
Supercomputer 30 years ago
"Large memory and performance allows users to solve problems that cannot be solved with any other computer"
Clock cycle: 4.1 nanoseconds
http://archive.computerhistory.org/resources/text/Cray/Cray.Cray2.1985.102646185.pdf
Big Data is only new to some
Just me:
• 2002: drugstore transactional database
• 2005: research panel database
• 2010: social media database
• 2015: research panel database
MRX:
• 1979: Texas International Airlines loyalty program
• 2002: Target advertising, Andrew Pole
• 2004: Walmart stocks stores for hurricanes
Myth: Big Data Is Better
Emotions, Attitudes, Beliefs
Myth: Volume trumps knowledge
PL's average survey minutes, by filter:
Filter            Minutes
Total             4.7
Only surveys      5.1
Only completes    15.0
Only USA          15.8
Only recent       15.6
No test links     16.5
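The point of this slide can be sketched in a few lines of Python: the same table gives very different averages depending on which rows the analyst knows to exclude. The records and field names below are hypothetical and exist only to illustrate the idea; they are not the data behind the chart.

```python
from statistics import mean

# Hypothetical activity records; every field name and value is invented.
records = [
    {"type": "survey", "complete": True,  "test_link": False, "minutes": 16.0},
    {"type": "survey", "complete": True,  "test_link": False, "minutes": 18.0},
    {"type": "survey", "complete": True,  "test_link": True,  "minutes": 2.0},
    {"type": "survey", "complete": False, "test_link": False, "minutes": 1.0},
    {"type": "offer",  "complete": True,  "test_link": False, "minutes": 0.5},
]


def average_minutes(rows):
    """Average the 'minutes' field over the given rows."""
    return mean(r["minutes"] for r in rows)


# Apply the filters one after another, as subject knowledge suggests.
surveys = [r for r in records if r["type"] == "survey"]
completes = [r for r in surveys if r["complete"]]
real_completes = [r for r in completes if not r["test_link"]]

naive_average = average_minutes(records)            # everything in the table
informed_average = average_minutes(real_completes)  # only real survey completes
print(naive_average, informed_average)
```

Nothing in the table itself says which rows are test links or non-survey activity; that knowledge has to come from someone who understands how the data was produced.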
Myth: Big data is clean
One SQL Table
N=75 million
Variables = 1012
Missing values = 53 million
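As a rough sense of scale, the figures on this slide can be turned into a share of cells. This is a sketch under two assumptions that the slide does not state: that N is a row count and that missing values are counted per cell.

```python
rows = 75_000_000      # N from the slide, assumed to be a row count
variables = 1012       # number of variables (columns) from the slide
missing = 53_000_000   # missing values, assumed to be counted per cell

total_cells = rows * variables
missing_share_pct = 100 * missing / total_cells
print(f"{total_cells:,} cells, {missing_share_pct:.2f}% missing")
```

Even a small share of a table this size leaves tens of millions of cells to inspect, and the arithmetic says nothing about whether the gaps cluster in particular variables.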
Big data has clean parts
Myth: Big data is the population
On March 25 at 12:10:16:
One second later, it was missing 3 records
One minute later, it was missing 180 records
One day later, it was missing 260,000 records
Today, it is missing 15 million records.
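The figures on this slide are consistent with a steady rate of 3 new records per second. A minimal check, assuming that rate stays constant (the slide does not say so):

```python
RECORDS_PER_SECOND = 3  # rate stated on the slide, assumed constant

per_minute = RECORDS_PER_SECOND * 60
per_day = RECORDS_PER_SECOND * 60 * 60 * 24
days_to_15_million = 15_000_000 / per_day

print(per_minute)                    # records missed after one minute
print(per_day)                       # records missed after one day
print(round(days_to_15_million, 1))  # days until 15 million are missing
```

At that rate the one-day figure is 259,200, which the slide rounds to 260,000, and a snapshot falls 15 million records behind in roughly two months.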
Big Data is Never Complete
Myth: Data speed is everything
[Chart: Completion Rate (Per Second)]
Speed is awesome!
If you don't care about:
• Coding errors
• Outliers
• Accuracy
• Exceptions
• Interactions
• Validity
• Generalizability
• Reliability
• Comprehension
Myth: Big Data Renders Science Obsolete
Remember non-random error:
• Incomplete data
• Miscoded data
• Misplaced data
Remember random error:
• +/- 5, 19 times out of 20
• p-values
• Type 1 and Type 2 errors
Total Research Error
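"+/- 5, 19 times out of 20" is the usual shorthand for a margin of error of 5 percentage points at 95% confidence. A minimal sketch of where such a figure comes from, assuming a simple random sample and the normal approximation for a proportion; the sample size of 384 is an illustrative assumption, not a number from the slides.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    n: sample size; p: assumed proportion (0.5 is the worst case);
    z: critical value (1.96 corresponds to 95%, i.e. 19 times out of 20).
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(384)  # 384 is a hypothetical sample size
print(f"+/- {100 * moe:.1f} percentage points")
```

The formula covers random sampling error only; incomplete, miscoded, or misplaced data adds non-random error that no sample size removes.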
Name That Software!
* Default to 999, then recode from the three vote questions.
COMPUTE VoteForHiggins=999.
IF (Q2votepastYes=1 AND Q4votetodayYes=1 AND Q6voteHigginsLikely=1) VoteForHiggins=1.
IF (Q2votepastYes=1 AND Q4votetodayYes=1 AND Q6voteHigginsUnlikely=1) VoteForHiggins=0.
IF (Q2votepastYes=1 AND Q4votetodayYes=1 AND Q6voteHigginsUnsure=1) VoteForHiggins=69.
* Treat the unsure and default codes as missing.
MISSING VALUES VoteForHiggins (69, 999).
EXECUTE.
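For readers who do not use this package, the recode logic above can be restated in Python. This is a sketch, not a translation of any real project code: the function name, its arguments, and the likelihood labels are hypothetical, and the three separate Q6 indicator variables are collapsed into one argument for brevity.

```python
UNSURE = 69     # code the slide assigns to "unsure" respondents
DEFAULT = 999   # code assigned before any rule applies

# Both codes are later declared missing, so analyses skip them.
MISSING_CODES = {UNSURE, DEFAULT}


def vote_for_higgins(voted_past, voting_today, likelihood):
    """Return 1, 0, UNSURE, or DEFAULT following the slide's rules.

    likelihood is "likely", "unlikely", or "unsure" (hypothetical labels).
    """
    if not (voted_past and voting_today):
        return DEFAULT
    return {"likely": 1, "unlikely": 0, "unsure": UNSURE}.get(likelihood, DEFAULT)


codes = [
    vote_for_higgins(True, True, "likely"),
    vote_for_higgins(True, True, "unsure"),
    vote_for_higgins(False, True, "likely"),
]
valid = [c for c in codes if c not in MISSING_CODES]
print(codes, valid)
```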
Name That Software!
/* Correlations among key indicators, excluding rows with any missing value */
PROC CORR DATA=ResearchData.Client243 OUTP=ClientOutput NOMISS;
  VAR PurchaseIntent Recommend Different New Value;
  TITLE2 'Correlations of Key Indicators';
RUN;
Name That Software!
-- Average number of completes per person, by recruit date
SELECT RecruitDate, AVG(CompletesPerPerson)
FROM
  (SELECT UserID, RecruitDate, COUNT(*) AS CompletesPerPerson
   FROM CompleteDataBase
   GROUP BY UserID, RecruitDate) RecruitData
GROUP BY RecruitDate
ORDER BY RecruitDate
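To see what this query returns, it can be run against a small in-memory SQLite table from Python. This is a sketch under stated assumptions: the rows are hypothetical, the table is assumed to hold one row per completed survey, and each user is assumed to have a single recruit date. The inner query here groups by both UserID and RecruitDate so that it also runs on engines that reject non-aggregated columns outside the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CompleteDataBase (UserID TEXT, RecruitDate TEXT)")

# One row per completed survey; all rows are hypothetical.
rows = (
    [("u1", "2015-01-01")] * 3
    + [("u2", "2015-01-01")] * 1
    + [("u3", "2015-01-02")] * 4
)
conn.executemany("INSERT INTO CompleteDataBase VALUES (?, ?)", rows)

query = """
SELECT RecruitDate, AVG(CompletesPerPerson)
FROM (SELECT UserID, RecruitDate, COUNT(*) AS CompletesPerPerson
      FROM CompleteDataBase
      GROUP BY UserID, RecruitDate) RecruitData
GROUP BY RecruitDate
ORDER BY RecruitDate
"""
result = conn.execute(query).fetchall()
conn.close()

for recruit_date, avg_completes in result:
    print(recruit_date, avg_completes)
```

The inner query counts completes per person, and the outer query averages those counts within each recruit date.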
Myth: Big Data is for the IT Department
1. RapidMiner
2. R
3. Excel
4. SQL
5. Python
6. Weka
7. KNIME
8. Hadoop
9. SAS base
10. SQL Server
http://www.kdnuggets.com/polls/2014/analytics-data-mining-data-science-software-used.html
Data science Venn diagram: Math and Statistics, Subject Matter Expertise, Computer Science
http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html
BIG DATA is YOU and ME!
Myth: Big data requires a big budget
Software Costs
People Costs
Marketing Manager: $60 000
IT Product Manager: $80 000
Research Scientist: $61 000
Software Engineer: $60 000
Statistician: $57 000
Data Scientist: $70 000
http://www.payscale.com/research/CA/Job=Data_Scientist,_IT/Salary
Big Data
• Is not new
• Is neither clean nor complete
• Does not trump knowledge
• Does not render science obsolete
• Is not just for IT
• Doesn't win because of speed
• Does not require a huge budget
• Is not by definition better
What is Big Data Really?
Relevant
• Your products
• Your clients
• Your key metrics
Actionable
• Definable
• Measurable
• Changeable
• Awesomeable
Fast
• Already fielded
• Already awesome sample sizes
• Already in a dataset
Thank you!
Annie Pettit
Chief Research Officer
annie@peanutlabs.com
ca.linkedin.com/in/AnniePettit/
facebook.com/AnniePettit
twitter.com/LoveStats
Jonathan Cheriff
Director of Sales & Marketing
jonathan.cheriff@peanutlabs.com
Find Peanut Labs on LinkedIn, Facebook, Twitter, YouTube


Blasting 10 Big Data Myths with 10 Panel Data Examples
