
1. Early Lessons Learned in Applying Big Data To TV Advertising. ARF, September 12, 2011. Jack Smith, Chief Product Officer, Simulmedia

2. About Us. Who We Are: We are a New York based start-up. We are venture backed by Avalon Ventures, Union Square Ventures and Time-Warner. Where We Have Been: Our 35 person team has veterans of [list not captured]. What We Believe: Television is still the most powerful advertising medium in the world. While addressability will come, we're not waiting for it. We've taken a few strategies we learned from the Internet and are applying them to linear TV advertising, today. How We Do It: Through partnerships with major data providers, we have assembled the world's largest set of actionable television data. How We Make Money: We sell television advertising. With inventory in over 106 million US households, we can cost-effectively extend reach into high-value target audiences across virtually any advertiser category. We use big data and science to do this.

3. Why Did We Leave The Web? Television remains the dominant consumer medium. (a) Nielsen US TV Viewing Audience. Traditional Live-Only TV based on average monthly viewing during 1Q2011. Internet and Online Video based on average monthly consumption during July 2011. Video on Demand based on consumption during May 2011.

4. TV Spend Is Increasing. Source: MAGNAGLOBAL

5. Audience Is Fragmenting. Source: Nielsen via TVbythenumbers.com

6. Campaign Reach Is Declining. Impossible for measurement and planning tools to keep pace. Source: Simulmedia analysis of data from SQAD, Nielsen and TVB

7. Big Data

8. Big Data Is Driving Growth. "We are on the cusp of a tremendous wave of innovation, productivity and growth, as well as new modes of competition and value-capture - all driven by Big Data." - McKinsey Global Institute, May 2011. "For CMOs, Big Data is a very big deal." - Alfredo Gangotena, CMO, Mastercard, July 2011

9. Size Is Relative. 1 byte x 1000 = 1 kilobyte ... x 1000 = 1 megabyte ... x 1000 = 1 gigabyte ... x 1000 = 1 terabyte ... x 1000 = 1 petabyte ... x 1000 = 1 exabyte

10. Size Is Relative. Telegram = 100 bytes. Data © 1997-2011, James S. Huggins, http://www.jamesshuggins.com/h/tek1/how_big.htm

11. Size Is Relative. Page of an Encyclopedia = 100 kilobytes. Data © 1997-2011, James S. Huggins, http://www.jamesshuggins.com/h/tek1/how_big.htm

12. Size Is Relative. Pickup truck bed full of paper = 1 gigabyte. Data © 1997-2011, James S. Huggins, http://www.jamesshuggins.com/h/tek1/how_big.htm

13. Size Is Relative. Entire print collection of the Library of Congress = 10 terabytes. Data © 1997-2011, James S. Huggins, http://www.jamesshuggins.com/h/tek1/how_big.htm

14. Size Is Relative. All hard drives produced in 1995 = 20 petabytes. Data © 1997-2011, James S. Huggins, http://www.jamesshuggins.com/h/tek1/how_big.htm

15. Size Is Relative. All printed material = 200 petabytes. Data © 1997-2011, James S. Huggins, http://www.jamesshuggins.com/h/tek1/how_big.htm
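The size ladder on the preceding slides can be reproduced with a few lines of Python. This is a minimal sketch that assumes the decimal (x 1000) steps shown on slide 9; the function name `format_size` and the unit abbreviations are illustrative choices, not something taken from the deck.

```python
# Decimal (x 1000) units, matching the ladder on the "Size Is Relative" slide.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]


def format_size(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit = 0
    while value >= 1000 and unit < len(UNITS) - 1:
        value /= 1000
        unit += 1
    return f"{value:g} {UNITS[unit]}"


# Sizes quoted on the slides (attributed there to James S. Huggins).
examples = {
    "Telegram": 100,
    "Page of an encyclopedia": 100 * 1000,
    "Pickup truck bed full of paper": 1000**3,
    "Print collection of the Library of Congress": 10 * 1000**4,
    "All hard drives produced in 1995": 20 * 1000**5,
    "All printed material": 200 * 1000**5,
}

for name, size in examples.items():
    print(f"{name}: {format_size(size)}")
```

Each step up the ladder is a factor of 1000, so the gap between the telegram and all printed material is fifteen orders of magnitude.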

16. But Big Data Is More Than Size. What happened? Why did it happen? vs. BIG DATA: What's going to happen next? Time: Past vs. Future. Focus: Reporting vs. Prediction. Supports: Human decisions vs. Machine decisions. Data: Structured, Aggregated vs. Unstructured, Unaggregated. Human Skills: Dashboards, Excel vs. Discovery, Visualization, Statistics & Physics.

17. Accelerating The Push To Big Data: Hadoop, cloud computing, Facebook, Yahoo, quants, BitTorrent, machine learning, Stanford, Large Hadron Collider, Wal-Mart, text processing, Amazon S3 & EC2, open source intelligence, NoSQL, social media, Google, commodity hardware, Hive, fraud detection, trading desks, MapReduce, natural language processing

18. What Can It Mean For TV Advertising? Big data drove the rise of web & search advertising. Accumulation of high volume of direct measurement of media consumption

19. Better predictions about consumer interests

20. Real time return path

21. Automation

22. Interim step for addressability

23. More diligence around consumer privacy

24. Media buyers and sellers rethinking their approach to audience packaging, campaign planning, technology, data assembly and people. Post Modern Architecture: Have we reached the limits of classic data storage architecture? Data Warehouses: Yahoo!: 700 TB [1]

25. Australian Bureau of Statistics: 250 TB [1]

26. AT&T: 250 TB [1]

27. Nielsen: 45 TB [1]

28. Adidas: 13 TB [1]

29. Wal-Mart: 1 PB [2]. Data Lakes: Facebook: 30 PB [3] (7x compression)

30. Yahoo: 22 PB [4]

31. Google: ??? Sources: [1] Oracle F1Q10 Earnings Call Transcript, September 16, 2009. [2] Stair, Principles of Information Systems, 2009, p. 181. [3] Dhruba Borthakur, Facebook, December 2010, http://www.facebook.com/note.php?note_id=468211193919 [4] Simulmedia estimate

32. Our Idea of Big Data. Bringing the data set together in a single platform. Our (comparatively modest) data set: 200 TB (approx. 7x compression)

33. 113,858,592 daily events

34. Approximately 402,301 weekly ads

35. Double capacity every 6 months ... And we don't load every data point across all data sets, yet
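To get a feel for what these figures imply, here is a rough back-of-the-envelope sketch. It assumes the roughly 200 TB data set is the starting point and that capacity keeps doubling every 6 months at a constant rate; both are illustrative assumptions, and the function name is hypothetical.

```python
def projected_capacity_tb(start_tb: float, months: float,
                          doubling_months: float = 6.0) -> float:
    """Capacity after `months`, assuming a constant doubling period."""
    return start_tb * 2 ** (months / doubling_months)


# Assumption: start from the ~200 TB data set quoted on the slide.
for months in (0, 6, 12, 24):
    print(f"after {months:2d} months: {projected_capacity_tb(200, months):,.0f} TB")

# The quoted daily event volume expressed as an average rate per second.
DAILY_EVENTS = 113_858_592
events_per_second = DAILY_EVENTS / (24 * 60 * 60)
print(f"average events per second: {events_per_second:,.1f}")
```

Under these assumptions the footprint grows sixteenfold in two years, which helps explain the emphasis on commodity hardware in the slides that follow.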

36. Rethinking Media Data Architecture. Applying big data to television required us to rethink what our technical architecture should be. Commodity Hardware: No clouds allowed (ISO compliance)

37. Expect hardware failure

38. Learn from those who have done it

39. Participate in the Open Source community. Open Source Software. Write Your Own Software. ELT (Extract, Load, Transform)

40. Meddle

41. Machine learning. Science. Advanced statistical techniques

42. Experimentation. Some Wrinkles In The Matrix. No standards for set top boxes. Channel mapping. Time synchronization. On/off rules ... Consult the sages. Build the team

43. The People We Needed. A different approach required different skill sets. New core skills for everyone in the company

44. Pattern recognition

45. Visualization

46. Technology

47. Experimentation

48. Where do you find hard to find tech skills?

49. You don't find them. You make them.

50. A dedicated Science team

51. Non traditional researchers (Brain imaging, bioinformatics, economic modeling, genetics)

52. People who watch a lot of television. 10 Lessons We've Learned

53. Some Things To Know, First. Live viewing unless otherwise noted

54. Time shifting lessons is a whole other presentation

55. Time shifting + live viewing lessons is a whole other other presentation

56. Video on demand is a whole other other other presentation

57. We name names and provide numbers where clients and data partners permit

58. Client confidentiality is important to us

59. None of this work would've been possible without the help of our clients and partners. This box will contain important information about the graphs on each page. Read me ...

60. 60% of TV Viewers Watch 90% of TV

61. Where The Other 40% Are. Networks with relatively fewer lighter viewer impressions. Networks with relatively more lighter viewer impressions. Vertical: Ratio of heavy viewer to light viewer impressions. Horizontal: Low rated to highly rated networks. Call outs: Ratio is the number of heavier viewer impressions you would deliver to reach a lighter viewer on a given network. Higher rated networks. Lower rated networks. Sources: Nielsen & Simulmedia's a7

62. Where The Other 40% Are. To capture light viewers, media planning and measurement tools must quickly apply new methods to emerging data sets

63. Quality Control Is A Full Time Job

64. When Data Goes Missing. Automation of error checking/quality control is essential. Reuse the data to solve other problems. Occasionally observe missing data. Three choices: Pick up the phone

65. Estimate missing fields

66. Work around the missing data. Time series of SYFY network. 10645 observations from 2010.02.28 at 7:00pm Eastern to 2010.10.14 at 12:30pm Eastern. Source: Simulmedia's a7
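The second of the three choices, estimating missing fields, can be sketched as follows. This is a minimal illustration, not the method used in the deck: it assumes a regular half-hour reporting grid (the slides do not state the interval), uses made-up viewing values, and fills gaps by plain linear interpolation. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, step):
    """Return grid points between the first and last timestamp that have no observation."""
    present = set(timestamps)
    missing = []
    current, end = min(timestamps), max(timestamps)
    while current <= end:
        if current not in present:
            missing.append(current)
        current += step
    return missing


def interpolate(series, step):
    """Fill interior gaps linearly between the nearest observed neighbours."""
    filled = dict(series)
    known = sorted(series)
    for left, right in zip(known, known[1:]):
        n_steps = round((right - left) / step)
        for i in range(1, n_steps):
            t = left + i * step
            filled[t] = series[left] + (series[right] - series[left]) * i / n_steps
    return filled


# Hypothetical data: a half-hour grid with two observations missing.
step = timedelta(minutes=30)
start = datetime(2010, 2, 28, 19, 0)
observed = {start: 1.0, start + step: 1.2, start + 4 * step: 1.8}

gaps = find_gaps(list(observed), step)
filled = interpolate(observed, step)
print("missing:", gaps)
print("estimates:", [round(filled[t], 2) for t in gaps])
```

Running the gap check automatically on every load is one way to act on the slide's point that quality control has to be automated.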

67. More Data Really Is Better

68. Disambiguation: The Madonna Problem. Pop icon? OR Religious icon?

69. The Revolution of Simple Methods. More data beats better algorithms. The worst performing algorithm, given an order of magnitude more data, outperforms the best performing algorithm. Simple algorithms at very large scale can help better predict audience movement. Peter Norvig | Internet Scale Data Analysis | June 21, 2010. Original graph sourced from: Banko & Brill, 2001. Mitigating the paucity-of-data problem: exploring the effect of training corpus size on classifier performance for natural language processing

70. Packaging Reach. Very large data sets better predict TV audience movements. Peter Norvig | Internet Scale Data Analysis | June 21, 2010

71. The Cost Of More Data. More data drives better results but there are costs

72. The Data Isn't Biased Just Because It Comes From A Set Top Box

73. Applying Simple Methods At Scale. High correlation of a7 measures and Nielsen estimates. Either bias is insignificant or Nielsen data and our data share the same bias. Multiple methods yield similar results. Regression analysis of Nielsen Household Cume Rating against Simulmedia's a7 cume rating. 20 Primetime Network shows with HAWAII FIVE-0. Fall 2010. Sources: Nielsen & Simulmedia's a7
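The kind of comparison described here, regressing one ratings measure against another and reporting how much variance the fit explains, can be sketched with ordinary least squares. The ratings below are invented for illustration and the function name is hypothetical; the deck does not publish the underlying values or the exact regression setup.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, r^2 is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical cume ratings for five shows from two measurement sources.
set_top_box = [2.0, 3.5, 5.0, 6.5, 8.0]
panel = [2.2, 3.4, 5.3, 6.4, 8.1]

slope, intercept, r2 = fit_line(set_top_box, panel)
print(f"slope={slope:.3f} intercept={intercept:.3f} r^2={r2:.3f}")
```

A high r^2 alone cannot distinguish between the two readings on the slide: it shows the sources move together, not whether they share a common bias.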

74. And Then We Kept Going. We measured program Tune-In, Spot Tune-In, Campaign Reach, Campaign Rating using multiple slices of our data set using two different sample sets and time frames. Two samples. Sample 1: Fall 2010: 20 Primetime broadcast series launches + promos. Sample 2: Jan 2011: 15 Primetime cable series premieres + promos (Plus one multi-season/year primetime broadcast premiere + promos). Hand selected programs

75. Mix of genres

76. Mix of new vs. returning shows. How we sliced it: Entire a7 data set

77. Cross correlated individual data sets contained in a7 aggregate data set

78. Aggregate cross geographies (DMA to DMA). Observations: Sample 1 average r^2 > 0.85

79. Sample 2 average r^2 > 0.93. Addressability Is Here

80. Closing The Loop On Program Promotion. Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot. Source: Simulmedia's a7

81. Closing The Loop On Program Promotion. Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot. Source: Simulmedia's a7

82. Closing The Loop. Long held beliefs and rules of thumb in planning may or may not be supported by data. TV marketers now have more options for show promotion

83. Nielsen's Ratings Are Good (Surprisingly Good)

84. Time Series: Broadcast: CBS. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). 60 networks. High correlation between Nielsen large sample measurement and a7 measures. Sources: Nielsen & Simulmedia's a7

85. Time Series: Broadcast: Fox. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7

86. Time Series: Broadcast: ABC. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7

87. Time Series: Cable: Investigation Discovery. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7

88. Time Series: Cable: Golf. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7

89. Time Series: Cable: Bravo. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7

90. Time Series: Cable: ESPN2. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7

91. Time Series: Cable: Speed. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7

93. When You Look Closer. Hour by hour time series, Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red, Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin, 1987). Sources: Nielsen & Simulmedia's a7
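The time series slides plot Z scores so that two measures on different scales can share one axis. A minimal sketch of that normalization follows. The hourly values are invented, and the use of the population standard deviation is an assumption, since the slides do not say which variant was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a series to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("a constant series has no defined z-scores")
    return [(v - mu) / sigma for v in values]


# Hypothetical hourly values: panel ratings vs. raw set top box tune-in counts.
panel_ratings = [1.0, 2.0, 3.0, 2.0]
tune_in_counts = [1000, 2000, 3000, 2000]

z_panel = z_scores(panel_ratings)
z_counts = z_scores(tune_in_counts)
print([round(z, 3) for z in z_panel])
print([round(z, 3) for z in z_counts])
```

Because both series here have the same shape, their Z scores coincide even though the raw units differ by a factor of 1000, which is why the plotted lines can be compared directly.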

94. High Frequency Time Series: ABC Family. Volatility in dayparts, low rated networks, demographics ... Unrated networks "don't exist." Did NOT look at local. Legend: a7, Nielsen. Sample graph from High Frequency (Second and Minute level) Time Series Analysis of 45 networks on January 19th, 2011. Simulmedia a7 Sample (Second by Second to Minute). Nielsen Sample (Minute by Minute). Sources: Nielsen & Simulmedia's a7

95. Women Are More Different Than Men

96. Gender Driven Geographic Variation. Viewing by zip code among women across markets is more varied than among men in the same zip codes. Panels: Men 18-54, Women 18-54. Fraction of view time for ages 18-54 as fraction of view time for all TV viewers. Week 2 vs. the same fraction for week 1 (last two weeks in January). Three markets: Philadelphia (blue), Atlanta (red) and Chicago (green). Each point represents a zip code in one of these markets. Source: Simulmedia's a7

97. Gender Driven Geographic Variation. Planning tactics for female targeted campaigns should be different than male targeted campaigns. PS ... Also a good case for geo based creative versioning

98. Privacy Matters

99. Privacy By Design. All marketing data companies need to care

100. Make consumer privacy protection part of the business from the beginning

101. Anonymous, aggregated data only

102. No personal data or data that can be related to particular individuals or devices

103. Broad marketing segmentations, not profiling

104. No sensitive data. Don't be creepy

105. Mass Reach Is Indiscriminate

106. Fragmentation Effects On Frequency. Each segment was above 70% reach but the frequency distribution was nearly identical. Percent of audience reached for major animated motion picture campaign 2011. Two weeks prior to release. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment. Source: Nielsen & Simulmedia's a7

107. Fragmentation Effects On Frequency. Fragmentation is affecting all high reach campaigns. Percent of audience reached for insurance advertisers September to October 2010. Approximately 8000 ads. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment. Source: Nielsen & Simulmedia's a7

108. Fragmentation Effects On Frequency. The TV advertising market can't continue to support this

109. 40% Of The Audience Is Getting 85% Of The Impressions

110. Fragmentation Rears Its Head Again. Campaign impressions increasingly concentrated against heavy viewers. Total US Television Audience by quintile (average frequency, % of total impressions): 0.0, 0.0%; 1.4, 3.6%; 4.3, 10.8%; 9.1, 23.0%; 24.8, 62.6%. Percent of audience reached for a different major animated motion picture campaign 2011. Two weeks prior to release. The stacked bar represents quintiles. Blue labels are average frequency per respective quintile. Red labels are % of total campaign impressions by respective quintile. Source: Nielsen & Simulmedia's a7

111. Fragmentation Effects on Frequency. Advertisers won't continue to support this
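The two sets of labels on this slide are linked by simple arithmetic: because quintiles are equal-sized groups, each quintile's share of impressions is its average frequency divided by the sum of the five average frequencies. The sketch below applies that to the frequencies quoted on the slide. The pairing and lightest-to-heaviest ordering are inferred from the numbers, and small differences from the slide's percentages are expected because the published frequencies are rounded.

```python
# Average frequency per quintile as quoted on the slide,
# ordered lightest to heaviest viewers (ordering is an inference).
avg_frequency = [0.0, 1.4, 4.3, 9.1, 24.8]


def impression_shares(avg_freq_per_group):
    """Percent of total impressions per equal-sized group."""
    total = sum(avg_freq_per_group)
    return [100.0 * f / total for f in avg_freq_per_group]


shares = impression_shares(avg_frequency)
for quintile, (freq, share) in enumerate(zip(avg_frequency, shares), start=1):
    print(f"quintile {quintile}: avg frequency {freq:4.1f}, share {share:4.1f}%")

# The two heaviest quintiles are 40% of the audience.
top_two = shares[-1] + shares[-2]
print(f"top two quintiles: {top_two:.1f}% of impressions")
```

The top two quintiles together come out at roughly 85 to 86 percent, which matches the earlier headline that 40% of the audience is getting 85% of the impressions.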

112. What Happens Next?

113. Choices. If fragmentation is causing declining campaign reach and frequency imbalances, marketers must make choices.

114. Reduce reach

115. Do nothing

116. Use other channels

117. Stabilize or improve reach

118. Re-aggregate audiences using big data. What do you think?

119. Jack Smith, jack@simulmedia.com, @simulmedia, @jkellonsmith

120. About Our Science Team. Krishna Balasubramanian, Chief Scientist

121. Previously: Chief Scientist, Tacoda. Chief Scientist, Real Media.

122. Doctoral Candidate, Physics. (Condensed Matter Physics) The Ohio State University

123. MS, Computer & Information Systems. The Ohio State University

124. MSc, Physics. Indian Institute of Technology, Kanpur

125. Yuliya Torosjan, Scientist

126. Previously: Clinical Research (Brain Imaging), Mount Sinai College of Medicine

127. MA, Statistics. Columbia University

128. BSE, Computer Science & Engineering. University of Pennsylvania

129. BA, Psychology. University of Pennsylvania

130. Mario Morales, Scientist

131. Previously: Lecturer, Bioinformatics, New York University. Senior Consultant, Weiser LLP.

132. MS, Statistics. Hunter College

133. MS, Bioinformatics. New York University

134. Dr. Sidd Mukherjee, Scientist

135. Previously, Visiting Scholar (Atomic Scattering experiments), The Ohio State University

136. Post doctoral research, Heat capacity of Helium-4. Pennsylvania State University

137. PhD, Physics. (Thesis: Measurements of Diffuse and Specular Scattering of 4He Atoms from 4He Films), Ohio State University

138. MS, Computer & Information Systems. The Ohio State University

139. BSc, Physics & Mathematics. University of Bombay
