Bi dutch meeting data science
1. Some examples
Data Science, Big Data and official statistics
Piet Daas, Edwin de Jonge, May Offermans, Martijn Tennekes,
Alex Priem and Paul van den Hurk
2. Overview
Statistics Netherlands (CBS)
Data and sources
Why Big Data & Data Science?
The 3 Vs and challenges
Examples
Virtual census
Social security register (Polisadministratie)
Road sensors (traffic loops)
Mobile phones
Social media messages
3. Statistics Netherlands (CBS)
In 2012, CBS produced about 5,000 official publications and tables
For that we need DATA!
4. Two kinds of data sources
Primary data: our own questionnaires
Secondary data: data from others
- Administrative sources
- New data sources
6. Why Big Data?
Quickly available
Volume
Complex/difficult
Information extraction
Population and dynamics
3 Vs
7. Challenges at the start
Practical
How do we obtain Big Data?
Where and how do we run the analyses?
Legal
Are we allowed to do this?
Work carefully: take privacy-sensitive data into account (WBP, the Dutch Personal Data Protection Act)
Costs
CBS does not pay for administrative data.
And for Big Data?
Methodological
Methods are needed to analyse large amounts of data
Technical
Learn from research fields related to computational statistics
High Performance Computing techniques (parallel processing)
People
We need data scientists: statistically minded people who can program, are curious, and
can think outside the traditional sampling paradigm!
8. Research by CBS
Findings from research on large data sources
Visualisations:
1) Virtual census (17 million records)
2) Social security register (20 million records)
Big Data:
3) Road sensors (100 million records)
4) Mobile telephony (~500 million records)
5) Social media (12 million to 2 billion records)
9. Example 1: Virtual census
A census is mandatory, once every 10 years
In the Netherlands it is no longer carried out with questionnaires
Last traditional census in 1971
Now done by (re)using information that has already been collected
Large-scale linking of administrative sources and survey data
Checking the result
How?
With a visualisation method: the tableplot
10. How a tableplot is made
1. Load the file (17 million records)
2. Sort the records by the value of a key variable, in this case age (17 million records)
3. Aggregate the records into 100 groups (170,000 records each)
Numeric variables: compute the mean (e.g. mean age)
Categorical variables: proportion of the categories present (e.g. male vs female)
4. Plot a selected number of variables (up to 12)
Use of colour is important
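Steps 2 and 3 of the tableplot recipe can be sketched in a few lines of Python. This is only an illustration of the sort-and-aggregate idea, not the software CBS used. The function name tableplot_bins, the record layout and the eight example records are all made up for the sketch, and the plotting step is left out.

```python
from collections import Counter


def tableplot_bins(records, sort_key, numeric_vars, categorical_vars, n_bins=100):
    """Sort records on sort_key and summarise them in n_bins equal-sized groups.

    Hypothetical helper: numeric variables get a mean per group,
    categorical variables get the share of each category per group.
    """
    ordered = sorted(records, key=lambda r: r[sort_key])
    n = len(ordered)
    bins = []
    for b in range(n_bins):
        # Integer arithmetic keeps group sizes within one record of each other.
        group = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if not group:
            continue  # fewer records than bins
        summary = {"n": len(group)}
        for var in numeric_vars:
            summary[var] = sum(r[var] for r in group) / len(group)
        for var in categorical_vars:
            counts = Counter(r[var] for r in group)
            summary[var] = {cat: c / len(group) for cat, c in counts.items()}
        bins.append(summary)
    return bins


# Made-up example: 8 records aggregated into 4 groups of 2.
people = [
    {"age": 34, "sex": "F"}, {"age": 5, "sex": "M"},
    {"age": 71, "sex": "F"}, {"age": 18, "sex": "F"},
    {"age": 52, "sex": "M"}, {"age": 9, "sex": "F"},
    {"age": 66, "sex": "M"}, {"age": 40, "sex": "M"},
]
rows = tableplot_bins(people, "age", ["age"], ["sex"], n_bins=4)
for row in rows:
    print(row)
```

With 17 million records and 100 groups, each row of the plot summarises about 170,000 records, which is why the picture stays readable at that scale.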
13. Example 2: Social security register (Polisadministratie)
File with the financial data of all jobs, benefits and pensions in the Netherlands
Collected by the Tax Administration (Belastingdienst) and UWV
20 million records every month
How do we gain insight into this enormous pile of data?
With a visualisation: a heat map
16. Example 3: Road sensors
Road sensors (traffic loops)
Every minute (24/7) the number of passing vehicles is counted at more than 10,000 measurement points in the Netherlands
In total and in different length categories
A good source for producing traffic and transport statistics (and more)
A lot of data: about 100 million records per day
Locations
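To get a feel for the volume: one count per minute at 10,000 points already gives 14.4 million measurements per day, before any split into length categories. The sketch below does that arithmetic and rolls minute counts up to daily totals per measurement point. The record layout (sensor id, ISO timestamp, count) and the name daily_totals are assumptions for illustration; the slides do not describe the actual file format.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60
# Lower bound on minute-level measurements per day for 10,000 points.
lower_bound = 10_000 * MINUTES_PER_DAY


def daily_totals(minute_records):
    """Sum minute counts per (sensor, day).

    minute_records: iterable of (sensor_id, iso_timestamp, vehicle_count),
    an assumed layout with timestamps like "2013-01-15T08:31".
    """
    totals = defaultdict(int)
    for sensor_id, timestamp, count in minute_records:
        day = timestamp[:10]  # the "YYYY-MM-DD" part of the timestamp
        totals[(sensor_id, day)] += count
    return dict(totals)


# Made-up example records.
sample = [
    ("loop-1", "2013-01-15T08:30", 12),
    ("loop-1", "2013-01-15T08:31", 15),
    ("loop-2", "2013-01-15T08:30", 4),
    ("loop-1", "2013-01-16T08:30", 9),
]
totals = daily_totals(sample)
print(lower_bound)
print(totals)
```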
27. Example 4: Mobile phones
Almost every Dutch person has a mobile phone
Nearly always carried along and almost always switched on
An ideal source of information, using data from providers, on:
Mobility behaviour (daytime population)
Tourism (new registrations on the network)
Crowds (e.g. at events)
28. Daytime population
Home address in the GBA (municipal population register)
- Where people stay at night
What do they do during the day?
- Determine the location of the phone during call/SMS/data activity, based on the location of the mast
Data from one provider
- Data for December 2012 and January 2013
- A first start on a daytime population
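The idea of locating phones via the mast that handles their activity can be sketched as a simple count of distinct devices per mast during daytime hours. Everything concrete here is an assumption made for the example: the event layout (device id, hour of day, mast id), the 9-to-17 daytime window and the function name devices_per_mast. The slides do not say how the provider data were actually structured or processed.

```python
from collections import defaultdict


def devices_per_mast(events, start_hour=9, end_hour=17):
    """Count distinct devices seen at each mast between start_hour and end_hour.

    events: iterable of (device_id, hour_of_day, mast_id), an assumed layout.
    A device active at several masts is counted once at each of them.
    """
    seen = defaultdict(set)
    for device_id, hour, mast_id in events:
        if start_hour <= hour < end_hour:
            seen[mast_id].add(device_id)
    return {mast: len(devices) for mast, devices in seen.items()}


# Made-up call/SMS/data events.
events = [
    ("a", 10, "m1"), ("a", 11, "m1"),  # same device twice at m1
    ("b", 10, "m1"),
    ("c", 14, "m2"),
    ("c", 22, "m3"),                   # outside the daytime window
]
daytime = devices_per_mast(events)
print(daytime)
```

Comparing such counts with the number of registered residents per area would be one way to contrast daytime and night-time population, although that step is not shown here.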
29. Example 5: Social media
Dutch people are very active on social media
Nearly always carried along and almost always switched on
More and more people have a smartphone!
Possible source of information on:
Which topics are current:
Number of messages and the sentiment about them
Can be used as a measuring instrument for ...
Map by Eric Fischer (via Fast Company)
30. Social media: Dutch-language messages
Dutch people are very active on social media
Possible source of information:
Number of messages about, and sentiment towards, particular topics (quickly available!)
Tests to check usefulness and usability
a. Content:
- Collected Dutch Twitter messages ourselves: 12 million in total
b. Sentiment:
- Studied sentiment in Dutch-language social media messages: ~2 billion
31. Social media: Twitter
[Bar chart: topics of Twitter messages, contribution (%) per theme on a scale from 0 to 50, based on 12 million messages. Themes shown: Other, Media, Sport, Culture/events, Holidays, Leisure, Transport, Safety, Politics, Education, Health, ICT, Weather, Environment, Economy, Housing, Relationships, Work. Labelled bars: 46%, 10%, 7%, 3%, 5%.]
32. Sentiment in social media
Bought access to the Coosto database
> 2 billion publicly available Dutch-language messages
Twitter, Facebook, Hyves, web forums, blogs, etc.
Sentiment of each message
Positive, negative or neutral
Tried all sorts of things
Interesting angle
Looked at the "mood of the nation" and compared it with
the CBS consumer confidence figure
33. Consumer confidence, survey data
Sentiment towards the economic climate
~1,000 respondents/month
[Chart: (pos - neg) as % of total, plotted against time]
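The quantity on the chart's vertical axis, (pos - neg) as a percentage of the total, is a simple balance score. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def sentiment_balance(pos, neg, total):
    """Return (pos - neg) as a percentage of total.

    total includes neutral answers or messages, so pos + neg may be smaller.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (pos - neg) / total


# Made-up month: 300 positive, 450 negative and 250 neutral out of 1,000.
balance = sentiment_balance(pos=300, neg=450, total=1000)
print(balance)  # -15.0
```

The same formula applies to survey respondents and to social media messages, which is what makes the two series comparable even though the units differ.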
34. Consumer confidence vs. social media
Corr: 0.88; ~25 million messages/month
Sentiment towards the economic climate & in social media messages
[Chart: (pos - neg) as % of total, plotted against time]
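A correlation like the 0.88 quoted on the slide compares two monthly series of balance scores. The slide does not say which coefficient was used; the sketch below assumes the ordinary Pearson correlation, and the two short series are invented purely to show the call.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Invented monthly balance scores, not CBS data.
survey = [-30.0, -28.0, -35.0, -40.0, -33.0, -25.0]
social = [-12.0, -10.0, -15.0, -19.0, -13.0, -9.0]
r = pearson(survey, social)
print(round(r, 2))
```

A high coefficient only says that the two series move together; it does not by itself show that social media sentiment measures the same thing as the survey.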
35. Challenges: Big Data and CBS
Legal
Routine access (not only for research)?
Needs to be sorted out properly
Practical
Are we going to analyse all (micro)data in-house?
Or at the data holder, or in the cloud?
Methodological
Big Data sources register events
And are not the result of a sample design
Great need for theory development in this area!
People
Need for data scientists at CBS
There are currently not many (train them?)