PoliticalMashup                                            1




                     PoliticalMashup
  Connecting the promises and actions of politicians with how
                    society reacts to them

                             Maarten Marx

                      Universiteit van Amsterdam

                  Groningen, α-informatica, 2011-03-11



                            Content

 Overview PoliticalMashup project

 Zooming in on one cultural heritage dataset

 A few example applications

 Research ideas for NLP-scientists.



                           Who am I?


 Political scientist turned computer scientist

 My field:
   Theory of XML Database Systems
   Semi Structured Information Retrieval

 Cooperation with
   Tweede Kamer
   Koninklijke Bibliotheek,
   historians at NIOD, DNPP



                  PoliticalMashup project

 Large scale data integration project

 Two-year NWO-funded infrastructure project, 2010-2012

 Partners: U. Amsterdam, Groningen and Tilburg

 Ongoing with irregular funding since 2008



                  Goal of PoliticalMashup

 Making huge amounts of textual data available for

 large scale automatic quantitative data and content analysis

 done by scientists from the humanities and social sciences.



                     Mashup of what and how?

 4 data sources
        Promises and actions of politicians
        Reactions to those in the media and among the general public

 Connect data on
        Political entities
        Time
        Topics



                          Data sources

Promises
     Election manifestos, mostly scans, DNPP
     Party websites and blogs, Archipol
     Twitter of politicians

Actions
     Parliamentary proceedings, mostly scans, KB

Reactions
     News media
     User-generated content: forums, blogs, comments on news,
      Twitter



                      Used techniques

 Text analytics and XML DB and IR technology

 Named entity recognition and normalization

 Data mining, Machine Learning, hand-crafted rules

 Natural Language Processing, Language Models


 Make implicit structure and information explicit.



                  Zoom in on one data corpus



                     Longitudinal data

 Weekly measurements for over 150 years

 very stable measurement procedure and data model



                  Data about human behaviour



                  Often rather boring



         But sometimes full of drama and excitement



                       Loads of measurement points

                  24,000 days, 450,000 topics, 7.5 million speeches



                  Digitally available



         De Handelingen der Staten Generaal (Dutch
                        Hansards)



                    About this collection

 Available metadata is very sparse

 very rich metadata sits hidden inside the raw data

 Rich data model
 Meeting (1 Day)
   Topic
     Stage direction
     Scene
      Stage direction
      Speech
       Paragraph
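
The nesting above can be sketched as an XML tree. This is only an illustration: the element names (`meeting`, `topic`, `scene`, `speech`, `p`, `stage-direction`), the attributes, and the sample content are assumptions made for this sketch, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element/attribute names and made-up content;
# the real PoliticalMashup schema may differ.
SAMPLE = """
<meeting date="1900-01-01">
  <topic title="Example topic">
    <stage-direction>The meeting is opened.</stage-direction>
    <scene>
      <stage-direction>Applause.</stage-direction>
      <speech speaker="Speaker A" party="Party X">
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
      </speech>
    </scene>
  </topic>
</meeting>
"""

def count_levels(xml_text):
    """Count the elements found at each level of the assumed hierarchy."""
    root = ET.fromstring(xml_text)  # root is the <meeting> element
    return {
        "topics": len(root.findall("topic")),
        "scenes": len(root.findall("topic/scene")),
        "speeches": len(root.findall("topic/scene/speech")),
        "paragraphs": len(root.findall("topic/scene/speech/p")),
    }

print(count_levels(SAMPLE))
```

Once the hierarchy is explicit like this, every paragraph can be traced back to its speech, scene, topic, and meeting day.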



                  Same data: different views

 Raw data in PDF

 XML styled with stylesheet

 Machine readable XML format



                  Some applications of this



                  Content and structure search

 Combine IR style keyword search with restrictions on structure.

 E.g., return speeches by Wilders about Islam
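
A minimal sketch of such a combined query, assuming a hypothetical XML layout in which each `speech` element carries a `speaker` attribute and holds `p` paragraphs. The sample document uses placeholder speakers and text; the example on this slide would correspond to `speaker="Wilders"` and `keyword="Islam"`. The keyword match is a naive substring test, not a ranked IR model.

```python
import xml.etree.ElementTree as ET

# Made-up sample data with hypothetical element and attribute names.
DOC = """
<meeting date="1900-01-01">
  <topic title="Example topic">
    <scene>
      <speech speaker="Speaker A"><p>We must invest in water safety.</p></speech>
      <speech speaker="Speaker A"><p>The budget needs attention.</p></speech>
      <speech speaker="Speaker B"><p>Water was discussed earlier.</p></speech>
    </scene>
  </topic>
</meeting>
"""

def search_speeches(xml_text, speaker, keyword):
    """Return the text of speeches by `speaker` that mention `keyword`."""
    root = ET.fromstring(xml_text)
    hits = []
    # Structural restriction: only <speech> elements by the given speaker.
    for speech in root.iterfind(f".//speech[@speaker='{speaker}']"):
        text = " ".join(p.text or "" for p in speech.iter("p"))
        # Content restriction: case-insensitive substring match.
        if keyword.lower() in text.lower():
            hits.append(text)
    return hits

print(search_speeches(DOC, "Speaker A", "water"))
```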



                  Exhaustive data collection

 Example query for NIOD historians

 Search for paragraphs about fascisme OR nazisme OR dictatuur
  OR (nazi AND dictatuur) OR . . .

 Return a TSV file with, for each hit: date, speakername, speakerid,
  speaker-party, . . .

 NIOD query
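
A sketch of how such a query could be evaluated over paragraphs and exported as TSV. The records, their field names, and the identifiers are invented for illustration; only the Boolean terms shown on the slide are implemented, and the elided part of the query (". . .") is left out.

```python
import csv
import io
import re

# Made-up paragraph records; dates, names and ids are placeholders.
PARAGRAPHS = [
    {"date": "1900-01-01", "speakername": "Speaker A", "speakerid": "id-001",
     "party": "Party X", "text": "Een debat over fascisme en democratie."},
    {"date": "1900-01-02", "speakername": "Speaker B", "speakerid": "id-002",
     "party": "Party Y", "text": "De begroting van dit jaar."},
]

def tokens(text):
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))

def matches(text):
    """fascisme OR nazisme OR dictatuur OR (nazi AND dictatuur)."""
    t = tokens(text)
    return bool(t & {"fascisme", "nazisme", "dictatuur"}) or {"nazi", "dictatuur"} <= t

def to_tsv(paragraphs):
    """Write one tab-separated row of metadata per matching paragraph."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "speakername", "speakerid", "speaker-party"])
    for p in paragraphs:
        if matches(p["text"]):
            writer.writerow([p["date"], p["speakername"], p["speakerid"], p["party"]])
    return out.getvalue()

print(to_tsv(PARAGRAPHS))
```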



                  Link the proceedings to entities

 Who is speaking?

 Who says what to whom?

Applications

 Summary of one speaker

 On old OCRed data: Linking and resolving entities



       Application: Interruption graph (Attackogram)

 MP A interrupts B when A speaks during the block of B.
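
Under this definition, the edges of the graph can be counted directly from the speaking order. The sketch below assumes each block is given as its owner plus the ordered list of speaker turns inside it; this input format and the speaker labels are assumptions for illustration.

```python
from collections import Counter

def interruption_edges(blocks):
    """Count directed edges (interrupter, interrupted).

    blocks: list of (block_owner, [speaker, ...]) in speaking order.
    Every turn taken inside a block by someone other than the
    block owner counts as one interruption of the owner.
    """
    edges = Counter()
    for owner, speakers in blocks:
        for speaker in speakers:
            if speaker != owner:
                edges[(speaker, owner)] += 1
    return edges

# Made-up example: two blocks, owned by B and by A.
blocks = [
    ("B", ["B", "A", "B", "A", "B", "C"]),
    ("A", ["A", "B", "A"]),
]
for (src, dst), n in sorted(interruption_edges(blocks).items()):
    print(f"{src} -> {dst}: {n}")
```

The resulting weighted edge list can be handed to any graph-drawing tool to render the attackogram.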



                  NLP research topics



                            0) Topics

 Common European thesaurus http://eurovoc.europa.eu

 detection

 classification (sentence, paragraph, speech level)



                  1) Populist language in parliament

 PhD Thesis Jan Jagers (2006).



 2) Automatically detecting promises (toezegging)
            by ministers in Parliament

 https://zoek.officielebekendmakingen.nl/kst-103196.pdf
  (page 56)

 Eerste Kamer has a nice database online
  http://www.eerstekamer.nl/toezeggingen_2



                             Example

The chair: I note that we have almost reached the end of this
meeting. We still have time to go through the promises
(toezeggingen). I ask everyone to check that nothing has been
overlooked. I will do this quickly, and after that we will briefly
discuss the follow-up. The promises.
After the summer the bill will be before the House.
A letter will be sent to inform the House how the loss of expertise
will be prevented.
Minister Van Bijsterveldt-Vliegenthart: I did not promise that
(Dat heb ik niet toegezegd). Definitely not. No, I will not do that,
because I did not promise it.
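
As a starting point for this research topic, a hand-crafted rule could flag sentences containing a form of the Dutch verb "toezeggen" and check for a negation word. The patterns below are a naive illustration written for this sketch, not a method used by the project; a real detector would need far more than keyword rules.

```python
import re

# Forms of "toezeggen" (to promise): toezegging(en), toegezegd,
# toezeggen, and the separated form "zeg/zegt ... toe".
PROMISE = re.compile(
    r"\b(?:toezegging(?:en)?|toegezegd|toezeggen|zegt?\b.*\btoe)\b",
    re.IGNORECASE,
)
# Very small, assumed list of Dutch negation words.
NEGATION = re.compile(r"\b(?:niet|geen|nooit)\b", re.IGNORECASE)

def classify(sentence):
    """Label a sentence as 'promise', 'denied', or 'none'."""
    if not PROMISE.search(sentence):
        return "none"
    return "denied" if NEGATION.search(sentence) else "promise"

for s in ["Dat heb ik niet toegezegd.",
          "Ik zeg u toe dat de brief komt.",
          "De vergadering is gesloten."]:
    print(classify(s), "|", s)
```

The first test sentence is the minister's denial from the proceedings; the other two are invented.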



                    3) Opinion detection

 Detect opinions expressed about entities and topics. (Speaker is
  known)

 Detect reported speech.



                  4) Detect type of speech

 Interruption, attack, answer, speech (betoog), stage-direction,
  ...

 http://data.politicalmashup.nl/debates/nl/h-ek-19961997-37-58.1-tijdslijn.html



                       5) Detect bullshit

 Tautologies . . .

 "Regels zijn regels" (rules are rules), "Op is op" (gone is gone)

 "Het is wat het is" (it is what it is)
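
Tautologies of the shape "X is X" can be caught with a back-referencing regular expression. The two patterns below are assumptions covering only the Dutch examples on this slide; they are a toy illustration rather than a general detector.

```python
import re

# Patterns for "X is X" / "X zijn X" and "X is wat X is".
PATTERNS = [
    re.compile(r"\b(\w+) (?:is|zijn) \1\b"),
    re.compile(r"\b(\w+) is wat \1 is\b"),
]

def is_tautology(sentence):
    """True if the lowercased sentence matches one of the patterns."""
    s = sentence.lower()
    return any(p.search(s) for p in PATTERNS)

for s in ["Regels zijn regels", "Op is op", "Het is wat het is",
          "De regels zijn duidelijk"]:
    print(is_tautology(s), "|", s)
```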



                  6) Spelling normalization

 Dutch had many spelling reforms.

 Leads to lower recall.

 Search in new spelling, return results in old spellings.
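
One simple way to recover recall is query expansion: each modern query term is expanded with its older spellings before matching. The variant table below is a tiny hand-picked illustration (pre-reform forms such as "mensch" and "zoo"), not a resource from the project, and the matching is plain whitespace tokenization.

```python
# Illustrative modern -> old spelling variants; a real system would
# need a much larger lexicon or rewrite rules.
OLD_VARIANTS = {
    "mens": ["mensch"],
    "zo": ["zoo"],
    "nederlands": ["nederlandsch"],
}

def expand_query(terms):
    """Return, for each term, the list of spellings to search for."""
    expanded = []
    for term in terms:
        key = term.lower()
        expanded.append([key] + OLD_VARIANTS.get(key, []))
    return expanded

def matches(text, terms):
    """True if every query term occurs in the text in some spelling."""
    words = set(text.lower().split())
    return all(any(v in words for v in group) for group in expand_query(terms))

print(expand_query(["Mens", "wet"]))
print(matches("de mensch is zoo", ["mens", "zo"]))
```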



                  Lots of data available: happy to share

 Now: 15 years of Dutch Parliamentary Proceedings in rich XML

 Now: 200 years more in poorer XML, slowly getting richer.

 Parliamentary proceedings from EU (15y), UK (75y), Spain (40y),
  Scandinavian countries, . . .

 Election manifestos (provincial elections 2007 and 2011)

 All tweets, blogs, Flickr and YouTube content of all Dutch national
  politicians from the past 1.5 years.



                      Thanks




                  maartenmarx@uva.nl
