Bringing parliamentary debates to the Semantic Web

Damir Juric1,3, Laura Hollink2, Geert-Jan Houben1

1 Delft University of Technology, 2 VU University Amsterdam, 3 FER University of Zagreb



DERIVE 2012
Boston, 12.11.2012.
Motivation




  Cross-media comparison:
 What choices do different media make in their coverage of people and topics when
  reporting on political events?

 Does the representation of topics and people change over time, and how do the
  various media types differ?
Background: the PoliMedia project

   Funded by CLARIN-NL

   May 2012 - May 2013

   3 phases:
     I. Modeling phase: creating a semantic model (this presentation)
     II. Data production phase: creating links between political events and media
     III. Application phase: searching and navigating linked datasets

   www.polimedia.nl
Research questions

 How to represent political events on the Semantic Web?
 How to represent links between media and political events on
  the Semantic Web?
Political events data set

 Events: Dutch parliamentary debates

 Handelingen der Staten-Generaal, or Dutch Hansard

 Some provenance:
  1. Transcripts are made of the complete debates of the Dutch parliament.
  2. The government publishes them online at http://www.statengeneraaldigitaal.nl/
     (1818-1995) and http://officielebekendmakingen.nl/ (from 1995).
  3. The PoliticalMashup project has converted the government PDF and TXT files into XML,
     including URIs as identifiers; see http://politicalmashup.nl/
  4. We build on that.
Media data sets

 Newspaper articles and radio bulletins

     at the National Library of the Netherlands

     many, mostly regional, newspapers, 1950-1995

     text + images of newspaper layout

 Newscasts

     at the Netherlands Institute for Sound and Vision

     evening news and current affairs programs

     metadata in Dublin Core and CDMI format

     enriched with thesaurus terms from the Gemeenschappelijke Thesaurus
      Audiovisuele Archieven (GTAA)
Semantic model: what do we need to represent? 1/2

 Important information for every parliamentary debate:
     When the debate was held
     What is being said in the debate (topics)
     Who is giving the speeches in the debate, and in which role (persons)
         Additional information about the actors involved in the event (names of the
          politicians, their party, age, etc.)
     Structure: subparts of the debate have their own identifiers (a subpart is a part
      of the debate in which only one speaker can be identified as actor)
         Chronological order (the order in which the subparts occurred within the
          parliamentary debate)
     Named entities apart from politicians (persons, locations, etc.)

 [Diagram: a Debate with Metadata, divided into Topic 1 (Speaker 1 / Content,
  Speaker 2 / Content, Speaker 3 / Content) and Topic 2 (Speaker 1 / Content)]
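
The requirements listed on this slide can be pictured as a small data structure. The following is only a minimal sketch in plain Python, not the PoliMedia model itself: all class names, field names, identifiers and sample values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Politician:
    name: str
    party: str  # hypothetical field: party affiliation of the speaker


@dataclass
class Speech:
    identifier: str       # every subpart of the debate has its own identifier
    speaker: Politician   # exactly one speaker is the actor of a subpart
    role: str             # role in which the speaker takes the floor
    order: int            # chronological position within the debate
    text: str
    named_entities: List[str] = field(default_factory=list)


@dataclass
class Topic:
    title: str
    speeches: List[Speech] = field(default_factory=list)


@dataclass
class Debate:
    identifier: str
    held_on: date
    topics: List[Topic] = field(default_factory=list)

    def speeches_in_order(self) -> List[Speech]:
        """Return all speeches of the debate in chronological order."""
        all_speeches = [s for t in self.topics for s in t.speeches]
        return sorted(all_speeches, key=lambda s: s.order)


# Made-up sample data, only to show how the pieces fit together.
speaker_a = Politician(name="Speaker A", party="Party X")
speaker_b = Politician(name="Speaker B", party="Party Y")
debate = Debate(
    identifier="debate-1",
    held_on=date(1983, 1, 1),
    topics=[
        Topic("Topic 1", [
            Speech("debate-1.1.2", speaker_b, "member", 2, "..."),
            Speech("debate-1.1.1", speaker_a, "chair", 1, "..."),
        ]),
        Topic("Topic 2", [
            Speech("debate-1.2.1", speaker_a, "chair", 3, "..."),
        ]),
    ],
)
```

Keeping an explicit order value on each speech makes the chronological order independent of how the subparts happen to be stored under their topics.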
Semantic model: what do we need to represent? 2/2




                          Various information about media
                           items linked to the debate

                          Links between subparts of the
                           debate and news articles, radio
                           bulletins and television newscasts
URIs

 PoliMedia vocabulary: http://purl.org/linkedpolitics/nl/polivoc#Speech

 Politicians, parties: http://purl.org/linkedpolitics/nl/poli#Beel

 Debates and parts of debates: http://purl.org/linkedpolitics/nl/nl.proc.sgd.d.198219830000846.2.11.12

 Media articles, bulletins and newscasts: http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf
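
To illustrate how these URI patterns fit together, here is a minimal sketch that mints URIs from the namespaces above and serializes one statement as an N-Triples line. The namespaces and example identifiers come from this slide. The helper functions are hypothetical, and typing this particular debate part as a `Speech` is an assumption made only for the example.

```python
# Namespaces taken from the example URIs on the slide.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"
POLI = "http://purl.org/linkedpolitics/nl/poli#"
DEBATES = "http://purl.org/linkedpolitics/nl/"


def uri(namespace: str, local_name: str) -> str:
    """Mint a full URI from a namespace and a local name (hypothetical helper)."""
    return namespace + local_name


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one statement whose three terms are all URIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Assumption for illustration: this debate part is an instance of polivoc:Speech.
debate_part = uri(DEBATES, "nl.proc.sgd.d.198219830000846.2.11.12")
line = ntriple(debate_part, RDF_TYPE, uri(POLIVOC, "Speech"))
print(line)
```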
Semantic model

 [Diagrams of the semantic model; not recoverable from the slide text]

 Reference: W.R. van Hage, V. Malaisé, R. Segers, L. Hollink and A.Th. Schreiber.
 Design and use of the Simple Event Model (SEM)
Current work: finding links

 Queries combine the speaker name, named entities and topics (created using topic
  modeling methods), all extracted from the political events dataset.
 The queries are used to retrieve media articles.

 [Diagram: query composition]
     TopicList = NamedEntitiesVector (Speech), TopicWordSetVector (Speech),
                 NamedEntitiesVector (PartOfDebate), TopicWordSetVector (PartOfDebate)
     + Speaker X = ActorFromSpeech
     TimeFrame
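
The query composition on this slide could be sketched roughly as follows. This is an illustration under stated assumptions, not the PoliMedia implementation: the function and class names are hypothetical, the four vectors are simply concatenated with duplicates removed, and the time frame is assumed to be a fixed window of days starting on the debate date (`window_days` is an invented parameter).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Tuple


@dataclass
class MediaQuery:
    speaker: str                    # Speaker X = ActorFromSpeech
    terms: List[str]                # TopicList
    time_frame: Tuple[date, date]   # TimeFrame


def merge_terms(*term_lists: Iterable[str]) -> List[str]:
    """Concatenate term lists, dropping case-insensitive duplicates, keeping order."""
    seen = set()
    merged = []
    for terms in term_lists:
        for term in terms:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                merged.append(term)
    return merged


def build_query(speaker: str,
                speech_entities: List[str],
                speech_topic_words: List[str],
                part_entities: List[str],
                part_topic_words: List[str],
                debate_date: date,
                window_days: int = 7) -> MediaQuery:
    """Combine speech-level and debate-part-level vectors into one query."""
    terms = merge_terms(speech_entities, speech_topic_words,
                        part_entities, part_topic_words)
    # Assumption: media items are searched from the debate date onwards.
    frame = (debate_date, debate_date + timedelta(days=window_days))
    return MediaQuery(speaker=speaker, terms=terms, time_frame=frame)


# Made-up input values, for illustration only.
query = build_query(
    "Speaker X",
    ["Rotterdam"], ["harbour", "subsidy"],
    ["Rotterdam", "Europe"], ["subsidy", "trade"],
    date(1983, 1, 1),
)
```

The resulting terms, speaker and time frame would then be passed to whatever retrieval system indexes the media articles.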
Finally

   SPARQL endpoint with the PoliMedia vocabulary + RDF of Dutch Hansard
    data will be available soon.

   Feel free to use it!

   Links to media + search/browse app are expected early next year.
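
Once the endpoint is available, a request to it might be assembled along these lines. This is a sketch only: the endpoint address below is a placeholder, not the real PoliMedia endpoint, and the query assumes that speeches are typed with the `Speech` class from the PoliMedia vocabulary. It relies on the standard SPARQL protocol convention of passing the query text in a `query` URL parameter.

```python
from urllib.parse import urlencode

# Placeholder address; the actual endpoint URL is not given in these slides.
ENDPOINT = "http://example.org/sparql"

# Assumption: speeches are instances of polivoc:Speech.
QUERY = """
PREFIX polivoc: <http://purl.org/linkedpolitics/nl/polivoc#>
SELECT ?speech WHERE { ?speech a polivoc:Speech . } LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL with the query as a URL parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```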
Thank you for your attention!

  Henri Beunders (EUR)
  Jaap Blom (NISV)
  Laura Hollink (VU)
  Geert-Jan Houben (TU Delft)
  Damir Juric (TU Delft)
  Max Kemman (EUR)
  Martijn Kleppe (EUR)
  Johan Oomen (NISV)
