Presentation of the paper 'Bringing parliamentary debates to the Semantic Web' by Damir Juric, Laura Hollink and Geert-Jan Houben at the workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE2012) in conjunction with the 11th International Semantic Web Conference 2012 in Boston, USA.
See also the homepage of the PoliMedia project: http://polimedia.nl/
1. Bringing parliamentary debates to the Semantic Web
Damir Juric (1,3), Laura Hollink (2), Geert-Jan Houben (1)
(1) Delft University of Technology, (2) VU University Amsterdam, (3) FER, University of Zagreb
DERIVE 2012
Boston, 12.11.2012.
2. Motivation
Cross-media comparison:
What choices do different media make in the coverage of people and topics while reporting on political events?
Does the representation of topics and people change over time, and how do the various media types differ?
3. Motivation
Figure: political events and media
4. Background: the PoliMedia project
Funded by CLARIN-NL
May 2012 - May 2013
3 phases:
I. Modeling phase: creating a semantic model (this presentation)
II. Data production phase: creating links between political events and media
III. Application phase: searching and navigating linked datasets
www.polimedia.nl
5. Research questions
How to represent political events on the Semantic Web?
How to represent links between media and political events on the Semantic Web?
7. Political events data set
Events: Dutch parliamentary debates
Handelingen der Staten-Generaal, or the Dutch Hansard
Some provenance:
1. Transcripts are made of the complete debates of the Dutch parliament.
2. They are published online by the government at http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995).
3. The PoliticalMashup project has converted the government PDF and TXT files into XML, including URIs as identifiers; see http://politicalmashup.nl/
4. We build on that XML data.
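A minimal sketch of reading speeches out of a debate XML file with Python's standard library. The element and attribute names (`debate`, `topic`, `speech`, `speaker`, `p`) and the sample identifiers are hypothetical and invented for illustration; the real PoliticalMashup schema differs, so the `iter(...)` and `get(...)` calls would need to be adapted to it.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# HYPOTHETICAL, simplified input: element and attribute names are invented for
# illustration and do not reproduce the actual PoliticalMashup schema.
SAMPLE_XML = """<debate id="nl.proc.sgd.d.example">
  <topic id="nl.proc.sgd.d.example.1" title="Example topic">
    <speech id="nl.proc.sgd.d.example.1.1" speaker="Beel">
      <p>Mijnheer de Voorzitter.</p>
    </speech>
    <speech id="nl.proc.sgd.d.example.1.2" speaker="Drees">
      <p>Dank u.</p>
      <p>Ik stem in.</p>
    </speech>
  </topic>
</debate>"""


def extract_speeches(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (speech id, speaker, text) for every speech, in document order."""
    root = ET.fromstring(xml_text)
    speeches = []
    for speech in root.iter("speech"):
        # Join the paragraph texts of one speech into a single string.
        text = " ".join((p.text or "").strip() for p in speech.iter("p"))
        speeches.append((speech.get("id", ""), speech.get("speaker", ""), text))
    return speeches
```

Because `iter` walks the tree in document order, the returned list also preserves the order in which the speeches occur in the transcript.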
8. Media data sets
Newspaper articles and radio bulletins
at the National Library of the Netherlands
Many, mostly regional newspapers, 1950-1995
Text + images of the newspaper layout
Newscasts
at the Netherlands Institute for Sound and Vision
Evening news and current affairs programs
Metadata in Dublin Core and CDMI format, enriched with thesaurus terms from the Gemeenschappelijke Thesaurus Audiovisuele Archieven (GTAA)
9. Semantic model: what do we need to represent? 1/2
Important information for every parliamentary debate:
When the debate was held
What is being said in the debate (topics)
Who is giving the speeches in the debate, and in which role (persons)
Additional information about the actors involved in the event (names of the politicians, their party, age, etc.)
Structure: subparts of the debate have their own identifiers (a subpart is a part of the debate in which only one speaker can be identified as actor)
Chronological order (the order in which the subparts occurred in the parliamentary debate)
Named entities apart from politicians (persons, locations, etc.)
Figure: a debate with its metadata, divided into Topic 1 (Speaker 1 / Content, Speaker 2 / Content, Speaker 3 / Content) and Topic 2 (Speaker 1 / Content)
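The requirements above describe a nested structure: a debate held on a date, divided into topics, each containing an ordered sequence of speeches by one identifiable speaker. A minimal sketch of that structure as Python dataclasses, assuming list order encodes chronological order; the class and field names (`Actor`, `Speech`, `Topic`, `Debate`, `held_on`, `role`) are hypothetical and are not the PoliMedia vocabulary itself.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Actor:
    name: str
    party: Optional[str] = None  # further actor data (age, etc.) would go here


@dataclass
class Speech:
    uri: str        # each subpart of the debate has its own identifier
    speaker: Actor  # exactly one speaker is identifiable per subpart
    role: str       # role in which the speaker speaks (hypothetical free-text value)
    text: str
    named_entities: List[str] = field(default_factory=list)  # persons, locations, etc.


@dataclass
class Topic:
    uri: str
    title: str
    speeches: List[Speech] = field(default_factory=list)  # list order = chronological order


@dataclass
class Debate:
    uri: str
    held_on: date  # when the debate was held
    topics: List[Topic] = field(default_factory=list)  # also in chronological order

    def speeches_in_order(self) -> List[Speech]:
        """Flatten the topics into the chronological sequence of speeches."""
        return [s for t in self.topics for s in t.speeches]

    def speaker_names(self) -> List[str]:
        """Distinct speaker names, in order of first appearance."""
        names: List[str] = []
        for s in self.speeches_in_order():
            if s.speaker.name not in names:
                names.append(s.speaker.name)
        return names
```

Encoding order implicitly in lists keeps the sketch short; an RDF model would need an explicit ordering property or timestamps instead.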
10. Semantic model: what do we need to represent? 2/2
Various information about media items linked to the debate
Links between subparts of the debate and news articles, radio bulletins and television newscasts
11. URIs
PoliMedia vocabulary: http://purl.org/linkedpolitics/nl/polivoc#Speech
Politicians, parties: http://purl.org/linkedpolitics/nl/poli#Beel
Debates and parts of debates: http://purl.org/linkedpolitics/nl/nl.proc.sgd.d.198219830000846.2.11.12
Media articles, bulletins and newscasts: http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf
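The URI patterns above can be reproduced by simple string concatenation. A minimal sketch, assuming the namespaces are exactly the prefixes of the example URIs and (an assumption, not stated on the slide) that the trailing dotted numbers of a debate URI address nested subparts; the helper function names are hypothetical.

```python
# Namespaces taken from the example URIs on this slide.
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"
POLI = "http://purl.org/linkedpolitics/nl/poli#"
LINKEDPOLITICS_NL = "http://purl.org/linkedpolitics/nl/"
KB_RESOLVER = "http://resolver.kb.nl/resolve?urn="


def vocab_term(name: str) -> str:
    """URI of a PoliMedia vocabulary term, e.g. 'Speech'."""
    return POLIVOC + name


def person_or_party_uri(local_name: str) -> str:
    """URI of a politician or party, e.g. 'Beel'."""
    return POLI + local_name


def debate_part_uri(debate_id: str, *path: int) -> str:
    """URI of a debate, or of a part of it when a dotted path is given.

    ASSUMPTION: the trailing dotted numbers address nested subparts.
    """
    return LINKEDPOLITICS_NL + ".".join([debate_id] + [str(p) for p in path])


def kb_media_uri(urn: str) -> str:
    """Resolver URI of a National Library (KB) media item."""
    return KB_RESOLVER + urn
```

For example, `debate_part_uri("nl.proc.sgd.d.198219830000846", 2, 11, 12)` yields the debate-part URI shown above.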
17. Semantic model
W.R. van Hage, V. Malaisé, R. Segers, L. Hollink and A.Th. Schreiber. Design and use of the Simple Event Model (SEM).
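A minimal sketch of how a single speech might be described with the Simple Event Model (namespace `http://semanticweb.cs.vu.nl/2009/11/sem/`) and serialized as N-Triples using only the standard library. The mapping is an assumption, not the published PoliMedia model: the speech is typed as both `polivoc:Speech` and `sem:Event`, attached to its debate with `sem:subEventOf`, linked to the politician with `sem:hasActor`, and dated with `sem:hasTimeStamp`. `dcterms:relation` is a stand-in for whatever link property PoliMedia uses between media items and debate parts, and the date in any usage is a placeholder.

```python
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
DCTERMS = "http://purl.org/dc/terms/"
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"
POLI = "http://purl.org/linkedpolitics/nl/poli#"

Triple = Tuple[str, str, str]


def iri(uri: str) -> str:
    return f"<{uri}>"


def date_literal(day: str) -> str:
    return f'"{day}"^^<{XSD_DATE}>'


def speech_triples(speech: str, debate: str, speaker: str, day: str,
                   media: List[str]) -> List[Triple]:
    """Describe one speech as a SEM event that is part of a debate."""
    s = iri(speech)
    triples = [
        (s, iri(RDF_TYPE), iri(POLIVOC + "Speech")),
        # Alternative: declare polivoc:Speech rdfs:subClassOf sem:Event once.
        (s, iri(RDF_TYPE), iri(SEM + "Event")),
        (s, iri(SEM + "subEventOf"), iri(debate)),        # structure
        (s, iri(SEM + "hasActor"), iri(POLI + speaker)),  # who speaks
        (s, iri(SEM + "hasTimeStamp"), date_literal(day)),  # when
    ]
    for item in media:
        # ASSUMPTION: dcterms:relation stands in for PoliMedia's own link property.
        triples.append((iri(item), iri(DCTERMS + "relation"), s))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Reusing SEM's generic event, actor and time properties keeps the debate data queryable alongside other SEM-based event datasets; only the speech class and the media link need PoliMedia-specific terms.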
19. Current work: finding links
Queries built from the speaker name, named entities, and topics (created using topic modeling methods) extracted from the political events data set are used to retrieve media articles.
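$$\mathit{TopicList} = \mathit{NamedEntitiesVector}_{\mathit{Speech}},\ \mathit{TopicWordSetVector}_{\mathit{Speech}},\ \mathit{NamedEntitiesVector}_{\mathit{PartOfDebate}},\ \mathit{TopicWordSetVector}_{\mathit{PartOfDebate}}$$
+
$$\mathit{Speaker}\ X = \mathit{ActorFromSpeech},\ \mathit{TimeFrame}$$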
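A minimal sketch of the query construction described above, under several assumptions not stated on the slide: the four vectors are plain term lists that are concatenated and de-duplicated into the TopicList, the time frame is a fixed window of `window_days` after the debate, and retrieval uses a naive score (term hits plus a bonus when the speaker's name occurs). `SpeechFeatures`, `build_query` and `score` are hypothetical names; the actual PoliMedia retrieval method may differ.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class SpeechFeatures:
    speaker: str                   # ActorFromSpeech
    speech_entities: List[str]     # NamedEntitiesVector of the speech
    speech_topic_words: List[str]  # TopicWordSetVector of the speech
    part_entities: List[str]       # NamedEntitiesVector of the part of debate
    part_topic_words: List[str]    # TopicWordSetVector of the part of debate
    day: date                      # date of the debate


def build_query(f: SpeechFeatures, window_days: int = 7) -> Dict[str, object]:
    """Combine the TopicList terms with the speaker and a time frame."""
    topic_list: List[str] = []
    for term in (f.speech_entities + f.speech_topic_words
                 + f.part_entities + f.part_topic_words):
        if term not in topic_list:  # keep first occurrence, drop duplicates
            topic_list.append(term)
    return {
        "speaker": f.speaker,
        "terms": topic_list,
        "from": f.day,
        # ASSUMPTION: coverage is searched in a fixed window after the debate.
        "to": f.day + timedelta(days=window_days),
    }


def score(article_text: str, article_day: date, query: Dict[str, object]) -> int:
    """Naive score: 0 outside the time frame, else term hits plus a speaker bonus."""
    if not (query["from"] <= article_day <= query["to"]):
        return 0
    words = set(article_text.lower().split())
    hits = sum(1 for t in query["terms"] if t.lower() in words)
    speaker_bonus = 2 if query["speaker"].lower() in words else 0
    return hits + speaker_bonus
```

Matching on whitespace-separated words means multi-word named entities never match; a real implementation would use a search engine over the media archives rather than this word-set check.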
20. Finally
A SPARQL endpoint with the PoliMedia vocabulary and the RDF of the Dutch Hansard data will be available soon.
Feel free to use it!
Links to media and a search/browse application are expected early next year.
21. Thank you for your attention!
Henri Beunders (EUR)
Jaap Blom (NISV)
Laura Hollink (VU)
Geert-Jan Houben (TU Delft)
Damir Juric (TU Delft)
Max Kemman (EUR)
Martijn Kleppe (EUR)
Johan Oomen (NISV)