Linked Data for Czech Legislation
Martin Nečaský, Ph.D.
necasky@xrg.cz
Matematicko-fyzikální fakulta Univerzity Karlovy (Faculty of Mathematics and Physics, Charles University)
http://www.xrg.cz
http://www.opendata.cz
Project Motivation
- There are many documents/entities published by public bodies that refer to particular legal acts or their parts.
- People need to find which documents/entities refer to which acts or parts of acts.
[Figure: acts and the documents that refer to them: court decisions, inspection results, agendas, permissions]
Project Motivation
- Legal acts define concepts and relationships between them.
- People need to find the relationships of a given concept to other concepts. They also need to refer to that concept from their own documents/entities.
[Figure: the concept "Accounting entity" with hasDefinition and hasObligation relationships, linked to the Accounting Act]
Data processing workflow
Project Objectives
1. Find a common data model (language) that enables us to
   - represent all this data, and
   - publish the data on the web in a standard way so that it can be linked from other data sources on the web.
2. Get consolidated expressions of Czech acts.
   - We can either buy them or reconstruct them ourselves.
   - We reconstructed them! (Many thanks to Charles University student Karel Klíma.)
Project Objectives
3. Use machine-learning methods to recognize references to acts that appear in documents.
   - Currently, we have recognition in court decisions (by our Ph.D. student Vincent Kříž).
4. Use NLP methods to extract concepts and the relationships between them from consolidated expressions of Czech acts, with the following constraints:
   - only within a specified domain;
   - an initial list of important concepts, constructed manually, serves as input.
UFAL + KSI (+ students) cooperation
[Figure: cooperation workflow with the stages: gathering data (code of law, court decisions, …); consolidated acts; extraction of act references in text; extraction of concepts and relationships; representation in a common data model; linking with other data sources; application development]
Common Data Model - Linked Data
- RDF + Linked Data principles:
  1. Use URLs to identify your things.
  2. When someone looks up the URL of an entity, provide useful data about the entity.
  3. Use RDF as the data format and enable querying with SPARQL.
  4. Provide links to other related things as part of the provided data, also in RDF.
Common Data Model - URLs
- Act no. 235/2004 (Value Added Tax Act):
  http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004
- When a client requests this URL via HTTP, data about Act no. 235/2004 is returned in RDF.
- The RDF data model has several serialization formats; the format returned depends on the request (content negotiation is applied).
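Content negotiation can be sketched on the server side as choosing a serialization from the client's `Accept` header. The following is a minimal sketch, assuming the server can produce Turtle, RDF/XML, N-Triples and JSON-LD and prefers Turtle (the slide does not say which formats are actually offered). `negotiate` and `SUPPORTED` are hypothetical names, not the code behind linked.opendata.cz.

```python
from typing import List, Optional, Tuple

# Hypothetical serializations the server can produce, in order of server preference.
SUPPORTED = ["text/turtle", "application/rdf+xml", "application/n-triples", "application/ld+json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range: str, media_type: str) -> Optional[int]:
    """2 for an exact match, 1 for 'type/*', 0 for '*/*', None if the range does not match."""
    if media_range == media_type:
        return 2
    if media_range.endswith("/*") and media_type.startswith(media_range[:-1]):
        return 1
    if media_range == "*/*":
        return 0
    return None


def negotiate(accept: Optional[str]) -> Optional[str]:
    """Pick the RDF media type to send, or None if nothing is acceptable (a 406 response)."""
    if not accept:
        return SUPPORTED[0]  # no Accept header: fall back to the server default
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in SUPPORTED:
        # The most specific matching range decides the q-value of this media type.
        chosen = None
        for media_range, q in ranges:
            s = specificity(media_range, media_type)
            if s is not None and (chosen is None or (s, q) > chosen):
                chosen = (s, q)
        # Strict '>' keeps the server's preference order as the tie-breaker.
        if chosen is not None and chosen[1] > best_q:
            best, best_q = media_type, chosen[1]
    return best
```

For example, `negotiate("text/turtle;q=0.5, application/ld+json")` returns `"application/ld+json"`, because the client rates JSON-LD higher than Turtle.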
Common Data Model - SPARQL
- All sections of Act no. 235/2004
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
SELECT DISTINCT ?section
WHERE {
?section frbr:partOf+
<http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> ;
a frbr:Work .
}
ORDER BY ?section
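A query like this one is typically sent to a SPARQL endpoint using the SPARQL 1.1 Protocol: an HTTP GET with the query text in the `query` parameter, asking for JSON results. The following is a minimal standard-library sketch. The endpoint URL is a placeholder because the slides do not name the public endpoint, and `build_request`, `values` and `section_iris` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # placeholder, not the project's real endpoint

QUERY = """PREFIX frbr: <http://purl.org/vocab/frbr/core#>
SELECT DISTINCT ?section
WHERE {
  ?section frbr:partOf+ <http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> ;
           a frbr:Work .
}
ORDER BY ?section"""


def build_request(endpoint, query):
    """SPARQL 1.1 Protocol query via GET, asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def values(results_json, var):
    """Pull the values bound to `var` out of a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]


def section_iris(endpoint=ENDPOINT):
    """Run the query (needs network access) and return the section IRIs as strings."""
    with urllib.request.urlopen(build_request(endpoint, QUERY)) as response:
        return values(response.read().decode("utf-8"), "section")
```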
Common Data Model - SPARQL
- How many distinct consolidated versions does each section of Act no. 235/2004 have?
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?section (COUNT(DISTINCT(?text)) AS ?cnt)
WHERE {
?expression frbr:realizationOf ?section ;
dcterms:description ?text .
?section frbr:partOf+
<http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004>
}
GROUP BY ?section
ORDER BY DESC(?cnt)
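To make the semantics of this query concrete: `frbr:partOf+` follows one or more `frbr:partOf` steps, and `COUNT(DISTINCT ?text)` counts distinct wordings. As a result, two consolidated versions with identical text are counted once. Below is a plain-Python evaluation over toy data. All identifiers are hypothetical and are not real linked.opendata.cz resources.

```python
from collections import defaultdict

# Toy frbr:partOf edges as (child, parent); hypothetical identifiers.
PART_OF = [("sec1", "part1"), ("sec2", "part1"), ("part1", "act"), ("sec3", "act")]

# Toy (expression, work, text) rows: frbr:realizationOf plus dcterms:description.
REALIZATIONS = [
    ("sec1@v1", "sec1", "text A"),
    ("sec1@v2", "sec1", "text B"),
    ("sec1@v3", "sec1", "text B"),  # same wording as v2, so it is not counted again
    ("sec2@v1", "sec2", "text C"),
    ("sec3@v1", "sec3", "text D"),
    ("sec3@v2", "sec3", "text E"),
]


def ancestors(node, parents):
    """Everything reachable from `node` in one or more partOf steps (frbr:partOf+)."""
    seen, stack = set(), list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def versions_per_section(act, part_of, realizations):
    """COUNT(DISTINCT ?text) grouped by ?section, ordered by count descending."""
    parents = defaultdict(set)
    for child, parent in part_of:
        parents[child].add(parent)
    texts = defaultdict(set)
    for _expression, section, text in realizations:
        if act in ancestors(section, parents):
            texts[section].add(text)
    return sorted(((s, len(t)) for s, t in texts.items()), key=lambda pair: -pair[1])


print(versions_per_section("act", PART_OF, REALIZATIONS))  # [('sec1', 2), ('sec3', 2), ('sec2', 1)]
```

SPARQL leaves the relative order of ties (here sec1 and sec3) unspecified.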
Common Data Model - SPARQL
- Are there any court decisions citing Act no. 235/2004 or any of its sections?
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX sao: <http://salt.semanticauthoring.org/ontologies/sao#>
PREFIX sdo: <http://salt.semanticauthoring.org/ontologies/sdo#>
SELECT DISTINCT ?decision ?decisionTitle ?sectionOrAct
WHERE {
?annotation sao:hasTopic ?sectionOrAct .
?sectionOrAct frbr:partOf*
<http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> .
?decisionExpr
sdo:hasSection/sdo:hasParagraph/sdo:hasTextChunk/sdo:hasAnnotation ?annotation ;
frbr:realizationOf ?decision .
?decision dcterms:title ?decisionTitle .
}
ORDER BY ?sectionOrAct
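This query uses `frbr:partOf*` rather than `+`. The zero-length path allows `?sectionOrAct` to be the act itself, so decisions citing the act as a whole are found together with those citing one of its sections. The difference is shown in this toy Python illustration (all identifiers hypothetical):

```python
from collections import defaultdict

PART_OF = [("sec1", "part1"), ("part1", "act")]  # toy frbr:partOf edges (child, parent)


def reachable(node, part_of, zero_or_more):
    """Targets of frbr:partOf* (zero_or_more=True) or frbr:partOf+ (False) starting at node."""
    parents = defaultdict(set)
    for child, parent in part_of:
        parents[child].add(parent)
    seen = {node} if zero_or_more else set()  # '*' also matches the zero-length path
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def cited_within(act, topics, part_of, zero_or_more):
    """Topics t for which 't frbr:partOf* act' (or 'frbr:partOf+') holds."""
    return [t for t in topics if act in reachable(t, part_of, zero_or_more)]


TOPICS = ["act", "sec1", "other-act"]  # hypothetical sao:hasTopic values of annotations
print(cited_within("act", TOPICS, PART_OF, zero_or_more=True))   # ['act', 'sec1']
print(cited_within("act", TOPICS, PART_OF, zero_or_more=False))  # ['sec1']
```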
Common Data Model - SPARQL
- What kinds of entities/documents are linked to Act no. 235/2004?
SELECT DISTINCT ?p ?t
WHERE {
?s ?p <http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> ;
a ?t .
}
Linked Data Representation of Extracted
Concepts and Relationships
Representation of Concepts and Relationships
Example sentence: "K návrhu je navrhovatel povinen připojit listiny, kterých se v návrhu dovolává." (The petitioner is obliged to attach to the petition the documents it invokes.)
[Figure: the sentence split into subject, predicate and object (three lingv:TextChunk nodes linked by lingv:subject and lingv:object); lexc:Concept nodes such as "Navrhovatel (dle zák. NN/YYYY)" ("petitioner, under Act NN/YYYY") and "Připojit listiny, kterých se v návrhu dovolává (dle zák. NN/YYYY)", connected by lexc:hasObligation; lexc:hasDefinition pointing to extracted definition text; frbr:partOf links to § C and Zákon č. NN/YYYY (Act no. NN/YYYY); Judikáty (court decisions)]
Legal Concepts Ontology
- Each extracted concept is represented as an instance of the class lexc:Concept.
[Figure: classes lexc:Concept, lexc:ConceptVersion, frbr:Expression and lex:Act connected by frbr:partOf; properties lexc:hasObligation and lexc:hasRight; lexc:hasDefinition with an rdfs:Literal value]
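A single extracted concept could be published as RDF along these lines. The following minimal sketch writes N-Triples using the `lexc:` and `frbr:` namespaces that appear in this talk's queries. The concept, section and obligation IRIs are hypothetical. Attaching the concept to its section with `frbr:partOf` is one reading of the ontology diagram, not a confirmed modelling rule.

```python
LEXC = "http://purl.org/lex/ontology/concepts#"
FRBR = "http://purl.org/vocab/frbr/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    return f"<{value}>"


def literal(text, lang=None):
    """N-Triples string literal with the mandatory escapes."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def concept_triples(concept, section, obligations=(), definition=None):
    """Serialize one lexc:Concept with its obligations and an optional definition text."""
    lines = [
        f"{iri(concept)} {iri(RDF_TYPE)} {iri(LEXC + 'Concept')} .",
        f"{iri(concept)} {iri(FRBR + 'partOf')} {iri(section)} .",  # assumption, see lead-in
    ]
    for obligation in obligations:
        lines.append(f"{iri(concept)} {iri(LEXC + 'hasObligation')} {iri(obligation)} .")
    if definition is not None:
        lines.append(f"{iri(concept)} {iri(LEXC + 'hasDefinition')} {literal(definition, 'cs')} .")
    return "\n".join(lines)


print(concept_triples(
    "http://example.org/concept/navrhovatel",  # hypothetical IRIs
    "http://example.org/act/NN-YYYY/section/C",
    obligations=["http://example.org/concept/pripojit-listiny"],
))
```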
Concept Zaměstnavatel (employer)
- http://linked.opendata.cz/resource/legislation/cz/expression/2006/262-2006/version/cz/2006-04-21/concept/ucetni-pojem/zaměstnavatel
  (see http://internal.opendata.cz:8890/describe/?url=http://linked.opendata.cz/resource/legislation/cz/expression/2006/262-2006/version/cz/2006-04-21/concept/ucetni-pojem/zam%C4%9Bstnavatel)
Obligations of Zaměstnavatel
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX lingv: <http://purl.org/lingv/ontology#>
PREFIX lexc: <http://purl.org/lex/ontology/concepts#>
SELECT ?obligationChunk ?obligationLabel
WHERE {
<http://linked.opendata.cz/resource/legislation/cz/expression/2006/262-2006/version/cz/2006-04-21/concept/ucetni-pojem/zaměstnavatel>
lexc:hasObligation ?obligation .
?obligation ^oa:hasBody/oa:hasTarget ?obligationChunk .
?obligationChunk lingv:hasForm/skos:prefLabel ?obligationLabel .
}
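The property path `^oa:hasBody/oa:hasTarget` walks `oa:hasBody` backwards, from the obligation to the Web Annotation whose body it is. It then follows `oa:hasTarget` forwards to the annotated text chunk. Here is the same navigation over a toy graph in Python (all identifiers and the label are hypothetical):

```python
# Toy graph of (subject, predicate, object) triples; identifiers are hypothetical.
GRAPH = {
    ("ex:zamestnavatel", "lexc:hasObligation", "ex:obligation1"),
    ("ex:annotation1", "oa:hasBody", "ex:obligation1"),
    ("ex:annotation1", "oa:hasTarget", "ex:chunk1"),
    ("ex:chunk1", "lingv:hasForm", "ex:form1"),
    ("ex:form1", "skos:prefLabel", "example obligation text"),
}


def forward(graph, subject, predicate):
    """Objects o with (subject, predicate, o): one forward path step."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def backward(graph, obj, predicate):
    """Subjects s with (s, predicate, obj): the inverse step ^predicate."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def obligations(graph, concept):
    """(chunk, label) pairs, mirroring SELECT ?obligationChunk ?obligationLabel."""
    rows = set()
    for obligation in forward(graph, concept, "lexc:hasObligation"):
        for annotation in backward(graph, obligation, "oa:hasBody"):  # ^oa:hasBody
            for chunk in forward(graph, annotation, "oa:hasTarget"):   # /oa:hasTarget
                for form in forward(graph, chunk, "lingv:hasForm"):
                    for label in forward(graph, form, "skos:prefLabel"):
                        rows.add((chunk, label))
    return rows


print(obligations(GRAPH, "ex:zamestnavatel"))  # {('ex:chunk1', 'example obligation text')}
```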
Linguistic Ontology
[Figure: classes lexc:ConceptVersion, oa:Annotation, lingv:TextChunk, lingv:Form and lingv:DependencyTree, connected by the properties oa:hasBody, oa:hasTarget, lingv:hasForm, lingv:hasLemma and lingv:hasTree]
Next steps
- Improve NLP extraction and the queries (see the next part of the presentation)
- Better linking of concepts
  - to particular sections of acts
  - to other data sources (e.g., life situations, agendas of public bodies, fines imposed by public bodies, etc.)
- Develop web applications that
  - enable users to work with the extracted concepts and relationships
  - enable users to explore links between the extracted concepts and other data sources
Vincent Kříž, Barbora Hladká
RExtractor
Entity Relation Extraction
from Unstructured Texts
Intelligent library (INTLIB, TA02010182)
Seminar of formal linguistics, 2014-05-12
Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics
Charles University in Prague
Czech Republic
{kriz,hladka}@ufal.mff.cuni.cz
http://ufal.mff.cuni.cz/intlib
Kříž, Hladká: RExtractor - Entity Relation Extraction from Unstructured Texts, SFL, 2014-05-12
Motivation
Typical search approaches
- full-text search
- metadata search
Our approach
- building a knowledge base
  - semantic representation of documents
  - entities and their relations
  - represented in the Resource Description Framework (RDF)
Data processing workflow
RExtractor Architecture
- Domain independent
Conversion Component
- Converts various input formats into a unified representation (XML).
NLP Component
- Prague Dependency Treebank framework (http://ufal.mff.cuni.cz/pdt3.0)
- Tools
  - segmentation & tokenization
  - lemmatization & morphology
  - syntactic parsing
  - deep syntactic parsing
  - Treex (http://ufal.mff.cuni.cz/treex)
Entity Detection Component
- Database of Entities
  - entities specified by domain experts
- PML-TQ (http://ufal.mff.cuni.cz/tools/pml-tq)
Relation Extraction Component
- Database of Queries
  - queries formulated by domain experts
  - formulated as PML-TQ queries on dependency trees
- RDF-ready output:
  Subject            Predicate     Object
  Entity             hasToCreate   Something
  Accounting units   create        fixed items
  Accounting units   create        reserves
- Example of user query: accounting units' obligations
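Rows in this subject-predicate-object shape map directly onto RDF triples. A question like "accounting units' obligations" then becomes a lookup by subject and obligation predicate. The following is a minimal sketch. The base IRI, the slug-based IRI scheme and the set of obligation predicates are hypothetical choices, not RExtractor's actual output format.

```python
import re

BASE = "http://example.org/intlib/"      # hypothetical namespace
OBLIGATION_PREDICATES = {"hasToCreate"}  # hypothetical: predicates that express obligations

EXTRACTED = [  # the two relations shown on the slide
    ("Accounting units", "hasToCreate", "fixed items"),
    ("Accounting units", "hasToCreate", "reserves"),
]


def slug(text):
    """'Accounting units' -> 'accounting-units' (used to mint IRIs)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def to_ntriples(triples):
    return "\n".join(
        f"<{BASE}entity/{slug(s)}> <{BASE}relation/{p}> <{BASE}entity/{slug(o)}> ."
        for s, p, o in triples
    )


def obligations_of(subject, triples):
    """Answer a query such as "accounting units' obligations"."""
    return [o for s, p, o in triples
            if s.lower() == subject.lower() and p in OBLIGATION_PREDICATES]


print(to_ntriples(EXTRACTED))
print(obligations_of("accounting units", EXTRACTED))  # ['fixed items', 'reserves']
```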
Case study on legislative domain
Legal texts
- specialized texts operating in legal settings
- they should transmit legal norms to their recipients
- they need to be clear, explicit and precise
Sentences
- simple sentences are very rare
- usually long and very complex
Legal texts are generally considered very difficult to read and understand (Tiersma, 2010).
RExtractor Architecture
Adaptation for legislative domain
Conversion component
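HLAVA I
ÚVODNÍ USTANOVENÍ
§ 1
Předmět úpravy
Tato vyhláška zapracovává příslušné předpisy Evropské unie a upravuje:
a) způsob vymezení hydrogeologických rajonů, vymezení útvarů podzemních vod,
b) způsob hodnocení stavu podzemních vod a
c) náležitosti programů zjišťování a hodnocení stavu podzemních vod.
(Chapter I, Introductory provisions, § 1 Subject matter: This decree transposes the relevant European Union legislation and regulates: a) the method of delimiting hydrogeological zones and groundwater bodies, b) the method of assessing the status of groundwater, and c) the requirements for programmes for monitoring and assessing the status of groundwater.)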
<head id="11" label="HLAVA I">
  <title>ÚVODNÍ USTANOVENÍ</title>
  <section id="12" label="§ 1">
    <title>Předmět úpravy</title>
    <text>Tato vyhláška zapracovává příslušné předpisy Evropské unie a upravuje:</text>
    <section id="13" label="a)">
      <text>způsob vymezení hydrogeologických rajonů, vymezení útvarů podzemních vod,</text>
    </section>
    <section id="14" label="b)">
      <text>způsob hodnocení stavu podzemních vod a</text>
    </section>
    <section id="15" label="c)">
      <text>náležitosti programů zjišťování a hodnocení stavu podzemních vod.</text>
    </section>
  </section>
</head>
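The mapping from the plain-text layout to this XML can be sketched with Python's `xml.etree.ElementTree`. This is a simplified illustration, not the Conversion Component itself. It makes several assumptions: the input has a single `HLAVA` heading; lines starting with `§` open sections; the line after a heading or section label is its title; and lines starting with `a)`, `b)`, … are enumeration items. The `convert` function and its id numbering are hypothetical.

```python
import itertools
import re
import xml.etree.ElementTree as ET

ITEM = re.compile(r"^([a-z])\)\s+(.*)$")  # enumeration items such as "a) ..."

LINES = [
    "HLAVA I",
    "ÚVODNÍ USTANOVENÍ",
    "§ 1",
    "Předmět úpravy",
    "Tato vyhláška zapracovává příslušné předpisy Evropské unie a upravuje:",
    "a) způsob vymezení hydrogeologických rajonů, vymezení útvarů podzemních vod,",
    "b) způsob hodnocení stavu podzemních vod a",
    "c) náležitosti programů zjišťování a hodnocení stavu podzemních vod.",
]


def convert(lines, first_id=11):
    """Build <head>/<section>/<title>/<text> elements from a plain-text layout (simplified)."""
    ids = itertools.count(first_id)  # ids start at 11 only to mirror the slide
    head = section = None
    needs_title = None  # element whose title is expected on the next line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("HLAVA "):
            head = ET.Element("head", id=str(next(ids)), label=line)
            needs_title = head
        elif line.startswith("§ "):
            section = ET.SubElement(head, "section", id=str(next(ids)), label=line)
            needs_title = section
        elif needs_title is not None:
            ET.SubElement(needs_title, "title").text = line
            needs_title = None
        elif (match := ITEM.match(line)):
            item = ET.SubElement(section, "section", id=str(next(ids)), label=match.group(1) + ")")
            ET.SubElement(item, "text").text = match.group(2)
        else:
            ET.SubElement(section, "text").text = line
    return head


print(ET.tostring(convert(LINES), encoding="unicode"))
```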
NLP Component
Corpus of Czech legal texts (CCLT)
- The Accounting Act (563/1991 Coll.)
- Decree on Double-entry Accounting for Undertakers (500/2002 Coll.)
- automatically parsed, then manually checked
  - 1,133 manually annotated a-trees (analytical trees)
  - 35,085 tokens
Credit to Zdeňka Urešová
NLP Component
Corpus of Czech legal texts (CCLT)
- enumerations and lists are annotated as one tree
- manual annotation guidelines:
  - split the sentence according to formal markers
  - use links for dependencies between the partial trees
- an automatic procedure merges the partial annotations into a final tree
Pipeline visualization available online at ufal.mff.cuni.cz/intlib
NLP Component
Automatic parsers for Czech
- trained on newspaper texts
- we verify whether a parser trained on newspaper texts can be used directly or whether modifications are needed
- MST parser: Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajič (2005): Non-projective Dependency Parsing using Spanning Tree Algorithms. In: Proceedings of HLT/EMNLP, Vancouver, British Columbia.
NLP Component
Sentence splitting
- We substitute long lists and enumerations with several shorter sentences.

Original sentence:
(2) Veřejným rozpočtem se pro účely tohoto zákona rozumí
a) státní rozpočet
b) rozpočet státního fondu,
c) rozpočet Evropské unie, nebo
d) rozpočet, o němž to stanoví zákon.
((2) For the purposes of this act, a public budget means a) the state budget, b) the budget of a state fund, c) the budget of the European Union, or d) a budget so designated by law.)

New sentences:
Veřejným rozpočtem se pro účely tohoto zákona rozumí státní rozpočet.
Veřejným rozpočtem se pro účely tohoto zákona rozumí rozpočet státního fondu.
Veřejným rozpočtem se pro účely tohoto zákona rozumí rozpočet Evropské unie.
Veřejným rozpočtem se pro účely tohoto zákona rozumí rozpočet, o němž to stanoví zákon.
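At the string level, the substitution can be sketched as follows: repeat the lead-in clause before each enumeration item, and drop the item label, trailing punctuation and any trailing conjunction (`nebo` "or", `a` "and"). This is a hypothetical simplification. The actual component relies on formal markers in the text and on the resulting trees, and `split_enumeration` is an illustrative name.

```python
import re

ITEM_LABEL = re.compile(r"^[a-z]\)\s*")    # "a) ", "b) ", ...
PARAGRAPH_NO = re.compile(r"^\(\d+\)\s*")  # "(2) "
CONJUNCTIONS = ("nebo", "a")               # assumption: the only conjunctions that trail items


def split_enumeration(lead, items):
    """Return one sentence per enumeration item: '<lead> <item>.'"""
    lead = PARAGRAPH_NO.sub("", lead).strip()
    sentences = []
    for raw in items:
        body = ITEM_LABEL.sub("", raw.strip()).rstrip(" .,;")
        for conjunction in CONJUNCTIONS:
            if body.endswith(" " + conjunction):
                body = body[: -len(conjunction)].rstrip(" ,")
        sentences.append(f"{lead} {body}.")
    return sentences


LEAD = "(2) Veřejným rozpočtem se pro účely tohoto zákona rozumí"
ITEMS = [
    "a) státní rozpočet",
    "b) rozpočet státního fondu,",
    "c) rozpočet Evropské unie, nebo",
    "d) rozpočet, o němž to stanoví zákon.",
]
for sentence in split_enumeration(LEAD, ITEMS):
    print(sentence)
```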
NLP Component
Re-tokenization
Účetní jednotky tvoří opravné položky podle ustanovení § 16, 26, 31, 55 a 57 a neoceňují majetek podle § 27, § 14, 39, § 51 až 55, § 58, 60 a 69
(Accounting units create valuation allowances under § 16, 26, 31, 55 and 57 and do not value assets under § 27, § 14, 39, § 51 to 55, § 58, 60 and 69.)
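One plausible reading of this step, assumed here rather than taken from the slides, is that each run of section references is kept together as a single token. A run such as `§ 16, 26, 31, 55 a 57` would then not require the parser to attach every number separately. The following regex-based sketch takes that approach. Its pattern (numbers joined by `,`, `až` "to" or `a` "and", optionally repeating `§`) is a hypothetical approximation.

```python
import re

# A run of section references, e.g. "§ 16, 26, 31, 55 a 57" or "§ 27, § 14, 39, § 51 až 55".
SECTION_RUN = re.compile(r"§\s*\d+(?:\s*(?:,|až|a)\s*(?:§\s*)?\d+)*")


def retokenize(sentence):
    """Whitespace tokenization, except that each section-reference run becomes one token."""
    tokens, position = [], 0
    for match in SECTION_RUN.finditer(sentence):
        tokens.extend(sentence[position:match.start()].split())
        tokens.append(" ".join(match.group(0).split()))  # normalize inner whitespace
        position = match.end()
    tokens.extend(sentence[position:].split())
    return tokens


SENTENCE = ("Účetní jednotky tvoří opravné položky podle ustanovení § 16, 26, 31, 55 a 57 "
            "a neoceňují majetek podle § 27, § 14, 39, § 51 až 55, § 58, 60 a 69")
print(retokenize(SENTENCE))
```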
Entity Detection Component
Entities in CCLT
- accounting subdomain
- entities manually annotated by Sysnet, Ltd.
  - in the Decree on Double-entry Accounting for Undertakers (500/2002 Coll.)
[Figure: sample of the annotated entities]
Entity Detection Component
Initializing the Database of Entities (DBE) with entities from CCLT
- each (unique) entity is parsed automatically by the MST parser
- an automatic procedure takes the entity's dependency tree and creates a PML-TQ query
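PML-TQ syntax is not reproduced here. Instead, the sketch below uses plain-Python subtree matching as a stand-in for what such an automatically generated query checks: the lemmas and dependency edges of the entity's tree must reappear in the sentence tree. The `Node` class, the lemma-only trees and the greedy child matching are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A dependency-tree node reduced to its lemma (hypothetical simplification)."""
    lemma: str
    children: List["Node"] = field(default_factory=list)


def matches_at(pattern: Node, node: Node) -> bool:
    """Does the entity pattern match the subtree rooted at `node`?

    Lemmas must be equal and every pattern child must match a distinct child of `node`
    (greedy assignment; a complete matcher would backtrack over alternatives).
    """
    if pattern.lemma != node.lemma:
        return False
    used = set()
    for child_pattern in pattern.children:
        for i, child in enumerate(node.children):
            if i not in used and matches_at(child_pattern, child):
                used.add(i)
                break
        else:
            return False  # no free child matched this part of the entity
    return True


def find_entity(pattern: Node, root: Node) -> List[Node]:
    """All nodes of the sentence tree at which the entity occurs."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if matches_at(pattern, node):
            hits.append(node)
        stack.extend(node.children)
    return hits


# "účetní jednotka" (accounting unit): the adjective depends on the noun.
ENTITY = Node("jednotka", [Node("účetní")])
# Simplified tree for "Účetní jednotky tvoří opravné položky ..." (lemmas only).
SENTENCE = Node("tvořit", [Node("jednotka", [Node("účetní")]), Node("položka", [Node("opravný")])])
print([hit.lemma for hit in find_entity(ENTITY, SENTENCE)])  # ['jednotka']
```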
Entity Detection Component
Experiment
- identify entities in the gold-standard trees of CCLT
  - with re-tokenized tokens and (very) long sentences
- identify entities in trees created by MST
  - with re-tokenized tokens and split sentences
Results
- a high number of false positives
- the automatic parser has little influence on detection

Parsing method   Extracted   TP     FP     FN    Precision (%)   Recall (%)
Manual           16428       9549   6879   628   58.1            93.8
Automatic        16160       9278   6882   838   57.4            91.7
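The columns of this table follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and Extracted = TP + FP. Recomputing them in Python reproduces the reported percentages. It also shows why precision is low while recall stays high: the false positives are nearly as numerous as the true positives.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), i.e. TP / extracted; recall = TP / (TP + FN). Both in %."""
    return 100 * tp / (tp + fp), 100 * tp / (tp + fn)


ROWS = {"Manual": (9549, 6879, 628), "Automatic": (9278, 6882, 838)}  # (TP, FP, FN) from the table
for name, (tp, fp, fn) in ROWS.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: extracted={tp + fp} precision={p:.1f} recall={r:.1f}")
# Manual: extracted=16428 precision=58.1 recall=93.8
# Automatic: extracted=16160 precision=57.4 recall=91.7
```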
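Relation Extraction Component
Types of relations
- Definitions (D): entities are defined or explained
  - Náhradním ubytováním se rozumí byt o jedné místnosti nebo pokoj ve svobodárně nebo podnájem v zařízené nebo nezařízené části bytu jiného nájemce.
    (Substitute accommodation means a one-room flat, a room in a hostel, or a sublease of a furnished or unfurnished part of another tenant's flat.)
- Obligations (O): an entity is obliged to do something
  - K návrhu je navrhovatel povinen připojit listiny, kterých se v návrhu dovolává.
    (The petitioner is obliged to attach to the petition the documents it invokes.)
- Rights (R): an entity has the right to do something
  - Nabyvatel může uplatňovat nárok z odpovědnosti za vady u soudu jen tehdy, vytkl-li vady bez zbytečného odkladu po té, kdy měl možnost věc prohlédnout.
    (The acquirer may assert a claim for liability for defects in court only if they reported the defects without undue delay after having had the opportunity to inspect the item.)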
Relation Extraction Component
Manual design of queries
- Strategy: cover as many relations as possible with as few queries as possible.
- The tree-query expert
  - observes the typical constructions for a given type of relation,
  - designs a query for the most frequent construction,
  - goes through the matches and redesigns the query if needed.
Relation Extraction Component
Query design & evaluation on CCLT
- Query design
  - on the Accounting Act (563/1991 Coll.)
  - 5 queries for Definitions
  - 4 queries for Rights
  - 2 queries for Obligations
- Evaluation
  - on the Decree on Double-entry Accounting for Undertakers (500/2002 Coll.)
Relation Extraction Component
Results (D = Definitions, R = Rights, O = Obligations)

                    D      R      O      Total
# of queries        5      4      2      11
Gold standard       97     308    62     467
Extracted           70     255    41     366
True positives      53     206    36     295
False negatives     44     102    26     172
False positives     17     49     5      71
Precision (%)       75.7   80.8   87.8   80.6
Recall (%)          54.6   66.9   58.1   63.2
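The Total column matches micro-averaging: the counts are summed over D, R and O before precision and recall are computed (295/366 and 295/467). Averaging the per-type percentages instead (macro-averaging) would give different numbers, as this short Python check shows:

```python
# (TP, FP, FN) per relation type, taken from the results table.
COUNTS = {"D": (53, 17, 44), "R": (206, 49, 102), "O": (36, 5, 26)}

tp = sum(c[0] for c in COUNTS.values())
fp = sum(c[1] for c in COUNTS.values())
fn = sum(c[2] for c in COUNTS.values())

# Micro-average: pool the counts first, then divide (this reproduces the "Total" column).
micro_p = 100 * tp / (tp + fp)
micro_r = 100 * tp / (tp + fn)

# Macro-average: average the per-type percentages instead (not reported on the slide).
macro_p = sum(100 * t / (t + f) for t, f, _ in COUNTS.values()) / len(COUNTS)
macro_r = sum(100 * t / (t + n) for t, _, n in COUNTS.values()) / len(COUNTS)

print(f"micro P={micro_p:.1f} R={micro_r:.1f}")  # micro P=80.6 R=63.2
print(f"macro P={macro_p:.1f} R={macro_r:.1f}")  # macro P=81.4 R=59.9
print(fp + fn)  # 243
```

The 71 false positives and 172 false negatives add up to 243, the same total as the errors classified in the error analysis.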
Relation Extraction Component
Error analysis
Results
- errors in automatic parsing
- errors in query design

Error source   # of errors   Ratio
Parser         145           59.7%
Query          93            38.3%
Entity         5             2.1%
Relation Extraction Component
Experiment with more data
- 28 laws from the accounting subdomain
- 27,808 sentences
- 745,137 tokens

Results per query (D = Definitions, R = Rights, O = Obligations):
Query  Count    Query  Count    Query  Count
D1     36       R1     240      O1     183
D2     287      R2     470      O2     37
D3     35       R3     127
D4     466      R4     6
D5     46
Total  1580     Total  843      Total  220
Relation Extraction Component
Query example - Definition
- Náhradním ubytováním se rozumí byt o jedné místnosti nebo pokoj ve svobodárně nebo podnájem v zařízené nebo nezařízené části bytu jiného nájemce.
Relation Extraction Component
Query example - Obligation
- K návrhu je navrhovatel povinen připojit listiny, kterých se v návrhu dovolává.
Relation Extraction Component
Query example - Right
- Nabyvatel může uplatňovat nárok z odpovědnosti za vady u soudu jen tehdy, vytkl-li vady bez zbytečného odkladu po té, kdy měl možnost věc prohlédnout.
Future Work
Legislative domain
- Parsing
  - evaluation and adaptation
- Entity detection
  - automatic entity detection based on a sample of manually annotated entities
- Relation extraction
  - automatic query design
Case study on environmental domain
- What are the environmental consequences of a project?
- Environmental Impact Assessment (EIA) considers the environmental impacts of a project when deciding whether or not to proceed with it.
- In the Czech Republic, CENIA administers the EIA information system.
EIA system
Example
- Amazon's plan to build a distribution center in Brno, Czech Republic (Brno councilors: no, no, no, yes)
- May 9, 2014: a new intention posted to the EIA system by CTP Invest
Mining EIA documentation
- Sysnet, Ltd. specified which entities and relations to extract, e.g.
  - Title (Section B.I.1)
  - Category, type (Section B.I.1)
  - Capacity, size (Sections B.I.2, B.I.6)
  - Location (Section B.I.3)
  - Scheduling (Section B.I.7)
  - ...
Focus on section B.I.2
- Example
  Vlastní areál bude sestávat z halového objektu o ploše cca 96 000 m², který bude uvnitř rozdělen na 3 haly … Předpokládají se 2 krytá stání pro jízdní kola a 1150 parkovacích stání pro osobní vozidla … Součástí záměru je realizace sadových úprav, která zahrnuje výsadbu více než 250 ks vzrostlých stromů
- The park will consist of a hall building with an area of approx. 96,000 m² that will be divided inside into 3 halls … There will be 2 roofed bicycle parking stations and 1,150 parking spaces for cars … The project includes landscaping, with more than 250 mature trees to be planted.
Using RExtractor
- queries by regular expressions
Dále je provozována produkční stáj VKK pro 336 ks dojnic (403,2 DJ). (In addition, a production barn VKK is operated for 336 dairy cows.)
(Adj Nom)? (Noun Nom) (number) (unit) (Noun Gen)
(attribute) (entity) (number) (unit) (entity)
(production) (barn) (336) (pcs) (cow)
Regular expressions
Credit to Ivana Lukšová
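The pattern on this slide is a regular expression over morphological categories rather than over characters. One way to run such a pattern with Python's `re` is to encode each token's category as a single character. Several details of this sketch are assumptions, not RExtractor internals: the one-letter codes, the hand-assigned tags, and the allowance of up to two intervening tokens (needed to skip "VKK pro" in the example).

```python
import re

# Hypothetical one-character codes for morphological categories:
# A = adjective (nominative), N = noun (nominative), # = number, U = unit, G = noun (genitive), x = other
PATTERN = re.compile(r"(A)?(N)x{0,2}(#)(U)(G)")  # x{0,2}: assumed gap of up to two other tokens


def extract(tokens):
    """tokens: (word, code) pairs -> (attribute, entity, number, unit, counted entity) tuples."""
    codes = "".join(code for _, code in tokens)  # one character per token, so indices line up
    return [
        tuple(tokens[m.start(g)][0] if m.group(g) else None for g in range(1, 6))
        for m in PATTERN.finditer(codes)
    ]


# "Dále je provozována produkční stáj VKK pro 336 ks dojnic (403,2 DJ)." tagged by hand
SENTENCE = [
    ("Dále", "x"), ("je", "x"), ("provozována", "x"), ("produkční", "A"), ("stáj", "N"),
    ("VKK", "x"), ("pro", "x"), ("336", "#"), ("ks", "U"), ("dojnic", "G"),
    ("(403,2", "x"), ("DJ)", "x"), (".", "x"),
]
print(extract(SENTENCE))  # [('produkční', 'stáj', '336', 'ks', 'dojnic')]
```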

Evaluation
- developers vs. users
- gold-standard data vs. practical use cases
- experience vs. expectation
- scientific contribution vs. making life easier
Both the legislative and the environmental domain

More Related Content

Linked Data for Czech Legislation - 2nd year of our project

  • 1. Linked Data for Czech Legislation Martin Neask箪, Ph.D. necasky@xrg.cz Matematicko-fyzik叩ln鱈 fakulta Univerzity Karlovy http://www.xrg.cz http://www.opendata.cz
  • 2. Project Motivation There are many documents/entities published by public bodies which refer to particular legal acts or their parts. People need to find which documents/entities refer to what acts or their parts. Acts Court decisions Inspection results AgendaPermissions
  • 3. Project Motivation Legal acts define concepts and relationships between them. People need to find relationships of a given concept with other concepts. They also need to refer to that concept from their documents/entities. Accounting entity hasDefinition hasObligation Accounting Act
  • 5. Project Objectives 1. Find a common data model (language) which enables to represent all this data publish the data on the web in a standard way so that it can be linked from other data sources on the web 2. Get consolidated expressions of Czech acts We can buy them or reconstruct them on our own. We reconstructed them! (great thanks to Charles University student Karel Kl鱈ma)
  • 6. Project Objectives 3. Use machine-learning methods for recognizing references to acts which appear in documents. Currently, we have recognition in court decisions (by our Ph.D. student Vincent Kr鱈転) 4. Use NLP methods to extract concepts and relationships between them from consolidated expressions of Czech acts, with the following constraints Only from in a specified domain Initial list of important concepts constructed manually as an input
  • 7. UFAL + KSI (+ students) cooperation Gathering data (code of law, court decisions, .) Consolidated acts Extraction of act references in text Extraction of concepts and relationships Representation in a common data model Linking with other data sources Application development Application development
  • 8. Common Data Model Linked Data
  • 9. Common Data Model Linked Data RDF + Linked Data principles 1. Use URLs to identify your things. 2. When someone looks up your URL of an entity, provide useful data about the entity. 3. Use RDF as a data format, enable querying with SPARQL. 4. Provide links to other related things as part of the provided data, also in RDF.
  • 10. Common Data Model - URLs Act no. 235/2004 (Value Added Tax Act) http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004 When a client requests this URL (via HTTP protocol), data about Act no. 235/2004 is provided in RDF There are various serialization formats of RDF data model; provided serialization format depends on the request (content negotiation is applied)
  • 11. Common Data Model - SPARQL All sections of Act no. 235/2004 PREFIX frbr: <http://purl.org/vocab/frbr/core#> SELECT DISTINCT ?section WHERE { ?section frbr:partOf+ <http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> ; a frbr:Work . } ORDER BY ?section
  • 12. Common Data Model - SPARQL The number of consolidated versions of particular sections of Act no. 235/2004? PREFIX frbr: <http://purl.org/vocab/frbr/core#> PREFIX dcterms: <http://purl.org/dc/terms/> SELECT ?section (COUNT(DISTINCT(?text)) AS ?cnt) WHERE { ?expression frbr:realizationOf ?section ; dcterms:description ?text . ?section frbr:partOf+ <http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> } GROUP BY ?section ORDER BY DESC(?cnt)
  • 13. Common Data Model - SPARQL Are there any court decisions citing Act no. 235/2004 or any of its sections? PREFIX frbr: <http://purl.org/vocab/frbr/core#> PREFIX dcterms: <http://purl.org/dc/terms/> PREFIX sao: <http://salt.semanticauthoring.org/ontologies/sao#> PREFIX sdo: <http://salt.semanticauthoring.org/ontologies/sdo#> SELECT DISTINCT ?decision ?decisionTitle ?sectionOrAct WHERE { ?annotation sao:hasTopic ?sectionOrAct . ?sectionOrAct frbr:partOf* <http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> . ?decisionExpr sdo:hasSection/sdo:hasParagraph/sdo:hasTextChunk/sdo:hasAnnotation ?annotation ; frbr:realizationOf ?decision . ?decision dcterms:title ?decisionTitle . } ORDER BY ?sectionOrAct
  • 14. Common Data Model - SPARQL What kinds of entities/documents are linked to Act no. 235/2004? SELECT DISTINCT ?p ?t WHERE { ?s ?p <http://linked.opendata.cz/resource/legislation/cz/act/2004/235-2004> ; a ?t . }
  • 15. Linked Data Representation of Extracted Concepts and Relationships
  • 16. Judik叩ty Representation of Concepts and Relationships K n叩vrhu je navrhovatel povinen pipojit listiny, kter箪ch se v n叩vrhu dovol叩v叩 . K n叩vrhu je navrhovatel povinen pipojit listiny, kter箪ch se v n叩vrhu dovol叩v叩 . subject predicate object navrhovatel povinen pipojit listiny, kter箪ch se v n叩vrhu dovol叩v叩 Navrhovatel (dle z叩k. NN/YYYY) lingv:TextChunk lingv:TextChunk lingv:TextChunk lexc:Concept lingv:subject lingv:object Pipojit listiny, kter箪ch se v n叩vrhu dovol叩v叩 (dle z叩k. NN/YYYY) lexc:Concept lexc:hasObligation lexc:hasDefinition extracted definition text lexc:hasObligation (dle z叩k. NN/YYYY) lexc:Concept 則 C Z叩kon . NN/YYYY frbr:partOf frbr:partOf Judik叩ty
  • 17. Legal Concepts Ontology Each extracted concept is represented as an instance of class lexc:Concept. lexc:Concept lexc:ConceptVersionfrbr:Expression lex:Act frbr:partOf frbr:partOf lexc:hasObligation, lexc:hasRight rdfs:Literal lexc:hasDefinition
  • 20. Obligations of Zamstnavatel PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX frbr: <http://purl.org/vocab/frbr/core#> PREFIX oa: <http://www.w3.org/ns/oa#> PREFIX lingv: <http://purl.org/lingv/ontology#> PREFIX lexc: <http://purl.org/lex/ontology/concepts#> SELECT ?obligationChunk ?obligationLabel WHERE { <http://linked.opendata.cz/resource/legislation/cz/expression/2006/262 -2006/version/cz/2006-04-21/concept/ucetni-pojem/zamstnavatel> lexc:hasObligation ?obligation . ?obligation ^oa:hasBody/oa:hasTarget ?obligationChunk . ?obligationChunk lingv:hasForm/skos:prefLabel ?obligationLabel . }
  • 23. Next steps Improve NLP extraction (see next part of the presentation) queries Better linking of concepts to particular sections of acts to other data sources (e.g., life situations, agendas of public bodies, fines imposed by public bodies, etc.) Develop web applications which enable users to work with the extracted concepts and relationships enable to explore links between extracted concepts and other data sources
  • 24. Vincent Kr鱈転, Barbora Hladk叩 RExtractor Entity Relation Extraction from Unstructured Texts Intelligent library (INTLIB, TA02010182) Seminar of formal linguistics, 2014-05-12 Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics Charles University in Prague Czech Republic {kriz,hladka}@ufal.mff.cuni.cz http://ufal.mff.cuni.cz/intlib
  • 25. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 Motivation Typical search approaches full-text search metadata search Our approach building a knowledge base semantic representation of documents entities and their relations represented in the Resource Description Framework (RDF)
  • 26. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 Data processing workflow
  • 27. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 RExtractor Architecture Domain independent
  • 28. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 Conversion Component converts various input formats into unified representation (XML)
  • 29. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 NLP Component Prague Dependency Treebank framework Tools segmentation & tokenization lemmatization & morphology syntactic parsing deep syntactic parsing Treex http://ufal.mff.cuni.cz/pdt3.0 http://ufal.mff.cuni.cz/treex
  • 30. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 Entity Detection Component Database of Entities entities specified by domain experts PML-TQ http://ufal.mff.cuni.cz/tools/pml-tq
  • 31. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 Relation Extraction Component Database of Queries queries formulated by domain experts their formulation in the form of PML-TQ queries on dependency trees RDF ready output: Subject Predicate Object Entity hasToCreate Something Accounting units create fixed items Accounting units create reserves Subject Predicate Object Entity hasToCreate Something Accounting units create fixed items Accounting units create reserves Example of user query: accounting units' obligations
  • 32. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 Case study on legislative domain
  • 33. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 Case study on legislative domain Legal texts specialized texts operating in legal settings they should transmit legal norms to their recipients they need to be clear, explicit and precise Sentences simple sentences are very rare usually long and very complex Legal texts are generally considered very difficult to read and understand.(Tiersma, 2010)
  • 34. Kr鱈転, Hladk叩: RExtractor Entity Relation Extraction from Unstructured Texts SFL, 2014-05-12 RExtractor Architecture Adaptation for legislative domain
  • 35. Conversion component, input example: HLAVA I (Chapter I): INTRODUCTORY PROVISIONS. § 1 Subject matter. This decree transposes the relevant European Union legislation and regulates: a) the method of delineating hydrogeological regions and groundwater bodies, b) the method of assessing groundwater status, and c) the requirements for programmes monitoring and assessing groundwater status.
  • 36. Conversion component, output (unified XML representation of the same text):
    <head id="11" label="HLAVA I">
      <title>INTRODUCTORY PROVISIONS</title>
      <section id="12" label="§ 1">
        <title>Subject matter</title>
        <text>This decree transposes the relevant European Union legislation and regulates:</text>
        <section id="13" label="a)">
          <text>the method of delineating hydrogeological regions and groundwater bodies,</text>
        </section>
        <section id="14" label="b)">
          <text>the method of assessing groundwater status, and</text>
        </section>
        <section id="15" label="c)">
          <text>the requirements for programmes monitoring and assessing groundwater status.</text>
        </section>
      </section>
    </head>
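Because every provision becomes a nested `<section>` with a `label`, downstream components can address each provision by its label path. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the element names shown in the XML on this slide; the function `outline` is hypothetical and not part of RExtractor:

```python
import xml.etree.ElementTree as ET

DOC = """<head id="11" label="HLAVA I">
  <title>INTRODUCTORY PROVISIONS</title>
  <section id="12" label="§ 1">
    <title>Subject matter</title>
    <text>This decree transposes the relevant European Union legislation and regulates:</text>
    <section id="13" label="a)"><text>the method of delineating hydrogeological regions and groundwater bodies,</text></section>
    <section id="14" label="b)"><text>the method of assessing groundwater status, and</text></section>
    <section id="15" label="c)"><text>the requirements for programmes monitoring and assessing groundwater status.</text></section>
  </section>
</head>"""

def outline(elem: ET.Element, path: tuple[str, ...] = ()):
    """Yield (label path, text) for every <section> with a <text> child, depth-first."""
    for section in elem.findall("section"):
        labels = path + (section.get("label", ""),)
        text = section.findtext("text")
        if text:
            yield " ".join(labels), text.strip()
        yield from outline(section, labels)

root = ET.fromstring(DOC)
for label, text in outline(root, (root.get("label", ""),)):
    print(f"{label}: {text}")
```

The first printed line is `HLAVA I § 1: This decree transposes ...`, followed by one line per lettered item.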
  • 37. NLP Component: Corpus of Czech Legal Texts (CCLT), consisting of the Accounting Act (563/1991 Coll.) and the Decree on Double-entry Accounting for Undertakers (500/2002 Coll.); automatically parsed, then manually checked; 1,133 manually annotated a-trees; 35,085 tokens. Credit to Zdeňka Urešová.
  • 38. NLP Component: CCLT. Enumerations and lists are annotated as one tree. Manual annotation guidelines: split the sentence according to formal markers; use links for dependencies between the partial trees; an automatic procedure then merges the partial annotations into a final tree. Pipeline visualization available online at ufal.mff.cuni.cz/intlib.
  • 39. NLP Component: automatic parsers for Czech are trained on newspaper texts, so we verify whether a parser trained on newspaper texts can be used as is or whether modifications are needed. MST parser: Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajič (2005): Non-projective Dependency Parsing using Spanning Tree Algorithms. In: Proceedings of HLT/EMNLP, Vancouver, British Columbia.
  • 40. NLP Component: sentence splitting. We replace long lists and enumerations with several shorter sentences.
    Original sentence: (2) For the purposes of this Act, a public budget means a) the state budget, b) the budget of a state fund, c) the budget of the European Union, or d) a budget for which a law so provides.
    New sentences: For the purposes of this Act, a public budget means the state budget. For the purposes of this Act, a public budget means the budget of a state fund. For the purposes of this Act, a public budget means the budget of the European Union. For the purposes of this Act, a public budget means a budget for which a law so provides.
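The slides do not say how the splitting is implemented. As a rough, string-level sketch only, assuming items are marked "a)", "b)", ... and separated by commas or a coordinating conjunction, the transformation can be reproduced on an English rendering of the slide's example. The regexes and `split_enumeration` are hypothetical; a real implementation would more likely operate on the dependency tree.

```python
import re

# Assumed item markers: a single lowercase letter plus ")" surrounded by whitespace, e.g. " b) ".
ITEM_MARKER = re.compile(r"\s+[a-z]\)\s+")
# Separators that may trail an item: commas and coordinating conjunctions (English and Czech).
TRAILING = re.compile(r"(?:,|\s+(?:or|and|nebo|a))+\s*$")
# Paragraph number such as "(2)" at the start of the sentence.
PARAGRAPH_NO = re.compile(r"^\(\d+\)\s*")

def split_enumeration(sentence: str) -> list[str]:
    """Replace 'prefix a) x, b) y, ... or d) z.' by one sentence per item."""
    parts = ITEM_MARKER.split(sentence.strip())
    if len(parts) < 2:  # no enumeration markers found
        return [sentence]
    prefix = PARAGRAPH_NO.sub("", parts[0]).strip()
    sentences = []
    for item in parts[1:]:
        item = TRAILING.sub("", item.strip().rstrip("."))
        sentences.append(f"{prefix} {item}.")
    return sentences

SENTENCE = ("(2) For the purposes of this Act, a public budget means a) the state budget, "
            "b) the budget of a state fund, c) the budget of the European Union, "
            "or d) a budget for which a law so provides.")

for s in split_enumeration(SENTENCE):
    print(s)
```

It prints the four new sentences listed on the slide, one per line.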
  • 41. NLP Component: re-tokenization. Example: Accounting units create adjustments under the provisions of § 16, 26, 31, 55 and 57 and do not value assets under § 27, § 14, 39, § 51 to 55, § 58, 60 and 69.
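The slide shows the sentence but not the rule. Assuming, as the example suggests, that re-tokenization collapses each run of section references into a single token so that the parser does not see a long chain of numbers and commas, a minimal sketch on the English rendering could look as follows. `SECTION_RUN` and `retokenize` are hypothetical names.

```python
import re

# Hypothetical pattern for a run of section references such as "§ 16, 26, 31, 55 and 57"
# or "§ 51 to 55": each continuation (", ", " and ", " to ") must be followed by a number,
# optionally prefixed with another "§".
SECTION_RUN = re.compile(r"§+\s*\d+(?:(?:,\s*|\s+(?:and|to)\s+)(?:§\s*)?\d+)*")

def retokenize(text: str) -> list[str]:
    """Split on whitespace, but keep every run of section references as a single token."""
    tokens: list[str] = []
    pos = 0
    for match in SECTION_RUN.finditer(text):
        tokens.extend(text[pos:match.start()].split())
        tokens.append(match.group())
        pos = match.end()
    tokens.extend(text[pos:].split())
    return tokens

TEXT = ("Accounting units create adjustments under the provisions of § 16, 26, 31, 55 and 57 "
        "and do not value assets under § 27, § 14, 39, § 51 to 55, § 58, 60 and 69")
print(retokenize(TEXT))
```

The regex stops at "57" because the following "and" is not followed by a number, so the ordinary conjunction stays a separate token.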
  • 43. Entity Detection Component: entities in CCLT (accounting subdomain), manually annotated by Sysnet, Ltd., in the Decree on Double-entry Accounting for Undertakers (500/2002 Coll.).
  • 44. Entity Detection Component: initializing the DBE (Database of Entities) with entities from CCLT. Each unique entity is parsed automatically by the MST parser; an automatic procedure then takes the entity's dependency tree and creates a PML-TQ query from it.
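The tree-to-query step can be sketched as a recursive walk that emits one nested node constraint per tree node. This is a hedged approximation: it assumes PML-TQ's convention that a nested `a-node [...]` is a child of the enclosing node and that lemmas are matched via `m/lemma`; the `Node` class and `to_query` are hypothetical. The example entity is "účetní jednotka" (accounting unit), with "jednotka" as the head.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A dependency-tree node reduced to its lemma and its dependents (hypothetical structure)."""
    lemma: str
    children: list["Node"] = field(default_factory=list)

def to_query(node: Node, indent: int = 0) -> str:
    """Emit a PML-TQ-style query: one a-node per tree node, dependents nested inside."""
    pad = "  " * indent
    parts = [f'{pad}a-node [ m/lemma = "{node.lemma}"']
    for child in node.children:
        parts.append(",\n" + to_query(child, indent + 1))
    parts.append(" ]")
    return "".join(parts)

ENTITY = Node("jednotka", [Node("účetní")])  # "účetní jednotka" = accounting unit
print(to_query(ENTITY))
```

Lemmas are inserted unescaped, so a lemma containing a double quote would need escaping in a real implementation.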
  • 45. Entity Detection Component: experiment. (1) Identify entities in the gold-standard trees in CCLT (with re-tokenized tokens and very long sentences); (2) identify entities in trees created by MST (with re-tokenized tokens and split sentences). Results: a high number of false positives; the automatic parser has little influence on detection.
    Parsing method | Extracted | TP | FP | FN | Precision (%) | Recall (%)
    Manual | 16,428 | 9,549 | 6,879 | 628 | 58.1 | 93.8
    Automatic | 16,160 | 9,278 | 6,882 | 838 | 57.4 | 91.7
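The table is internally consistent: Extracted = TP + FP, precision = TP / (TP + FP) and recall = TP / (TP + FN). A small check, recomputing the reported percentages from the raw counts:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN), both in percent, one decimal."""
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    return round(precision, 1), round(recall, 1)

# (TP, FP, FN) per parsing method, as reported in the table.
COUNTS = {"Manual": (9549, 6879, 628), "Automatic": (9278, 6882, 838)}

for method, (tp, fp, fn) in COUNTS.items():
    extracted = tp + fp  # every extracted entity is either a TP or an FP
    print(method, extracted, *precision_recall(tp, fp, fn))
# Manual 16428 58.1 93.8
# Automatic 16160 57.4 91.7
```

The numbers make the slide's conclusion concrete: switching from gold-standard to automatic trees costs 0.7 points of precision and 2.1 points of recall, while roughly 6,900 false positives occur in both settings.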
  • 46. Relation Extraction Component: types of relations.
    Definitions (D): entities are defined or explained. Example: "Substitute accommodation means a one-room flat, a room in a hostel, or a sublease in a furnished or unfurnished part of another tenant's flat."
    Obligations (O): an entity is obliged to do something. Example: "The petitioner is obliged to attach to the petition the documents referred to in it."
    Rights (R): an entity has the right to do something. Example: "The acquirer may assert a claim arising from liability for defects in court only if he reported the defects without undue delay after having had the opportunity to inspect the item."
  • 47. Relation Extraction Component: manual design of queries. Strategy: cover the maximum number of relations with the minimum number of queries. The tree-query expert observes typical constructions for a given type of relation, designs a query for the most frequent construction, goes through the matches, and redesigns the query if needed.
  • 48. Relation Extraction Component: query design and evaluation on CCLT. Queries designed on the Accounting Act (563/1991 Coll.): 5 for Definitions, 4 for Rights, 2 for Obligations. Evaluation on the Decree on Double-entry Accounting for Undertakers (500/2002 Coll.).
  • 49. Relation Extraction Component: results.
    Measure | D | R | O | Total
    # of queries | 5 | 4 | 2 | 11
    Gold standard | 97 | 308 | 62 | 467
    Extracted | 70 | 255 | 41 | 366
    True positive | 53 | 206 | 36 | 295
    False negative | 44 | 102 | 26 | 172
    False positive | 17 | 49 | 5 | 71
    Precision (%) | 75.7 | 80.8 | 87.8 | 80.6
    Recall (%) | 54.6 | 66.9 | 58.1 | 63.2
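The Total column is a micro-average: the counts are pooled over D, R and O before dividing. The sketch below recomputes it from the table's counts; the helper `pr` is just for illustration.

```python
# (TP, FP, FN) per relation type, from the results table.
COUNTS = {"D": (53, 17, 44), "R": (206, 49, 102), "O": (36, 5, 26)}

def pr(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall in percent, rounded to one decimal."""
    return round(100 * tp / (tp + fp), 1), round(100 * tp / (tp + fn), 1)

for rel, counts in COUNTS.items():
    print(rel, *pr(*counts))

# Micro-average: pool the counts over all relation types before dividing.
total = tuple(sum(c[i] for c in COUNTS.values()) for i in range(3))
print("Total", total, *pr(*total))
# Total (295, 71, 172) 80.6 63.2
```

The distinction matters here. The simple mean of the three recall values would be about 59.9%, but the reported 63.2% is pulled up by Rights, which contribute 308 of the 467 gold-standard relations.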
  • 50. Relation Extraction Component: error analysis. Errors stem mainly from automatic parsing and from query design.
    Error source | # of errors | Ratio
    Parser | 145 | 59.7%
    Query | 93 | 38.3%
    Entity | 5 | 2.1%
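The ratios follow from dividing each count by the total number of analysed errors. The total, 243, equals the 172 false negatives plus the 71 false positives in the results table, which suggests that every error was assigned a source.

```python
ERRORS = {"Parser": 145, "Query": 93, "Entity": 5}

total = sum(ERRORS.values())
shares = {src: round(100 * n / total, 1) for src, n in ERRORS.items()}
print(total, shares)
# 243 {'Parser': 59.7, 'Query': 38.3, 'Entity': 2.1}
```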
  • 51. Relation Extraction Component: experiment with more data: 28 laws from the accounting subdomain, 27,808 sentences, 745,137 tokens. Per-query counts:
    Definitions (D): D1 36, D2 287, D3 35, D4 466, D5 46; total 1580
    Rights (R): R1 240, R2 470, R3 127, R4 6; total 843
    Obligations (O): O1 183, O2 37; total 220
  • 52. Relation Extraction Component: query example, Definition: "Substitute accommodation means a one-room flat, a room in a hostel, or a sublease in a furnished or unfurnished part of another tenant's flat."
  • 53. Relation Extraction Component: query example, Obligation: "The petitioner is obliged to attach to the petition the documents referred to in it."
  • 54. Relation Extraction Component: query example, Right: "The acquirer may assert a claim arising from liability for defects in court only if he reported the defects without undue delay after having had the opportunity to inspect the item."
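The query trees themselves are not preserved in this transcript. To show what a tree query does, here is a Python sketch that mimics a PML-TQ obligation query on the Czech example "K návrhu je navrhovatel povinen připojit listiny". It looks for a form of "být" (to be) with a subject (afun `Sb`) and the nominal predicate "povinný" (obliged, `Pnom`), whose infinitive object is the obligation. The node class, the simplified hand-built tree (relative clause omitted) and the matcher are all hypothetical, not the authors' query or the Treex API. The predicate name `hasObligation` follows the project's data model.

```python
from collections.abc import Iterator
from dataclasses import dataclass, field

@dataclass
class ANode:
    """Minimal analytical-layer node (hypothetical structure, not the PDT/Treex API)."""
    ord: int    # position of the word in the sentence
    form: str   # word form as it appears in the text
    lemma: str
    afun: str   # analytical function, e.g. "Pred", "Sb", "Pnom", "Obj"
    children: list["ANode"] = field(default_factory=list)

def walk(node: ANode) -> Iterator[ANode]:
    """Pre-order traversal of a subtree."""
    yield node
    for child in node.children:
        yield from walk(child)

def phrase(node: ANode) -> str:
    """Surface string of a subtree, words in sentence order."""
    return " ".join(n.form for n in sorted(walk(node), key=lambda n: n.ord))

def match_obligations(root: ANode) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, "hasObligation", action) for clauses built as 'X je povinen <infinitive> ...'."""
    for verb in walk(root):
        if verb.lemma != "být":
            continue
        subjects = [c for c in verb.children if c.afun == "Sb"]
        duties = [c for c in verb.children if c.lemma == "povinný" and c.afun == "Pnom"]
        for subj in subjects:
            for duty in duties:
                for action in duty.children:
                    if action.afun == "Obj":
                        yield phrase(subj), "hasObligation", phrase(action)

# Hand-built, simplified tree for "K návrhu je navrhovatel povinen připojit listiny".
TREE = ANode(3, "je", "být", "Pred", [
    ANode(1, "K", "k", "AuxP", [ANode(2, "návrhu", "návrh", "Adv")]),
    ANode(4, "navrhovatel", "navrhovatel", "Sb"),
    ANode(5, "povinen", "povinný", "Pnom", [
        ANode(6, "připojit", "připojit", "Obj", [ANode(7, "listiny", "listina", "Obj")]),
    ]),
])

print(list(match_obligations(TREE)))
```

The output is one RDF-ready triple: the petitioner ("navrhovatel") has the obligation "připojit listiny" (attach documents). Because the match is defined on tree structure rather than word order, the same query would also fire on "Navrhovatel je povinen k návrhu připojit listiny".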
  • 55. Future work (legislative domain): parsing evaluation and adaptation; entity detection: automatic entity detection based on a sample of manually annotated entities; relation extraction: automatic query design.
  • 56. Case study: environmental domain
  • 57. Case study: environmental domain. What are the environmental consequences of a project? Environmental Impact Assessment (EIA) considers the environmental impacts when deciding whether or not to proceed with a project. In the Czech Republic, CENIA administers the EIA information system.
  • 58. EIA system
  • 59. Example: Amazon's plan to build a distribution center in Brno, Czech Republic (no, no, no, then yes by Brno councilors). May 9, 2014: a new intention posted to the EIA system by CTP Invest.
  • 60. Mining EIA documentation: Sysnet, Ltd. specified which entities and relations to extract, e.g. Title (Section B.I.1); Category, type (Section B.I.1); Capacity, size (Sections B.I.2, B.I.6); Location (Section B.I.3); Scheduling (Section B.I.7); ...
  • 61. Focus on Section B.I.2. Example: "The site itself will consist of a hall building with an area of approx. 96,000 m2, divided internally into 3 halls." "2 covered bicycle parking areas and 1,150 parking spaces for passenger cars are envisaged." "The project includes landscaping, comprising the planting of more than 250 mature trees." ...
  • 62. Using RExtractor: queries by regular expressions
  • 63. Example: "Dále je provozována produkční stáj VKK pro 336 ks dojnic (403,2 DJ)." (In addition, a production barn VKK is operated for 336 dairy cows.) Regular expression: (Adj Nom)? (Noun Nom) (number) (unit) (Noun Gen), i.e. (attribute) (entity) (number) (unit) (entity), matching (production) (barn) (336) (pcs) (cows). Credit to Ivana Lukšová.
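One way to run such a pattern over tagged text is to map each token to a one-character tag and apply an ordinary regex to the tag string. The sketch below makes several assumptions. The coarse tags are hand-assigned rather than derived from real Czech morphological tags. The `X*` gap, which lets "VKK pro" stand between the entity and the number, is added here because the slide's pattern does not show how intervening tokens are handled. `extract` is a hypothetical name.

```python
import re
from typing import Optional

# Coarse one-letter tags (an assumption for this sketch): A = adjective in nominative,
# N = noun in nominative, C = cardinal number, U = unit, G = noun in genitive, X = other.
# "X*" is an assumed gap allowing other tokens between the entity and the number.
PATTERN = re.compile(r"(A)?(N)X*(C)(U)(G)")
ROLES = ("attribute", "entity", "number", "unit", "counted_entity")

def extract(tagged: list[tuple[str, str]]) -> Optional[dict[str, Optional[str]]]:
    """Return the tokens filling each pattern slot, or None if the pattern does not match."""
    tags = "".join(tag for _, tag in tagged)  # one character per token
    m = PATTERN.search(tags)
    if m is None:
        return None
    # Each tag is one character, so a group's start offset is also the token index.
    return {role: (tagged[m.start(i)][0] if m.start(i) != -1 else None)
            for i, role in enumerate(ROLES, start=1)}

SENTENCE = [("Dále", "X"), ("je", "X"), ("provozována", "X"), ("produkční", "A"),
            ("stáj", "N"), ("VKK", "X"), ("pro", "X"), ("336", "C"), ("ks", "U"),
            ("dojnic", "G")]
print(extract(SENTENCE))
```

On the example it returns `produkční` (attribute), `stáj` (entity), `336`, `ks` and `dojnic`. This mirrors the slide's mapping (production) (barn) (336) (pcs) (cows).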
  • 64. Evaluation: developers vs. users; gold-standard data vs. practical use cases; experience vs. expectation; scientific contribution vs. making life easier. Applies to both the legislative and the environmental domain.