This document summarizes a study on the scarcity of crossing dependencies in syntactic structures across languages. It presents two major hypotheses for why crossings are scarce: 1) an underlying rule or principle prohibits crossings, or 2) crossings are indirectly limited by dependency length minimization, which constrains dependency lengths. The study evaluates these hypotheses by analyzing dependency structures from 30 languages and finding that accounting for dependency lengths reduces errors in predicting crossings compared to random arrangements, supporting the second hypothesis.
The scarcity of crossing dependencies: a direct outcome of a specific constraint?
1. Introduction
Crossing theory
Evaluation
The scarcity of crossing dependencies: a direct outcome of a specific constraint?
C. Gómez-Rodríguez (1), R. Ferrer-i-Cancho (2)
(1) LyS Research Group, Departamento de Computación, Universidade da Coruña
(2) Complexity and Quantitative Linguistics Lab, LARCA Research Group, Universitat Politècnica de Catalunya
Graph-TA 4 (Barcelona), 4 March 2016
The scarcity of crossing syntactic dependencies
[Figure: dependency arcs drawn above two example sentences, each arc labeled with a number: "Indeed, the government is taking a calculated risk." and "We keep wondering what Mr. Gates wanted to say."]
Dependencies tend not to cross when drawn above the sentence (Hays and Lecerf in the 1960s).
The average number of crossings, C, does not exceed 1 for most of the languages (sample of 30 treebanks) [Gómez-Rodríguez and Ferrer-i-Cancho, 2016].
Why? Two major hypotheses
The scarcity of crossing dependencies arises:
1) Directly, from an underlying rule or principle of human languages that is responsible for this fact (including the possibility of some cognitive cost associated directly with crossings). Held by the overwhelming majority of researchers. Serious problems [Ferrer-i-Cancho and Gómez-Rodríguez, 2016]:
   - Such accounts require heavy assumptions that compromise the parsimony of linguistic theory as a whole.
   - They involve explanations based on internal constraints of obscure nature.
2) Indirectly, from the actual length of dependencies, which are constrained by a well-known psychological principle: dependency length minimization [Gómez-Rodríguez and Ferrer-i-Cancho, 2016].
Crossing theory
Q is the set of pairs of edges of a graph that can potentially cross when its vertices are arranged linearly in some arbitrary order (edges sharing a vertex cannot cross).
|Q|, the cardinality of Q, is the potential number of crossings.
In a tree with n vertices, one has
\[ |Q| = \frac{n}{2}\left(n - 1 - \langle k^2 \rangle\right), \qquad (1) \]
where ⟨k²⟩ is the mean of the squared vertex degrees.
|Q| = 0 if and only if the tree is a star tree [Ferrer-i-Cancho, 2013].
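As a quick check of the formula for |Q|, the following minimal sketch counts vertex-disjoint edge pairs directly and compares the result with the closed form. The function names and the two example trees are illustrative assumptions, not part of the talk; the closed form is evaluated in integers using n⟨k²⟩ = Σ k².

```python
from itertools import combinations

def potential_crossings(edges):
    """Size of Q: number of edge pairs that share no vertex."""
    return sum(1 for e, f in combinations(edges, 2) if not set(e) & set(f))

def q_formula(n, edges):
    """|Q| = n (n - 1 - <k^2>) / 2 for a tree on vertices 1..n.

    Since n * <k^2> equals the sum of squared degrees, the value is
    computed with integer arithmetic to avoid rounding noise.
    """
    degree = [0] * (n + 1)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    sum_k2 = sum(k * k for k in degree[1:])
    return (n * (n - 1) - sum_k2) // 2

# Hypothetical example trees on 5 vertices.
path = [(1, 2), (2, 3), (3, 4), (4, 5)]   # linear tree
star = [(1, 2), (1, 3), (1, 4), (1, 5)]   # star tree

print(potential_crossings(path), q_formula(5, path))   # both counts agree
print(potential_crossings(star), q_formula(5, star))   # both are 0 for a star
```

The star tree gives |Q| = 0 because every pair of its edges shares the hub vertex.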
1st predictor of C
The number of edge crossings is
\[ C = \sum_{(e_1, e_2) \in Q} C(e_1, e_2), \qquad (2) \]
where C(e1, e2) is an indicator variable: C(e1, e2) = 1 if the edges e1 and e2 cross and C(e1, e2) = 0 otherwise.
Under the null hypothesis that the vertices are arranged linearly at random (all possible orderings are equally likely), p(C(e1, e2) = 1) = 1/3, which yields E0[C] = |Q|/3 [Ferrer-i-Cancho, 2013].
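The random-arrangement baseline can be checked by brute force on a small tree. The sketch below counts crossings for a given ordering and averages over all n! orderings. The function names and the example tree are illustrative assumptions, and the enumeration is only feasible for small n.

```python
from itertools import combinations, permutations

def count_crossings(edges, position):
    """Number of crossing edge pairs; position[v] is the slot of vertex v."""
    spans = [tuple(sorted((position[u], position[v]))) for u, v in edges]
    # Two edges cross when exactly one endpoint of one lies strictly
    # inside the span of the other; edges sharing a vertex never qualify.
    return sum(1 for (a, b), (c, d) in combinations(spans, 2)
               if a < c < b < d or c < a < d < b)

def expected_crossings_random(n, edges):
    """Average C over all n! orderings of vertices 1..n (small n only)."""
    total = count = 0
    for perm in permutations(range(1, n + 1)):
        position = dict(zip(range(1, n + 1), perm))
        total += count_crossings(edges, position)
        count += 1
    return total / count

# Hypothetical example: a linear tree on 5 vertices, for which |Q| = 3.
path = [(1, 2), (2, 3), (3, 4), (4, 5)]
identity = {v: v for v in range(1, 6)}
print(count_crossings(path, identity))      # 0 crossings in the natural order
print(expected_crossings_random(5, path))   # 1.0, i.e. |Q|/3 with |Q| = 3
```

The factor 1/3 reflects that four distinct endpoints can be paired into two edges in three ways along the line (side by side, nested, or interleaved), and only the interleaved one produces a crossing.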
2nd predictor of C
Introduce knowledge about the length of the dependencies (edges of length 1 or n − 1 are not crossable).
p(C(e1, e2) = 1) is replaced by p(C(e1, e2) = 1 | d(e1), d(e2)), where d(e) is the length of edge e, obtaining [Ferrer-i-Cancho, 2014]
\[ E_2[C] = \sum_{(e_1, e_2) \in Q} p\left(C(e_1, e_2) = 1 \mid d(e_1), d(e_2)\right). \qquad (3) \]
p(C(e1, e2) = 1 | d(e1), d(e2)) depends only on n, d(e1) and d(e2) [Ferrer-i-Cancho, 2014].
E0[C] is a true expectation while E2[C] is not!
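One way to make the second predictor concrete is sketched below. It rests on an assumption that the slides do not spell out: the conditional probability is taken to be the fraction of crossing placements among all placements of two vertex-disjoint edges with the given lengths in a sequence of n positions, each placement equally likely. The function names are hypothetical, and the cited paper may compute this quantity differently (for example, in closed form).

```python
from itertools import combinations

def crosses(e, f):
    """True if the two position pairs interleave on the line."""
    (a, b), (c, d) = sorted(e), sorted(f)
    return a < c < b < d or c < a < d < b

def p_cross_given_lengths(n, d1, d2):
    """Assumed model: fraction of crossing placements of two
    vertex-disjoint edges of lengths d1 and d2 among n positions."""
    total = crossing = 0
    for i in range(1, n - d1 + 1):
        for j in range(1, n - d2 + 1):
            e, f = (i, i + d1), (j, j + d2)
            if set(e) & set(f):          # placements must not share a slot
                continue
            total += 1
            crossing += crosses(e, f)
    return crossing / total if total else 0.0

def e2_predictor(n, edges, position):
    """Sum of the conditional probabilities over the pairs in Q."""
    pred = 0.0
    for e, f in combinations(edges, 2):
        if set(e) & set(f):              # edges sharing a vertex are not in Q
            continue
        d_e = abs(position[e[0]] - position[e[1]])
        d_f = abs(position[f[0]] - position[f[1]])
        pred += p_cross_given_lengths(n, d_e, d_f)
    return pred

# Hypothetical example: the tree 1-2-3-4 under two different orderings.
tree = [(1, 2), (2, 3), (3, 4)]
natural = {1: 1, 2: 2, 3: 3, 4: 4}       # all edges have length 1
shuffled = {1: 1, 3: 2, 2: 3, 4: 4}      # edges 1-2 and 3-4 have length 2
print(e2_predictor(4, tree, natural), e2_predictor(4, tree, shuffled))
```

Under this model, edges of length 1 or n − 1 get probability 0, in line with the remark that such edges are not crossable.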
Evaluation of the predictors: materials
Corpora: version 2.0 of the HamleDT collection of treebanks [Zeman et al., 2014, Rosa et al., 2014], a harmonization of existing treebanks for 30 different languages into two well-known annotation styles: Prague dependencies [Hajič et al., 2006] and Universal Stanford dependencies [de Marneffe et al., 2014].
Preprocessing: nodes corresponding to punctuation tokens or null elements were removed. Non-punctuation nodes that had a punctuation node as their head were attached as dependents of their nearest non-punctuation ancestor, and null elements were treated in the same way.
Inclusion criteria: a syntactic dependency structure was included in our analyses if (1) it defined a tree and (2) the tree was not a star tree.
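The preprocessing and filtering steps described above can be sketched as follows. This is an illustration only, not the authors' pipeline: the head-array representation, the function names and the toy sentence are all assumptions.

```python
def strip_nodes(heads, drop):
    """Remove the tokens flagged in `drop` and reattach their dependents.

    heads[i] is the 1-based head of token i + 1 (0 marks the root).
    Assumes the input head structure has no cycles.
    """
    def kept_ancestor(h):
        # Climb the head chain until reaching the root (0) or a kept token.
        while h != 0 and drop[h - 1]:
            h = heads[h - 1]
        return h

    kept = [i for i in range(1, len(heads) + 1) if not drop[i - 1]]
    new_index = {old: new for new, old in enumerate(kept, start=1)}
    new_heads = []
    for i in kept:
        h = kept_ancestor(heads[i - 1])
        new_heads.append(new_index[h] if h else 0)
    return new_heads

def is_tree(heads):
    """True if there is exactly one root and every token reaches it."""
    n = len(heads)
    if heads.count(0) != 1:
        return False
    for i in range(1, n + 1):
        steps, h = 0, i
        while heads[h - 1] != 0:
            h = heads[h - 1]
            steps += 1
            if steps > n:          # a cycle never reaches the root
                return False
    return True

def is_star(heads):
    """True if one vertex is linked to all the others (so |Q| = 0)."""
    n = len(heads)
    degree = [0] * (n + 1)
    for i, h in enumerate(heads, start=1):
        if h:
            degree[i] += 1
            degree[h] += 1
    return max(degree[1:], default=0) == n - 1

# Toy sentence of 6 tokens; tokens 3 and 6 play the role of punctuation.
heads = [2, 0, 2, 3, 4, 2]
drop = [False, False, True, False, False, True]
clean = strip_nodes(heads, drop)
print(clean, is_tree(clean) and not is_star(clean))
```

In the toy input, token 4 is headed by the dropped token 3, so it is reattached to token 2, the nearest kept ancestor. A sentence whose root is dropped ends up with several roots and is then rejected by the tree test.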
Evaluation of the predictors: methods
C: the number of crossings of the linear arrangement of a graph in general.
Ctrue: the number of crossings of the syntactic dependencies of a real sentence.
Relative number of crossings: C̄ = C/|Q| or C̄true = Ctrue/|Q| [Ferrer-i-Cancho, 2014].
Relative error of a predictor [Ferrer-i-Cancho, 2014]:
\[ \Delta_x = E_x[\bar{C}] - \bar{C}_{\mathrm{true}} = \frac{E_x[C] - C_{\mathrm{true}}}{|Q|}. \qquad (4) \]
∆0 will be used as a baseline for ∆2.
∆0 converges to 1/3 for sufficiently long sentences when Ctrue is small [Ferrer-i-Cancho, 2014].
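To see where the limit of 1/3 comes from, one can substitute the random baseline E0[C] = |Q|/3 into the definition of the relative error. The short worked step below uses only the definitions on these slides:

```latex
\Delta_0 = \frac{E_0[C] - C_{\mathrm{true}}}{|Q|}
         = \frac{|Q|/3 - C_{\mathrm{true}}}{|Q|}
         = \frac{1}{3} - \bar{C}_{\mathrm{true}}
```

So ∆0 stays close to 1/3 whenever Ctrue is small relative to |Q|, which grows with sentence length.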
Results
Mixing sentences of different lengths:
- The average ∆2, the relative error of the predictor E2[C], is small: it does not exceed 5%.
- The average ∆2 is at least 6 times smaller than the baseline ∆0 ≈ 30%.
Controlling for sentence length (grouping sentences by length):
- The average over group averages of ∆2 does not exceed 4.3%.
- The average ∆2 is at least 7 times smaller than the baseline error, again ∆0 ≈ 30%.
- The minimum size of a group is one sentence; the qualitative results are very similar if the minimum size is set to 2.
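The two averaging schemes can be illustrated with a small sketch. The function names are hypothetical, and the records are made-up toy values chosen only to show the arithmetic; they are not results from the study.

```python
from collections import defaultdict
from statistics import mean

def pooled_average(records):
    """Mean relative error over all sentences, mixing sentence lengths."""
    return mean(delta for _, delta in records)

def length_controlled_average(records, min_group_size=1):
    """Mean of per-length group means, skipping groups that are too small."""
    groups = defaultdict(list)
    for n, delta in records:
        groups[n].append(delta)
    group_means = [mean(values) for values in groups.values()
                   if len(values) >= min_group_size]
    return mean(group_means)

# Toy (sentence length, relative error) pairs: NOT data from the treebanks.
records = [(5, 0.0625), (5, 0.1875), (8, 0.03125)]
print(pooled_average(records))
print(length_controlled_average(records))
print(length_controlled_average(records, min_group_size=2))
```

The pooled mean weights each sentence equally, so frequent lengths dominate; the length-controlled mean weights each length group equally, which is what "average over group averages" refers to.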
Concluding remarks
principle of dependency length minimization
[Ferrer-i-Cancho, 2015]
↓
dependency lengths
↓
the actual number of crossings in sentences
To explain the low frequency of crossings in the world's languages, it may not be necessary to resort to:
- A ban on crossings by grammar (e.g., [Hudson, 2007, Tanaka, 1997]).
- A principle of minimization of crossings [Liu, 2008].
- A competence-plus [Hurford, 2012] limiting the number of crossings.
References
de Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., and Manning, C. D. (2014).
Universal Stanford dependencies: a cross-linguistic typology.
In Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., and Piperidis, S., editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland. European Language Resources Association (ELRA).
Ferrer-i-Cancho, R. (2013).
Random crossings in dependency trees.
http://arxiv.org/abs/1305.4561.
Ferrer-i-Cancho, R. (2014).
A stronger null hypothesis for crossing dependencies.
Europhysics Letters, 108:58003.
Ferrer-i-Cancho, R. (2015).
The placement of the head that minimizes online memory. A complex systems approach.
Language Dynamics and Change, 5:141–164.
Ferrer-i-Cancho, R. and Gómez-Rodríguez, C. (2016).
Liberating language research from dogmas of the 20th century.
Glottometrics, 33:33–34.
Gómez-Rodríguez, C. and Ferrer-i-Cancho, R. (2016).
The scarcity of crossing dependencies: a direct outcome of a specific constraint?
http://arxiv.org/abs/1601.03210.
Hajič, J., Panevová, J., Hajičová, E., Panevová, J., Sgall, P., Pajas, P., Štěpánek, J., Havelka, J., and Mikulová, M. (2006).
Prague Dependency Treebank 2.0.
CDROM CAT: LDC2006T01, ISBN 1-58563-370-4. Linguistic Data Consortium.
Hudson, R. A. (2007).
Language Networks: The New Word Grammar.
Oxford University Press.
Hurford, J. R. (2012).
Chapter 3. Syntax in the Light of Evolution, pages 175–258.
Oxford University Press, Oxford.
Liu, H. (2008).
Dependency distance as a metric of language comprehension difficulty.
Journal of Cognitive Science, 9:159–191.
Rosa, R., Mašek, J., Mareček, D., Popel, M., Zeman, D., and Žabokrtský, Z. (2014).
HamleDT 2.0: Thirty dependency treebanks stanfordized.
In Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., and Piperidis, S., editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland. European Language Resources Association (ELRA).
Tanaka, H. (1997).
Invisible movement in sika-nai and the linear crossing constraint.
Journal of East Asian Linguistics, 6:143–188.
Zeman, D., Dušek, O., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., and Hajič, J. (2014).
HamleDT: Harmonized multi-language dependency treebank.
Language Resources and Evaluation, 48(4):601–637.