Introduction
Crossing theory
Evaluation
The scarcity of crossing dependencies: a
direct outcome of a specific constraint?
C. Gómez-Rodríguez¹ and R. Ferrer-i-Cancho²
¹ LyS Research Group
Departamento de Computación
Universidade da Coruña
² Complexity & Quantitative Linguistics Lab
LARCA Research Group
Universitat Politècnica de Catalunya
Graph-TA 4 (Barcelona), 4 March 2016
The scarcity of crossing syntactic dependencies
[Figure: two example sentences with their syntactic dependencies drawn as numerically labelled arcs above the words.]
Indeed , the government is taking a calculated risk .
We keep wondering what Mr. Gates wanted to say .
Dependencies tend not to cross when drawn above the sentence
(Hays and Lecerf in the 1960s).
The average number of crossings per sentence, C, does not exceed 1 for most of
the languages (sample of 30 treebanks) [Gómez-Rodríguez and Ferrer-i-Cancho, 2016].
Why? Two major hypotheses
The scarcity of crossing dependencies arises either
Directly, from an underlying rule or principle of human
languages that is responsible for this fact (including the
possibility of some cognitive cost associated directly with
crossings). This view is held by the overwhelming majority of
researchers, but it has serious problems
[Ferrer-i-Cancho and Gómez-Rodríguez, 2016]:
  It requires heavy assumptions that compromise the parsimony of
  linguistic theory as a whole.
  It involves explanations based on internal constraints of obscure
  nature.
Indirectly, from the actual length of dependencies, which are
constrained by a well-known psychological principle:
dependency length minimization
[Gómez-Rodríguez and Ferrer-i-Cancho, 2016].
Crossing theory
Q is the set of pairs of edges of a graph that can potentially
cross when their vertices are arranged linearly in some
arbitrary order (edges sharing a vertex cannot cross).
|Q|, the cardinality of Q, is the potential number of
crossings.
In a tree with n vertices, one has
$|Q| = \frac{n}{2}\left(n - 1 - \langle k^2 \rangle\right)$, (1)
where $\langle k^2 \rangle$ is the mean of the squared vertex degrees.
|Q| = 0 if and only if the tree is a star tree
[Ferrer-i-Cancho, 2013].
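Equation (1) can be checked on small trees. The sketch below is only an illustration, not the authors' code: it assumes a tree given as a list of vertex pairs over vertices 0..n-1, reads the k-squared term as the mean of the squared vertex degrees, and compares the formula against a direct count of edge pairs that share no vertex. The function names are hypothetical.

```python
from itertools import combinations

def potential_crossings_formula(edges, n):
    """|Q| = n (n - 1 - <k^2>) / 2 for a tree with n vertices."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    sum_k2 = sum(k * k for k in degree)  # equals n * <k^2>
    # n (n - 1 - <k^2>) / 2 rewritten as (n (n - 1) - sum of k^2) / 2,
    # which stays in integer arithmetic.
    return (n * (n - 1) - sum_k2) // 2

def potential_crossings_bruteforce(edges):
    """Count the pairs of edges that do not share a vertex."""
    return sum(1 for e1, e2 in combinations(edges, 2)
               if not set(e1) & set(e2))

path = [(0, 1), (1, 2), (2, 3)]   # path tree on 4 vertices
star = [(0, 1), (0, 2), (0, 3)]   # star tree on 4 vertices

print(potential_crossings_formula(path, 4), potential_crossings_bruteforce(path))
print(potential_crossings_formula(star, 4), potential_crossings_bruteforce(star))
```

Both counts agree on these two trees, and the star tree gives |Q| = 0 as stated above.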
1st predictor of C
The number of edge crossings is
$C = \sum_{(e_1, e_2) \in Q} C(e_1, e_2)$, (2)
where C(e1, e2) is an indicator variable (C(e1, e2) = 1 if the edges e1
and e2 cross and C(e1, e2) = 0 otherwise).
Under the null hypothesis that the vertices are arranged linearly at random
(all possible orderings are equally likely), p(C(e1, e2) = 1) = 1/3,
which yields E0[C] = |Q|/3 [Ferrer-i-Cancho, 2013].
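The null expectation can be checked by exhaustive enumeration on a small tree. The following is a minimal sketch under stated assumptions: edges are vertex pairs over 0..n-1, `pos[v]` is the position of vertex v in the linear order, and all function names are hypothetical. It is feasible only for very small n because it visits all n! orderings.

```python
from fractions import Fraction
from itertools import combinations, permutations

def crosses(e1, e2, pos):
    """True if the two edges cross when drawn above the sequence."""
    a, b = sorted((pos[e1[0]], pos[e1[1]]))
    c, d = sorted((pos[e2[0]], pos[e2[1]]))
    # Endpoints must strictly alternate; edges sharing a vertex never do.
    return a < c < b < d or c < a < d < b

def count_crossings(edges, pos):
    """C: the number of pairs of edges that cross in the arrangement."""
    return sum(crosses(e1, e2, pos) for e1, e2 in combinations(edges, 2))

def expected_crossings_random(edges, n):
    """Exact average of C over all n! equally likely orderings."""
    total, count = 0, 0
    for perm in permutations(range(n)):  # perm[v] is the position of v
        total += count_crossings(edges, perm)
        count += 1
    return Fraction(total, count)

path5 = [(0, 1), (1, 2), (2, 3), (3, 4)]  # |Q| = 3 for this tree
print(expected_crossings_random(path5, 5))
```

For this path tree three edge pairs share no vertex, so the enumeration should agree with |Q|/3.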
2nd predictor of C
Introducing knowledge about the length of the dependencies
(edges of length 1 or n − 1 are not crossable):
p(C(e1, e2) = 1) is replaced by p(C(e1, e2) = 1 | d(e1), d(e2)),
obtaining [Ferrer-i-Cancho, 2014]
$E_2[C] = \sum_{(e_1, e_2) \in Q} p(C(e_1, e_2) = 1 \mid d(e_1), d(e_2))$, (3)
where d(e) is the length of edge e.
p(C(e1, e2) = 1 | d(e1), d(e2)) depends only on n, d(e1) and
d(e2) [Ferrer-i-Cancho, 2014].
E0[C] is a true expectation while E2[C] is not!
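The length-aware predictor can be illustrated by brute force. The closed-form expression from [Ferrer-i-Cancho, 2014] is not reproduced here. Instead, this sketch assumes that the conditional probability is the fraction of crossing placements among all equally likely placements of two vertex-disjoint edges with the given lengths in a sequence of n positions. That reading, and all function names, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def p_cross_given_lengths(n, d1, d2):
    """Fraction of placements of two disjoint edges of lengths d1, d2 that cross."""
    crossing, total = 0, 0
    for i in range(n - d1):          # edge 1 spans positions i and i + d1
        for j in range(n - d2):      # edge 2 spans positions j and j + d2
            a, b, c, d = i, i + d1, j, j + d2
            if len({a, b, c, d}) < 4:
                continue             # the edges would share a vertex
            total += 1
            if a < c < b < d or c < a < d < b:
                crossing += 1
    return Fraction(crossing, total) if total else Fraction(0)

def predictor_e2(edges, pos):
    """Sum of the conditional crossing probabilities over the pairs in Q."""
    n = len(pos)
    result = Fraction(0)
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):
            continue                 # pairs sharing a vertex are not in Q
        d1 = abs(pos[e1[0]] - pos[e1[1]])
        d2 = abs(pos[e2[0]] - pos[e2[1]])
        result += p_cross_given_lengths(n, d1, d2)
    return result

# A path laid out in order has only edges of length 1, so nothing can cross.
print(predictor_e2([(0, 1), (1, 2), (2, 3)], [0, 1, 2, 3]))
```

Under this reading, edges of length 1 or n − 1 get probability zero, matching the remark that such edges are not crossable.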
Evaluation of the predictors: materials
Corpora in version 2.0 of the HamleDT collection of treebanks
[Zeman et al., 2014, Rosa et al., 2014]: a harmonization of
existing treebanks for 30 different languages into two
well-known annotation styles: Prague dependencies
[Hajič et al., 2006] and Universal Stanford dependencies
[de Marneffe et al., 2014].
Preprocessing: nodes corresponding to punctuation tokens or
null elements were removed (non-punctuation nodes that had
a punctuation node as their head were attached as dependents
of their nearest non-punctuation ancestor; null elements
received the same treatment).
A syntactic dependency structure was included in our analyses
if (1) it defined a tree and (2) the tree was not a star tree.
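The preprocessing and inclusion criteria can be sketched as follows. This is not the authors' pipeline. It assumes a sentence stored as a list `heads`, where `heads[i]` is the index of the head of token i and -1 marks the root, plus a hypothetical boolean list `drop` flagging punctuation tokens and null elements.

```python
def remove_nodes(heads, drop):
    """Delete flagged tokens, reattaching orphans to their nearest kept ancestor."""
    def nearest_kept_ancestor(i):
        h = heads[i]
        while h != -1 and drop[h]:
            h = heads[h]
        return h

    kept = [i for i in range(len(heads)) if not drop[i]]
    new_index = {old: new for new, old in enumerate(kept)}
    new_heads = []
    for i in kept:
        h = nearest_kept_ancestor(i)
        new_heads.append(-1 if h == -1 else new_index[h])
    return new_heads

def is_tree(heads):
    """Exactly one root, and every token reaches it without a cycle."""
    if heads.count(-1) != 1:
        return False
    for i in range(len(heads)):
        seen = set()
        while i != -1:
            if i in seen:
                return False
            seen.add(i)
            i = heads[i]
    return True

def is_star(heads):
    """A star tree has one vertex adjacent to all the others."""
    n = len(heads)
    degree = [0] * n
    for i, h in enumerate(heads):
        if h != -1:
            degree[i] += 1
            degree[h] += 1
    return n > 0 and max(degree) == n - 1

# Hypothetical toy sentence: tokens 1 and 3 are punctuation.
cleaned = remove_nodes([1, 2, -1, 2], [False, True, False, True])
print(cleaned, is_tree(cleaned) and not is_star(cleaned))
```

In this toy case token 0 is reattached to token 2, and the two-token result is a star tree, so it would be excluded from the analysis.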
Evaluation of the predictors: methods
C, the number of crossings of the linear arrangement of a graph in
general.
C_true, the number of crossings of the syntactic dependencies of
a real sentence.
Relative number of crossings, i.e. $\bar{C} = C/|Q|$ or
$\bar{C}_{true} = C_{true}/|Q|$ [Ferrer-i-Cancho, 2014].
Relative error of a predictor [Ferrer-i-Cancho, 2014]:
$\Delta_x = E_x[\bar{C}] - \bar{C}_{true} = (E_x[C] - C_{true})/|Q|$. (4)
∆0 will be used as a baseline for ∆2.
∆0 converges to 1/3 for sufficiently long sentences when C_true
is small [Ferrer-i-Cancho, 2014].
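The limit of the baseline error follows from substituting the first predictor, E0[C] = |Q|/3, into Eq. (4). A short worked step, using only quantities defined in these slides:

```latex
\Delta_0 = \frac{E_0[C] - C_{true}}{|Q|}
         = \frac{|Q|/3 - C_{true}}{|Q|}
         = \frac{1}{3} - \bar{C}_{true}
```

So ∆0 approaches 1/3 whenever C_true is small relative to |Q|.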
Results
Mixing sentences of different lengths:
  The average ∆2, the relative error of the predictor E2[C], is
  small: it does not exceed 5%.
  The average ∆2 is at least 6 times smaller than the baseline
  ∆0 ≈ 30%.
Controlling for sentence length (grouping sentences by length):
  The average over group averages of ∆2 does not exceed 4.3%.
  The average ∆2 is at least 7 times smaller than the baseline
  error, again ∆0 ≈ 30%.
  The minimum size of a group is one sentence; the qualitative
  results are very similar if the minimum size is set to 2.
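The two ways of averaging can be contrasted with a small sketch. It assumes each sentence is summarised as a pair (sentence length, relative error). The function names and the toy values are hypothetical and are not results from the treebanks.

```python
from collections import defaultdict
from statistics import mean

def pooled_average(records):
    """Average the relative error over all sentences, mixing lengths."""
    return mean([delta for _, delta in records])

def average_over_length_groups(records, min_group_size=1):
    """Average per sentence length first, then average the group means."""
    groups = defaultdict(list)
    for n, delta in records:
        groups[n].append(delta)
    group_means = [mean(values) for values in groups.values()
                   if len(values) >= min_group_size]
    return mean(group_means)

# Hypothetical (length, relative error) pairs, for illustration only.
toy = [(3, 0.1), (3, 0.3), (5, 0.6)]
print(pooled_average(toy))
print(average_over_length_groups(toy))
print(average_over_length_groups(toy, min_group_size=2))
```

Grouping gives each sentence length equal weight, so a length with many sentences does not dominate the average. Raising `min_group_size` drops lengths represented by too few sentences.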
Concluding remarks
principle of dependency length minimization
[Ferrer-i-Cancho, 2015]
↓
dependency lengths
↓
the actual number of crossings in sentences
To explain the low frequency of crossings in world languages, it
may not be necessary to resort to
A ban of crossings by grammar (e.g.,
[Hudson, 2007, Tanaka, 1997]).
A principle of minimization of crossings [Liu, 2008].
A competence-plus [Hurford, 2012] limiting the number of
crossings.
References
de Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K.,
Ginter, F., Nivre, J., and Manning, C. D. (2014).
Universal Stanford dependencies: a cross-linguistic typology.
In Calzolari, N. (Conference Chair), Choukri, K., Declerck, T., Loftsson, H.,
Maegaard, B., Mariani, J., Moreno, A., Odijk, J., and Piperidis,
S., editors, Proceedings of the Ninth International Conference
on Language Resources and Evaluation (LREC'14), Reykjavik,
Iceland. European Language Resources Association (ELRA).
Ferrer-i-Cancho, R. (2013).
Random crossings in dependency trees.
http://arxiv.org/abs/1305.4561.
Ferrer-i-Cancho, R. (2014).
A stronger null hypothesis for crossing dependencies.
Europhysics Letters, 108:58003.
Ferrer-i-Cancho, R. (2015).
The placement of the head that minimizes online memory. A
complex systems approach.
Language Dynamics and Change, 5:141-164.
Ferrer-i-Cancho, R. and Gómez-Rodríguez, C. (2016).
Liberating language research from dogmas of the 20th century.
Glottometrics, 33:33-34.
Gómez-Rodríguez, C. and Ferrer-i-Cancho, R. (2016).
The scarcity of crossing dependencies: a direct outcome of a
specific constraint?
http://arxiv.org/abs/1601.03210.
Hajič, J., Panevová, J., Hajičová, E., Sgall, P.,
Pajas, P., Štěpánek, J., Havelka, J., and Mikulová, M. (2006).
Prague Dependency Treebank 2.0.
CDROM CAT: LDC2006T01, ISBN 1-58563-370-4. Linguistic
Data Consortium.
Hudson, R. A. (2007).
Language Networks: The New Word Grammar.
Oxford University Press.
Hurford, J. R. (2012).
Chapter 3. Syntax in the Light of Evolution, pages 175-258.
Oxford University Press, Oxford.
Liu, H. (2008).
Dependency distance as a metric of language comprehension
difficulty.
Journal of Cognitive Science, 9:159-191.
Rosa, R., Mašek, J., Mareček, D., Popel, M., Zeman, D., and
Žabokrtský, Z. (2014).
HamleDT 2.0: Thirty dependency treebanks stanfordized.
In Calzolari, N. (Conference Chair), Choukri, K., Declerck, T., Loftsson, H.,
Maegaard, B., Mariani, J., Moreno, A., Odijk, J., and Piperidis,
S., editors, Proceedings of the Ninth International Conference
on Language Resources and Evaluation (LREC'14), Reykjavik,
Iceland. European Language Resources Association (ELRA).
Tanaka, H. (1997).
Invisible movement in sika-nai and the linear crossing
constraint.
Journal of East Asian Linguistics, 6:143-188.
Zeman, D., Dušek, O., Mareček, D., Popel, M., Ramasamy, L.,
Štěpánek, J., Žabokrtský, Z., and Hajič, J. (2014).
HamleDT: Harmonized multi-language dependency treebank.
Language Resources and Evaluation, 48(4):601-637.
