This short paper describes the use of the linguistic annotation available in parallel PropBanks (Chinese and English) for the enhancement of automatically derived word alignments. Specifically, we suggest ways to refine and expand word alignments for verb-predicates by using predicate-argument structures. Evaluations demonstrate improved alignment accuracies that vary by corpus type.
Using Parallel Propbanks to Enhance Word-alignments
1. Using Parallel Propbanks to Enhance Word-Alignments
The 3rd Linguistic Annotation Workshop at ACL 09
August 7th, 2009
Jinho D. Choi (Univ. of Colorado at Boulder)
Martha Palmer (Univ. of Colorado at Boulder)
Nianwen Xue (Brandeis University)
2. Parallel Propbanks
Propbank
- Corpus annotated with verbal propositions and their arguments (semantic roles)
[Gansu Province] also actively [explored] [high risk business]
Arg0: explorer; Arg1: things explored
Parallel Propbanks
- Propbanks annotated on a parallel corpus
[Chinese counterpart of the example sentence, with Arg0 and Arg1 bracketed; the characters did not survive extraction]
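A Propbank instance like the English example above can be held in a small record: the tokens, the position of the predicate, and one token span per argument label. The sketch below is illustrative only; the `Proposition` class and its field names are assumptions made here, not part of any Propbank release or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Proposition:
    """One verb predicate with its labeled arguments (hypothetical layout)."""
    tokens: List[str]
    rel: int  # index of the predicate (Rel) token
    # label -> (start, end) token span, end exclusive
    args: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def arg_text(self, label: str) -> str:
        """Return the words covered by the argument with this label."""
        start, end = self.args[label]
        return " ".join(self.tokens[start:end])


tokens = "Gansu Province also actively explored high risk business".split()
prop = Proposition(tokens=tokens, rel=4, args={"Arg0": (0, 2), "Arg1": (5, 8)})

print(prop.tokens[prop.rel])   # explored
print(prop.arg_text("Arg0"))   # Gansu Province
print(prop.arg_text("Arg1"))   # high risk business
```

In a parallel Propbank, each sentence pair would carry one such record per verb on each side.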
3. Word-Alignments
Given parallel sentences, discover the translation of each word
[Chinese sentence; the characters did not survive extraction]
Construction is a principal economic activity in developing Pudong
GIZA++: a statistical machine translation toolkit
- It is hard to verify if the alignments are correct.
- Words with low frequencies may not get aligned.
- It does not account for semantics.
4. Predicate Matching (based on GIZA++)
English Chinese Parallel Treebank (ECTB)
- Xinhua: Chinese newswire + literal translation
- Sinorama: Chinese news magazine + non-literal translation
Xinhua: 12,895 Sinorama: 40,086
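[Figure: two pie charts, one for Xinhua and one for Sinorama, showing the distribution of Chinese verbs over the alignment categories En.verb, En.be, En.else, and En.none. Percentages as extracted, slice assignment unclear: 19%, 32%, 45%, 3%, 56%, 22%, 19%, 3%]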
5. Top-down Argument Matching
Verify word-alignments
- For each Chinese verb v_c aligned to some English verb v_e
- Verify that the alignment is correct if the arguments of v_c and v_e match
Arg0 ArgM ArgM Rel Arg1
[Chinese sentence, five bracketed spans; the characters did not survive extraction]
[Gansu Province] [also] [actively] [explored] [high risk business]
Arg0 ArgM ArgM Rel Arg1
Bingo!
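The verification step can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes each predicate's arguments are given as sets of token positions keyed by label, that word alignments are (Chinese index, English index) pairs, and that a verb alignment is accepted when the average per-argument overlap reaches a threshold. The function names, the toy alignment, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Set, Tuple

Args = Dict[str, Set[int]]          # argument label -> token positions
Alignment = Set[Tuple[int, int]]    # (Chinese index, English index) pairs


def argument_overlap(c_args: Args, e_args: Args, alignment: Alignment) -> float:
    """Average, over Chinese argument labels, of the share of argument words
    aligned to a word inside the English argument with the same label."""
    ratios = []
    for label, c_words in c_args.items():
        if not c_words:
            continue
        e_words = e_args.get(label, set())
        hits = sum(1 for c in c_words
                   if any((c, e) in alignment for e in e_words))
        ratios.append(hits / len(c_words))
    return sum(ratios) / len(ratios) if ratios else 0.0


def verify(c_args: Args, e_args: Args, alignment: Alignment,
           threshold: float = 0.5) -> bool:
    """Top-down check: keep the verb alignment only if the arguments match."""
    return argument_overlap(c_args, e_args, alignment) >= threshold


# English positions follow the slide's example; the Chinese positions and
# the word alignment are made up for illustration.
e_args = {"Arg0": {0, 1}, "ArgM": {2, 3}, "Arg1": {5, 6, 7}}
c_args = {"Arg0": {0}, "ArgM": {1, 2}, "Arg1": {4, 5}}
alignment = {(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 7)}

print(verify(c_args, e_args, alignment))   # True
```

With an empty alignment set the same call returns `False`, so a verb pair whose arguments share no aligned words would be rejected.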
7. Bottom-up Argument Matching
Expand word-alignments
- For each Chinese verb v_c aligned to no English word
- Align v_c to v_e such that v_e is an English verb that maximizes the argument matching with v_c
ArgM Rel Arg1
[Foreign] [funded] [enterprises] in Gansu Province no longer worry about investment risk
[Chinese sentence, six bracketed spans; the characters did not survive extraction]
Arg0 A.M A.M A.M Arg1 Rel
[Foreign funded enterprises in Gansu Province] [no] [longer] [worry] [about investment risk]
Arg0 A.M A.M Rel Arg1
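The selection step in the example above can be sketched as an argmax over the English verbs of the sentence. This is a minimal sketch under stated assumptions: scoring by overlap of argument labels is a stand-in chosen here for brevity, since the slide does not spell out the scoring at this point, and the function names and the 0.5 threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def label_overlap(c_labels: List[str], e_labels: List[str]) -> float:
    """Share of the Chinese verb's argument labels also found on the
    English verb, counting repeated labels (multiset intersection)."""
    if not c_labels:
        return 0.0
    shared = Counter(c_labels) & Counter(e_labels)
    return sum(shared.values()) / len(c_labels)


def best_english_verb(c_labels: List[str],
                      candidates: Dict[str, List[str]],
                      threshold: float = 0.5) -> Optional[str]:
    """Bottom-up choice: the English verb with the highest score, or None
    if no candidate reaches the threshold."""
    scored = {verb: label_overlap(c_labels, labels)
              for verb, labels in candidates.items()}
    if not scored:
        return None
    verb = max(scored, key=scored.get)
    return verb if scored[verb] >= threshold else None


# Argument labels as shown on the slide (A.M written as ArgM).
chinese_verb_labels = ["Arg0", "ArgM", "ArgM", "ArgM", "Arg1"]
english_candidates = {
    "funded": ["ArgM", "Arg1"],
    "worry": ["Arg0", "ArgM", "ArgM", "Arg1"],
}

print(best_english_verb(chinese_verb_labels, english_candidates))   # worry
```

Under this stand-in score, "funded" shares two of the five Chinese argument labels and "worry" shares four, so the unaligned Chinese verb is aligned to "worry".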
8. Argument Matching Score
Macro argument matching score
Micro argument matching score
Thresholds
- Top-down: thresholds on macro score
- Bottom-up: thresholds on both macro and micro scores
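The formulas for the two scores are not included in this transcript, so the following only sketches how macro and micro averaging conventionally differ when applied to per-argument match counts. Whether the authors define their scores exactly this way is an assumption; the function names and the counts are hypothetical.

```python
from typing import Dict, Tuple

# label -> (matched words, total words) for one verb pair
Counts = Dict[str, Tuple[int, int]]


def macro_score(counts: Counts) -> float:
    """Average the per-argument match ratios: every argument weighs the same."""
    ratios = [hit / total for hit, total in counts.values() if total]
    return sum(ratios) / len(ratios) if ratios else 0.0


def micro_score(counts: Counts) -> float:
    """Pool all words first: longer arguments weigh more."""
    total = sum(t for _, t in counts.values())
    hits = sum(h for h, _ in counts.values())
    return hits / total if total else 0.0


counts = {"Arg0": (1, 1), "Arg1": (1, 4)}   # made-up counts
print(macro_score(counts))   # 0.625
print(micro_score(counts))   # 0.4
```

In this toy case a short, fully matched argument lifts the macro score while the micro score stays low because most words are unmatched, which may be one reason to threshold on both in the bottom-up step.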
9. System Overview
Source Language Corpus + Target Language Corpus -> GIZA++ -> Word Alignments
- Verbs aligned to verbs -> Top-down Matching -> Verified Alignments
- Verbs aligned to no word -> Bottom-up Matching -> Expanded Alignments
- Parallel Propbanks feed both Top-down and Bottom-up Matching
Verified Alignments + Expanded Alignments -> Enhanced Alignments
10. Evaluations
Test Corpus
- NIST-GALE Web Genre Test Data
- 100 parallel sentences, 365 verb tokens, 273 verb types
Measurements
- Term Coverage: how many Chinese verb-types are covered
- Term Expansion: how many English verb-types are suggested
- Alignment Accuracy: how many suggested English verb-types are correct
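The three measurements can be sketched over sets of verb types. This is a minimal sketch, assuming suggestions and gold translations are stored as dictionaries from Chinese verb type to sets of English verb types; the normalizations (coverage and accuracy as fractions, expansion as a raw count), the function name, and the toy data are assumptions made here, not the evaluation script used for the talk.

```python
from typing import Dict, Set


def evaluate(test_types: Set[str],
             suggested: Dict[str, Set[str]],
             gold: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compute the three measurements over verb types."""
    # Chinese verb types that received at least one suggestion
    covered = {c for c in test_types if suggested.get(c)}
    # every (Chinese type, suggested English type) pair
    pairs = {(c, e) for c in covered for e in suggested[c]}
    # suggested pairs that also appear in the gold translations
    correct = {(c, e) for c, e in pairs if e in gold.get(c, set())}
    return {
        "term_coverage": len(covered) / len(test_types) if test_types else 0.0,
        "term_expansion": float(len(pairs)),
        "alignment_accuracy": len(correct) / len(pairs) if pairs else 0.0,
    }


# Toy data, invented for illustration.
test_types = {"探索", "担心", "投资"}
suggested = {"探索": {"explore", "search"}, "担心": {"worry"}}
gold = {"探索": {"explore"}, "担心": {"worry", "fear"}, "投资": {"invest"}}

result = evaluate(test_types, suggested, gold)
print({k: round(v, 2) for k, v in result.items()})
# {'term_coverage': 0.67, 'term_expansion': 3.0, 'alignment_accuracy': 0.67}
```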
13. Conclusions & Future Work
Conclusions
- Top-down Argument Matching is most effective for verifying word-alignments based on non-literal translations that have proven difficult for GIZA++.
- Bottom-up Argument Matching shows promise for expanding the coverage of GIZA++ alignments based on literal translations.
Future Work
We will try to enhance word-alignments by using
- Automatically labeled Propbanks
- Nombanks, Named-entity tags
- Parallel Propbanks prior to GIZA++
14. Acknowledgements
We gratefully acknowledge the support of the National Science Foundation Grants IIS-0325646, Domain Independent Semantic Parsing, CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
Special thanks to Daniel Gildea and Ding Liu (University of Rochester), who provided word-alignments; Wei Wang (Information Sciences Institute at University of Southern California), who provided the test-corpus; and Hua Zhong (University of Colorado at Boulder), who performed the evaluations.