POS Integration in Moses
Background
Phrase translation table:
English     Tamil      P(e|f)
I           [Tamil]    0.66
went        [Tamil]    0.54
went        [Tamil]    0.13
to stall    [Tamil]    0.47
to stall    [Tamil]    0.21
I went      [Tamil]    0.42
Language model (trigram) table:
w3          w1 w2                score
[Tamil]     <s> [Tamil]          -1.400199
[Tamil]     [Tamil] [Tamil]      -1.855783
[Tamil]     [Tamil] [Tamil]      -0.4191293
[Figure: candidate Tamil phrases for each source phrase]
Getting Maximum Probability
Key Challenges
 Challenge 1
 Word Reordering
 Challenge 2
 Unknown words
A factored model is used to address both challenges
Word Reordering
 Example
I will arrive tomorrow afternoon
[Tamil translation, with the words in a different order]
Unknown words
 Example
A traditional model treats the word "house" as completely independent of the word "houses".
 Training data containing "house" does not add any knowledge about the translation of "houses".
What Is a Factored Model
 Redefining a word from a single symbol to a vector of factors
[Figure: traditional vs. factored representation of a word]
Factored Model Example
Factor         Value
Word           Went
Lemma          Go
POS            Verb
Case marker    Past tense
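As a rough illustration of the "vector of factors" idea, the sketch below parses a token written in the pipe-separated form used later in these slides (boys|boy|NN|plural). The class name `FactoredWord`, the helper `parse_factored`, and the assumption of exactly four factors are illustrative choices, not part of Moses itself.

```python
from typing import NamedTuple


class FactoredWord(NamedTuple):
    """A word as a vector of factors (hypothetical four-factor layout)."""
    surface: str
    lemma: str
    pos: str
    morph: str


def parse_factored(token: str) -> FactoredWord:
    """Split a 'surface|lemma|POS|morphology' token into its factors."""
    parts = token.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 factors, got {len(parts)}: {token!r}")
    return FactoredWord(*parts)


word = parse_factored("went|go|Verb|past")
print(word.lemma, word.pos)
```

Keeping the lemma as its own factor is what lets "house" and "houses" share translation evidence.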
 Components of Factored translation models
 Language model
 Translation model
 Reordering model
 Translation steps
 Generation steps
 Each component defines one or more feature functions, which are combined in a log-linear model.
Factored Translation
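In a log-linear model, a candidate translation e of a source sentence f is scored by a weighted sum of feature values, sum_i lambda_i * h_i(e, f), and the decoder prefers the candidate with the highest sum. The sketch below shows only that combination step. The feature names, feature values, and weights are hypothetical, and the normalization constant is left out because it does not change which candidate wins.

```python
from typing import Dict


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return sum_i lambda_i * h_i for one candidate translation."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights (lambda_i), one per component.
weights = {"translation": 1.0, "language_model": 0.5}

# Hypothetical log-probability feature values (h_i) for two candidates.
candidates = {
    "candidate_a": {"translation": -0.38, "language_model": -1.40},
    "candidate_b": {"translation": -0.89, "language_model": -1.86},
}

best = max(candidates, key=lambda name: log_linear_score(candidates[name], weights))
print(best)
```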
Methodology
 Parallel Corpus comparison
Traditional Factored
Methodology
 LM Comparison
 No changes; same as the traditional method.
Methodology
 Translation model
 Prepare the training data: run a POS tagger on the corpus to tag it
 Establish word alignments and POS-tag alignments using GIZA++
English words:  I  Went  To  shop
English tags:   PRP  V  PREP  NN
Tamil words:    [Tamil: I]  [Tamil: to shop]  [Tamil: went]
Tamil tags:     PRP  NN  V
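The preparation step can be sketched as follows: once a POS tagger has produced one tag per token, the two are joined into pipe-separated factored tokens, one sentence per line. The function name `to_factored` is hypothetical, and the tags are copied from the example above rather than produced by a real tagger.

```python
from typing import List


def to_factored(tokens: List[str], tags: List[str], sep: str = "|") -> str:
    """Join each token with its POS tag to form one factored corpus line."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(f"{word}{sep}{tag}" for word, tag in zip(tokens, tags))


# Tags taken from the slide example, not from a tagger run.
line = to_factored(["I", "went", "to", "shop"], ["PRP", "V", "PREP", "NN"])
print(line)  # I|PRP went|V to|PREP shop|NN
```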
Methodology
 The source sentence is reordered according to the word and tag alignments
 Extract phrase pairs that are consistent with
the word alignment
 Estimate scoring functions (conditional
phrase translation probabilities or lexical
translation probabilities)
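The extraction and scoring steps above can be sketched in a few lines. This is a simplified version, not the Moses implementation: it keeps only the tightest target span for each source span (no extension over unaligned words) and estimates conditional phrase translation probabilities by relative frequency. The target tokens (`ta_I`, `ta_shop`, `ta_went`) are placeholders standing in for Tamil words, and the alignment is assumed for illustration.

```python
from collections import Counter


def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with the alignment.

    alignment is a set of (source_index, target_index) links.
    """
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistent: no word in the target span links outside the source span.
            if all(s_start <= s <= s_end
                   for (s, t) in alignment if t_start <= t <= t_end):
                pairs.append((" ".join(src[s_start:s_end + 1]),
                              " ".join(tgt[t_start:t_end + 1])))
    return pairs


def relative_frequency(pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    src_counts = Counter(source for source, _ in pairs)
    return {pair: count / src_counts[pair[0]] for pair, count in pair_counts.items()}


src = ["I", "went", "to", "shop"]
tgt = ["ta_I", "ta_shop", "ta_went"]           # placeholder target tokens
alignment = {(0, 0), (1, 2), (2, 1), (3, 1)}   # assumed word alignment

pairs = extract_phrase_pairs(src, tgt, alignment)
probs = relative_frequency(pairs)
for pair in pairs:
    print(pair, probs[pair])
```

Note that "to" alone is not extracted, because its target word is also linked to "shop"; only the full span "to shop" is consistent with the alignment.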
Methodology
Phrase table comparison
 Traditional
 Factored
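Decoding
Source phrase: boys|boy|NN|plural
 Translation: Mapping lemmas
boy -> [Tamil lemma 1], [Tamil lemma 2], etc.
 Translation: Mapping morphology
NN|plural -> NN|-e, NN|-o, etc.
 Generation: Generating surface forms
[Tamil lemma 1] NN|-s -> [Tamil surface form 1, singular]
[Tamil lemma 1] NN|-p -> [Tamil surface form 1, plural]
[Tamil lemma 2] NN|-s -> [Tamil surface form 2, singular]
[Tamil lemma 2] NN|-p -> [Tamil surface form 2, plural]
 Translation options:
[Tamil lemma 1] NN|-s -> [Tamil surface form 1, singular]
[Tamil lemma 1] NN|-p -> [Tamil surface form 1, plural]
[Tamil lemma 2] NN|-s -> [Tamil surface form 2, singular]
[Tamil lemma 2] NN|-p -> [Tamil surface form 2, plural]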
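The three mapping steps can be sketched as table lookups that are expanded into translation options. Everything in the tables below is a toy assumption: the target lemmas and surface forms are placeholder strings rather than real Tamil, and the target tags NN|-s and NN|-p are borrowed from the generation step so that the lookups chain together.

```python
from itertools import product

# Toy tables with placeholder target entries (not real data).
lemma_table = {"boy": ["ta_lemma_a", "ta_lemma_b"]}
morph_table = {"NN|plural": ["NN|-s", "NN|-p"]}
generation_table = {
    ("ta_lemma_a", "NN|-s"): "ta_lemma_a",
    ("ta_lemma_a", "NN|-p"): "ta_lemma_a+PL",
    ("ta_lemma_b", "NN|-s"): "ta_lemma_b",
    ("ta_lemma_b", "NN|-p"): "ta_lemma_b+PL",
}


def translation_options(factored_source: str):
    """Expand one 'surface|lemma|POS|morph' source token into target options."""
    _surface, lemma, pos, morph = factored_source.split("|")
    options = []
    # Translation steps: map the lemma and the morphology independently.
    target_lemmas = lemma_table.get(lemma, [])
    target_morphs = morph_table.get(f"{pos}|{morph}", [])
    for tgt_lemma, tgt_morph in product(target_lemmas, target_morphs):
        # Generation step: build the surface form from lemma and morphology.
        surface_form = generation_table.get((tgt_lemma, tgt_morph))
        if surface_form is not None:
            options.append(f"{surface_form}|{tgt_lemma}|{tgt_morph}")
    return options


options = translation_options("boys|boy|NN|plural")
for option in options:
    print(option)
```

Because the lemma and morphology are translated separately, an unseen surface form such as "boys" can still be translated as long as its lemma and tag were seen in training.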