Improvement of English to Persian
Machine Translation via N-grams of
Part-of-Speech tags
Adel Rahimi
Sharif University of Technology
adel.rahimi@mehr.sharif.edu
3rd Regional Conference On New Achievements In Electrical And Computer Engineering
Hi! I'm Adel Rahimi
I work at Sharif Speech and Language
Processing Lab.
I love NLP and Data Mining.
You can find me at:
http://mehr.sharif.edu/~adel.rahimi
Adel.rahimi@mehr.sharif.edu
IN SHORT
Machine translation has always been an interesting topic in NLP,
and it keeps improving. We tried a new method to align
English-to-Persian machine-translated texts: n-gram modelling
over part-of-speech-tagged tokens. This method improved
accuracy on syntactically mistranslated sentences.
PREVIOUS STUDIES
Orch (1999) translated word by word and then reordered
the words to match the destination language's syntactic structure.
Koehn (2009) proposed translating phrases
regardless of word structure.
Kumar & Byrne (2008), Blackwell (2006), and
Kumar (2003) all sought a method based on
finite-state transducers.
HOW WAS IT DONE?
METHODOLOGY
We used n-grams of POS-tagged items. Each Persian example is paired with its tag sequence:
[Persian example 1]
pronoun pronoun noun conjunction pronoun verb
[Persian example 2]
pronoun verb
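
As a rough illustration of what "n-grams of POS-tagged items" can mean in practice, the sketch below extracts and counts tag bigrams from the two tag sequences shown above. The function name `pos_ngrams` and the choice of n = 2 are assumptions made for this example; the slides do not state which n was used or how the counts were stored.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Return every contiguous n-gram of a POS tag list as a tuple."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


# Tag sequences taken from the slide's two examples.
sequences = [
    "pronoun pronoun noun conjunction pronoun verb".split(),
    "pronoun verb".split(),
]

# Count tag bigrams over all sequences (n = 2 is an assumption).
bigram_counts = Counter()
for tags in sequences:
    bigram_counts.update(pos_ngrams(tags, 2))

for bigram, count in bigram_counts.most_common():
    print(" ".join(bigram), count)
```

Counts like these are the raw material for an n-gram model over tags: frequent tag pairs suggest word orders that are typical for the language.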
THE DATASET
Number    String
1         n n pro spec
2         n n pro qua spec n
3         n p n p v adv
4         n pro p adv v pro
5         p n adj adj n
HOW ABOUT THE ACCURACY?
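Original Persian sentence: این یک متریک بسیار متداول است
Original English sentence: This is a very common metric
Translated sentence: یک متریک این بسیار متداول است
POS sequence of the translated sentence: n n pro adj adj v
POS sequence after correction: pro n n adj adj v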
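
One way to picture the correction step is as a search for the tag order that a POS bigram model finds most likely. The sketch below is an assumption-laden illustration, not the author's implementation: the reference corpus is hypothetical toy data, add-one smoothing is a choice made here, and the names `train_bigrams`, `log_score`, and `best_reordering` are invented for the example.

```python
import math
from collections import Counter
from itertools import permutations

START, END = "<s>", "</s>"


def train_bigrams(tag_sequences):
    """Count tag bigrams (with sentence boundary markers) in a corpus."""
    bigrams, contexts, vocab = Counter(), Counter(), {END}
    for tags in tag_sequences:
        padded = [START, *tags, END]
        vocab.update(tags)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_score(tags, model):
    """Add-one smoothed log-probability of a tag sequence."""
    bigrams, contexts, vocab = model
    padded = [START, *tags, END]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))
        for prev, cur in zip(padded, padded[1:])
    )


def best_reordering(tags, model):
    """Brute-force search over all distinct orderings of the tags."""
    candidates = set(permutations(tags))
    return max(candidates, key=lambda cand: log_score(cand, model))


# Hypothetical reference corpus of well-formed Persian tag sequences.
reference_corpus = [
    "pro n n adj adj v".split(),
    "pro n adj v".split(),
    "pro n v".split(),
    "n n adj adj v".split(),
]
model = train_bigrams(reference_corpus)

translated = "n n pro adj adj v".split()  # tag order of the MT output
corrected = best_reordering(translated, model)
print(" ".join(corrected))
```

With this toy corpus the search returns the corrected order from the example, `pro n n adj adj v`. Enumerating permutations grows factorially with sentence length, so a real system would need a cheaper search, such as restricting reordering to a local window.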
65 percent accuracy
THANKS Any questions?
Contact me at:
 Mehr.sharif.edu/~adel.rahimi
 Adel.rahimi@mehr.sharif.edu
