Arabic Morphology Using Only Finite-State Operations
Supervisor: Dr. A. R. Weerasinghe
Sivaneasharajah Lushanthan
2006/CS/154
Author: Kenneth R. Beesley
Introducing Morphology
  • Morphology: the structure of words and how words are formed
  • Morpheme: the smallest linguistic unit within a word that can carry a meaning, such as "un-", "break", and "-able" in the word "unbreakable"
  • Morphotactics: the restrictions on the ordering of morphemes
  • Orthographic/variation rule: models the changes that occur in a word, usually when two morphemes combine (spelling rules)
What If?
  • Analysing: surface form to lexical form
  • Generating: lexical form to surface form
Why a Morphological Analyzer?
  • Grammar checker
  • Text summarizer
  • TTS
  • Machine translation
  • Data retrieval
To Do Morphological Parsing
  • Lexicon
      • List of morphemes (stems + affixes)
      • POS information of morphemes
  • Morphotactics
  • Orthographic rules
A Finite State Transducer: Cola Machine
  • Alphabet: {F, T}
  • Words: FFF, FT, TF
  • Language: {FFF, FT, TF}
  • [Figure: states 0, 5, 10, 15 connected by arcs labeled F and T]
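The cola machine can be sketched as a small acceptor in Python. This is only an illustration: it assumes that F adds 5, T adds 10, the states are the running totals 0, 5, 10 and 15, and 15 is the only final state. The names `TRANSITIONS`, `accepts` and `LANGUAGE` are hypothetical.

```python
from itertools import product

# Assumed reading of the slide: state = amount inserted so far,
# F adds 5, T adds 10, and 15 is the only accepting state.
TRANSITIONS = {
    (0, "F"): 5, (0, "T"): 10,
    (5, "F"): 10, (5, "T"): 15,
    (10, "F"): 15,
}
START = 0
FINAL = {15}


def accepts(word):
    """Return True if the machine ends in a final state after reading word."""
    state = START
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no arc for this symbol: reject
            return False
    return state in FINAL


# Enumerate every string over {F, T} up to length 3 and keep the accepted ones.
LANGUAGE = sorted(
    "".join(w)
    for n in range(4)
    for w in product("FT", repeat=n)
    if accepts("".join(w))
)
print(LANGUAGE)  # ['FFF', 'FT', 'TF']
```

The accepted set matches the three words listed on the slide; any other sequence, such as TT, has no path to state 15.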
FS Languages & Natural Languages
  • A network that accepts a one-word language: t a b l e
  • A two-level transducer: lexical side table+Noun+Pl, surface side table ε s
Writing Regular Expressions - Lexicon
  [ {kick} | {try} | {bore} ][%+Verb:0][ %+Bare:0 | %+Pres3PSg :s | %+Past: {ed} ];
  • a:a = a
  • {kick} = [ k:k i:i c:c k:k ] = [ k i c k ]
  • Lexical side: word +Verb +Case
  • Surface side: word ε suffix
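To see what relation the lexicon expression encodes, the following Python sketch enumerates its lexical/surface pairs by plain concatenation. It does not use xfst; the names `STEMS`, `CATEGORY`, `SUFFIXES` and `lexicon_pairs` are hypothetical, and the empty string stands in for the 0 (epsilon) on the surface side.

```python
from itertools import product

# Each entry is (lexical side, surface side); "" plays the role of 0 (epsilon).
STEMS = [("kick", "kick"), ("try", "try"), ("bore", "bore")]
CATEGORY = [("+Verb", "")]
SUFFIXES = [("+Bare", ""), ("+Pres3PSg", "s"), ("+Past", "ed")]


def lexicon_pairs():
    """Concatenate one choice from each slot, on both sides in parallel."""
    pairs = []
    for parts in product(STEMS, CATEGORY, SUFFIXES):
        lexical = "".join(lex for lex, _ in parts)
        surface = "".join(surf for _, surf in parts)
        pairs.append((lexical, surface))
    return pairs


PAIRS = lexicon_pairs()
for lexical, surface in PAIRS:
    print(f"{lexical:20} {surface}")
```

The output includes correct forms such as kick+Verb+Past / kicked, but also trys, tryed and boreed, which is the over-generation that a further layer of rules has to repair.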
Possible Words
  [ {kick} | {try} | {bore} ][%+Verb:0][ %+Bare:0 | %+Pres3PSg :s | %+Past: {ed} ];
  • Solution? Another layer!
Writing Regular Expressions - Rules
  • α -> β || γ _ δ is read as "α is rewritten as β between γ and δ"
  [ y -> i e || Cons _ s .#. ,, y -> i || Cons _ e d .#. ] .o. e -> 0 || Cons _ e d .#. ;
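The effect of these rules can be approximated with ordinary Python regular expressions. This is a sketch, not xfst: `CONS` is an assumed consonant set (the slide does not define Cons), and the two parallel y rules are applied one after the other, which gives the same result here because their contexts cannot both match the same word.

```python
import re

# Assumed definition of Cons; the slide leaves it undefined.
CONS = "bcdfghjklmnpqrstvwxz"


def apply_rules(word):
    """Apply the three replace rules; $ plays the role of the .#. boundary."""
    # y -> i e || Cons _ s .#.
    word = re.sub(rf"(?<=[{CONS}])y(?=s$)", "ie", word)
    # y -> i || Cons _ e d .#.
    word = re.sub(rf"(?<=[{CONS}])y(?=ed$)", "i", word)
    # composed afterwards: e -> 0 || Cons _ e d .#.
    word = re.sub(rf"(?<=[{CONS}])e(?=ed$)", "", word)
    return word


RAW = ["kick", "kicks", "kicked", "try", "trys", "tryed", "bore", "bores", "boreed"]
FIXED = [apply_rules(w) for w in RAW]
print(FIXED)
# ['kick', 'kicks', 'kicked', 'try', 'tries', 'tried', 'bore', 'bores', 'bored']
```

Forms that already have the right spelling pass through unchanged, so the rule layer can be applied to every output of the lexicon.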
In the Paper
  • Discontiguous dependencies between morphemes in a word: filtering
  • Non-concatenative morphotactics
      • Reduplication
      • Semitic interdigitation
  • Variation rules
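Filtering Out Over-Generation
  • Illegal combination: Art+word+Noun+Indef+Case
      ?* %+ Art %+ ?* %+ Indef ?*
      $ [ %+ Art %+ ?* %+ Indef ]
  • Illegal combination: Prep+word+Noun+Def/Indef+Nom/Acc
      $ [ %+ Prep %+ ?* [ %+Acc | %+Nom ] ]
  • Filter (complement of the union of the illegal patterns):
      ~[ $ [ %+ Art %+ ?* %+ Indef ] | $ [ %+ Prep %+ ?* [ %+Acc | %+Nom ] ] ]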
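The filtering idea (describe the illegal tag combinations, then keep only strings that contain none of them) can be sketched in Python. The regular expressions below are a simplified stand-in for the xfst patterns, not a translation of them; the candidate strings and the `Gen` tag are made up for illustration.

```python
import re

# Assumed simplification: tags are separated by "+", and a string is illegal
# if it contains Art ... Indef, or Prep ... followed by Acc or Nom.
ILLEGAL = [
    re.compile(r"(^|\+)Art\+.*\+Indef(\+|$)"),
    re.compile(r"(^|\+)Prep\+.*\+(Acc|Nom)(\+|$)"),
]


def is_legal(analysis):
    """Complement of 'contains any illegal pattern' (the ~[ $A | $B ] idea)."""
    return not any(pattern.search(analysis) for pattern in ILLEGAL)


# Hypothetical over-generated lexical strings.
CANDIDATES = [
    "Art+word+Noun+Def+Nom",
    "Art+word+Noun+Indef+Nom",
    "Prep+word+Noun+Def+Gen",
    "Prep+word+Noun+Indef+Acc",
    "word+Noun+Indef+Acc",
]
LEGAL = [c for c in CANDIDATES if is_legal(c)]
print(LEGAL)
# ['Art+word+Noun+Def+Nom', 'Prep+word+Noun+Def+Gen', 'word+Noun+Indef+Acc']
```

In the finite-state setting the same effect is obtained by composing the over-generating lexicon with the filter network, so the illegal paths are removed once at compile time rather than checked per word.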
Non-Concatenative Morphotactics
  • Semitic stem interdigitation
      • Root: ktb, drs
      • Template: CVCVC
      • Vocalization: ui, a*
  • Example tiers:
      • Root tier: k t b
      • Template tier: C V C V C
      • Vocalization tier: u i
      • Stem tier: k u t i b
  ^[{ktb}.m>.{CVCVC}.<m.[u*i]^]
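The interdigitation of the three tiers can be illustrated with a short Python sketch. It imitates the outcome of the merge, not the finite-state merge algorithm itself: root consonants fill the C slots, and the vocalization pattern (read here as an ordinary regular expression such as `u*i` or `a*`) is expanded to exactly as many vowels as there are V slots. The function names are hypothetical.

```python
import re
from itertools import product


def fill_vowels(pattern, n):
    """Return the unique string of n vowels that matches the pattern."""
    alphabet = sorted({ch for ch in pattern if ch.isalpha()})
    matches = [
        "".join(combo)
        for combo in product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(combo))
    ]
    if len(matches) != 1:
        raise ValueError(f"{pattern!r} does not fit {n} vowel slots uniquely")
    return matches[0]


def interdigitate(root, template, vocalization):
    """Fill C slots from the root and V slots from the vocalization."""
    if template.count("C") != len(root):
        raise ValueError("root length must equal the number of C slots")
    consonants = iter(root)
    vowels = iter(fill_vowels(vocalization, template.count("V")))
    return "".join(
        next(consonants) if slot == "C" else next(vowels) for slot in template
    )


print(interdigitate("ktb", "CVCVC", "u*i"))  # kutib
print(interdigitate("drs", "CVCVC", "a*"))   # daras
```

Because the root, template and vocalization are kept separate, adding a new root such as drs yields its stems without listing each stem by hand.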
The Current System
  • 4930 words
  • 72,000,000 abstract fully-voweled words
  • Sixty-six finite-state variation rules
  • New words are added easily to the lexical database
Arabic Morphology Using Only Finite-State Operations - Review
Discussion
Thought for the Day
Never say No for Education!
