PART 1

  Deep Parsing

Craig Trim / craigtrim@gmail.com / CCA 3.0
Parse tree legend (Penn Treebank labels):
SBARQ = Direct question introduced by a wh-word or wh-phrase
WHADVP = Wh-adverb phrase, SQ = Inverted yes/no question (the main clause of a wh-question)
WRB = Wh-adverb, VBP = Present-tense verb (non-3rd-person singular)
NP = Noun phrase, VP = Verb phrase, PP = Prepositional phrase
VB = Verb (base form), PRP = Personal pronoun
IN = Preposition, NNS = Plural noun
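
The parse tree itself appears in the deck only as an image. As a rough illustration of the labels above, here is a minimal sketch (the example question, its bracketing, and the use of NLTK are assumptions, not taken from the slides) that loads a hand-written tree and pulls out the NP and PP nodes, mirroring the extraction steps captioned below.

    # A hand-written bracketed parse for a hypothetical question, using the same
    # Penn Treebank labels as the legend above. Sentence and structure are
    # illustrative assumptions only.
    from nltk import Tree

    BRACKETED = """
    (SBARQ
      (WHADVP (WRB Why))
      (SQ (VBP do)
          (NP (PRP you))
          (VP (VB install)
              (NP (NNS programs))
              (PP (IN on) (NP (NNS servers)))))
      (. ?))
    """

    tree = Tree.fromstring(BRACKETED)

    # Focus on noun phrases (NP) and the connecting prepositional phrases (PP).
    for subtree in tree.subtrees(filter=lambda t: t.label() in ("NP", "PP")):
        print(subtree.label(), "->", " ".join(subtree.leaves()))

Running this prints each NP and the connecting PP along with the tokens it covers.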
Structural components highlighted.
Part-of-Speech tags highlighted.
Tokens highlighted.
User input (sentence) highlighted.
Focus on noun phrases (NP).
Find the connecting prepositional phrases (PP).
Highlight segment of sentence to extract.
Perform extraction.
Create a semantic chain (collection of 2 triples).
Compare semantic chain to parse tree structure.
Normalize semantic chain.
Add additional semantic context.
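
The deck's actual triples appear only in the slide images; the Editor's Notes below mention two connected triples and Software as a root node. A hypothetical stand-in for such a semantic chain, with one added context triple, might look like this (all names are illustrative assumptions):

    # A hypothetical semantic chain: two triples that share a node, plus one more
    # triple adding semantic context. All subject/predicate/object names are
    # illustrative assumptions, not the deck's actual extraction.
    from collections import namedtuple

    Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

    chain = [
        Triple("you", "install", "programs"),     # extracted from the verb phrase
        Triple("programs", "on", "servers"),      # extracted from the connecting PP
    ]

    # Additional semantic context: link the extracted noun to a broader concept.
    chain.append(Triple("programs", "is-a", "software"))

    for t in chain:
        print(f"({t.subject}) -[{t.predicate}]-> ({t.obj})")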
PART 2

  The Parsing Process

Craig Trim / craigtrim@gmail.com / CCA 3.0


Editor's Notes

  1. The first step is tokenizing the input.
  2. What you have here are two triples connected together: a semantic chain.
  3. Don't look at this diagram with the misconception that an ontology is a taxonomy or a directed tree. It's not. It's a cyclic network. We do seem to have Software as a root node, with most relationships flowing up to the parent. In real life, however, the extracted semantic chain would be one small connection in the midst of an innumerable number of nodes: some in clusters, some in sequences, some apparently random, but all connected, sometimes with multiple connections between two nodes, and so on.
  4. Now, you've been a good audience. Thank you. Let's look at some real code and a real process. < CLICK > (END PRESENTATION AND GO TO PART 2)
  5. < CLICK > The first step is to pre-process the input. Pre-processing means we might add or remove tokens, most often punctuation, but we could make other additions. Some degree of normalization might occur here; for example, an acronym spelled "I.B.M." might be normalized to "IBM", or "U.S.A" to "USA". Pattern reduction is a type of normalization: it provides a higher degree of uniformity in user input and makes the job of parsing and downstream processing easier. There are simply fewer variations to account for. However, we generally want to keep pre-processing short and sweet, depending on the needs of our application. By pre-processing we do have a tendency to lose the user-speak; that is, how a user might choose to refer to an entity or employ nuanced constructions. Also, too much normalization can lead to inaccurate results in the parser. We don't lose anything by changing "I.B.M." to "IBM", but if we changed the inflected verb "installed" to the infinitive construction (also called the canonical form, normal form, or lemma) "install", we would lose the fact that the installation occurred in the past tense. < CLICK > Performing lemmatization at this stage may be appropriate for some applications, but in the main, nuanced speech leads to more accurate parsing results, which in turn leads to higher precision in extracting information of interest. Lemmatization is typically performed in the stage that follows parsing: the post-processing stage. < CLICK > Post-processing is really an abstraction of many services: services that perform not only lemmatization (which is conceptually trivial) but also semantic interpolation, the adding of additional meaning to the parse tree, as we saw on previous slides. < CLICK > (A sketch of this kind of normalization follows these notes.)
  6. However, at a high level, this is what happens. The input is pre-processed, parsed, and post-processed. < CLICK >
  7. Let's add a little more context. The user provides input, the input is received and goes through the process we just talked about, and the insight (hopefully there is some) is provided back to the user. The important thing in this diagram is the Intermediate Form. How is the user input represented as it flows through this process? At its simplest, a data transfer object must exist that represents the initial input as a String, converts the String into an array of tokens, parses the tokens and stores the structured parse results, provides a mechanism for allowing the structured output to be enhanced (or simplified) through a number of services, and finally allows additional context to be applied and brought to bear upon these results. The design of the intermediate representation lies at the heart of every parsing strategy. There are multiple strategies available today. These may vary by architecture, design principle, or the needs of the application. A parsing strategy that only leverages part-of-speech tagging is not likely to require a mechanism for storing deep parse results and the additional complexity this incurs. On the other hand, an architecture that can allow a parsing process the simplicity of a few steps, or the complexity of several hundred steps, and be customized without compromising the original design principles is of the most value. Of the many architectures that exist, there are not yet many that are this well designed. Ultimately, the strategy you choose will be based on a variety of factors. I do identify this choice as one of the most important considerations in the parsing process. (A sketch of one possible intermediate form follows these notes.)
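
Note 5 describes pre-processing normalization such as turning "I.B.M." into "IBM" while deferring lemmatization to post-processing. A minimal sketch of that kind of acronym normalization, assuming a simple regex rule (the deck's actual rules are not shown), might look like this:

    # A minimal pre-processing sketch: collapse dotted acronyms ("I.B.M." -> "IBM").
    # The regex rule is an assumption; the deck's actual normalization is not shown.
    # Lemmatization (e.g. "installed" -> "install") is deliberately deferred to
    # post-processing, as note 5 explains.
    import re

    def preprocess(text: str) -> str:
        acronym = re.compile(r"\b(?:[A-Za-z]\.){2,}[A-Za-z]?\.?")
        return acronym.sub(lambda m: m.group(0).replace(".", ""), text)

    print(preprocess("Why did I.B.M. install the software?"))
    # Why did IBM install the software?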
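
Note 7 describes the intermediate form only in prose. One possible shape for such a data transfer object is sketched below; the class, field, and stage names are hypothetical, not taken from the deck:

    # A hypothetical intermediate form carried through pre-processing, parsing,
    # and post-processing. All names and the trivial stage bodies are illustrative.
    from dataclasses import dataclass, field
    from typing import Any, Dict, List

    @dataclass
    class Intermediate:
        raw: str                                           # the user's original input
        tokens: List[str] = field(default_factory=list)    # tokenized form
        parse: Any = None                                   # structured (deep) parse results
        annotations: Dict[str, Any] = field(default_factory=dict)  # added semantic context

    def pre_process(im: Intermediate) -> Intermediate:
        im.tokens = im.raw.replace("?", " ?").split()       # stand-in for real tokenization
        return im

    def parse(im: Intermediate) -> Intermediate:
        im.parse = {"tokens": im.tokens}                     # a real parser would build a tree here
        return im

    def post_process(im: Intermediate) -> Intermediate:
        im.annotations["lemmas"] = [t.lower() for t in im.tokens]   # lemmatization placeholder
        return im

    result = post_process(parse(pre_process(Intermediate("Why do you install programs on servers?"))))
    print(result.parse, result.annotations)

The point, per note 7, is that every stage reads and enriches the same object as it moves from pre-processing through parsing to post-processing.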