Deep Parsing (2012)

Jan 18, 2013

The document discusses deep parsing and the parsing process. It describes parsing a sample sentence by highlighting its structural components, part-of-speech tags, and tokens. It also covers extracting segments, creating a semantic chain, normalizing the chain, and adding additional semantic context. The goal is to compare the semantic chain to the parse tree structure.

Deep Parsing (2012)

1. PART 1 Deep Parsing Craig Trim / craigtrim@gmail.com / CCA 3.0

5. SBARQ - ?

6. SBARQ - ?, WHADVP = Adverb Phrase, SQ = Inverted Yes/No Question

7. SBARQ - ?, WHADVP = Adverb Phrase, SQ = Inverted Yes/No Question, WRB = Adverb, VBP = Present Tense Verb, NP = Noun Phrase, VP = Verb Phrase

8. SBARQ - ?, WHADVP = Adverb Phrase, SQ = Inverted Yes/No Question, WRB = Adverb, VBP = Present Tense Verb, NP = Noun Phrase, VP = Verb Phrase, PP = Prepositional Phrase, VB = Verb, PRP = Personal Pronoun

9. SBARQ - ?, WHADVP = Adverb Phrase, SQ = Inverted Yes/No Question, WRB = Adverb, VBP = Present Tense Verb, NP = Noun Phrase, VP = Verb Phrase, PP = Prepositional Phrase, VB = Verb, PRP = Personal Pronoun, IN = Preposition

10. SBARQ - ?, WHADVP = Adverb Phrase, SQ = Inverted Yes/No Question, WRB = Adverb, VBP = Present Tense Verb, NP = Noun Phrase, VP = Verb Phrase, PP = Prepositional Phrase, VB = Verb, PRP = Personal Pronoun, IN = Preposition, NNS = Plural Noun
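The sentence shown on these slides is not preserved in the transcript, so the sketch below assumes a hypothetical one, "Why do I get errors in reports on servers?", with a hand-written bracketing that uses only the labels in the legend. The names `LEGEND`, `BRACKETED`, `read_tree` and `show` are illustrative and do not come from the deck; the glosses follow the Penn Treebank tag set. It reads the bracketed parse into nested tuples and prints each node with its gloss.

```python
import re

# Glosses for the Penn Treebank labels named on the slides.
LEGEND = {
    "SBARQ": "direct question introduced by a wh-word",
    "WHADVP": "wh-adverb phrase",
    "SQ": "inverted yes/no question, or main clause of a wh-question",
    "NP": "noun phrase",
    "VP": "verb phrase",
    "PP": "prepositional phrase",
    "WRB": "wh-adverb",
    "VBP": "verb, present tense (non-3rd person singular)",
    "VB": "verb, base form",
    "PRP": "personal pronoun",
    "IN": "preposition",
    "NNS": "plural noun",
    ".": "sentence-final punctuation",
}

# Hypothetical sentence and hand-written bracketing (an assumption,
# not the sentence used in the original slides).
BRACKETED = """
(SBARQ (WHADVP (WRB Why))
  (SQ (VBP do) (NP (PRP I))
    (VP (VB get)
      (NP (NP (NNS errors))
        (PP (IN in)
          (NP (NP (NNS reports))
            (PP (IN on) (NP (NNS servers))))))))
  (. ?))
"""


def read_tree(text):
    """Read one bracketed tree into nested (label, children) tuples.

    A leaf is (pos_tag, token); an internal node is (label, [children]).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":        # leaf of the form (TAG word)
            word = tokens[pos]
            pos += 2                  # skip the word and its ")"
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1                      # skip the closing ")"
        return (label, children)

    return node()


def show(tree, depth=0):
    """Print the tree, one node per line, with the legend gloss."""
    label, body = tree
    gloss = LEGEND.get(label, "?")
    indent = "  " * depth
    if isinstance(body, str):
        print(f"{indent}{label} [{gloss}] -> {body}")
    else:
        print(f"{indent}{label} [{gloss}]")
        for child in body:
            show(child, depth + 1)


TREE = read_tree(BRACKETED)
show(TREE)
```

In practice the bracketing would come from a parser rather than being typed by hand; the nested-tuple shape is just one convenient way to hold it.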

11. Structural components highlighted.

12. Part-of-Speech tags highlighted.

13. Tokens highlighted.

14. User input (sentence) highlighted.
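The four highlights above correspond to four layers of the same parse tree. As a rough sketch, assuming a hypothetical sentence ("Why do I get errors in reports on servers?") since the one on the slides is not in the transcript, and with `TREE` and `layers` as illustrative names only, the layers can be separated like this:

```python
# Hypothetical parse, written as nested (label, children) tuples.
# A leaf is (pos_tag, token). The sentence is an assumption.
TREE = ("SBARQ", [
    ("WHADVP", [("WRB", "Why")]),
    ("SQ", [
        ("VBP", "do"),
        ("NP", [("PRP", "I")]),
        ("VP", [
            ("VB", "get"),
            ("NP", [
                ("NP", [("NNS", "errors")]),
                ("PP", [
                    ("IN", "in"),
                    ("NP", [
                        ("NP", [("NNS", "reports")]),
                        ("PP", [("IN", "on"), ("NP", [("NNS", "servers")])]),
                    ]),
                ]),
            ]),
        ]),
    ]),
    (".", "?"),
])


def layers(tree):
    """Return (structural_labels, pos_tags, tokens) in left-to-right order."""
    structural, pos_tags, tokens = [], [], []

    def walk(node):
        label, body = node
        if isinstance(body, str):     # leaf: a POS tag over one token
            pos_tags.append(label)
            tokens.append(body)
        else:                         # internal node: a phrase or clause
            structural.append(label)
            for child in body:
                walk(child)

    walk(tree)
    return structural, pos_tags, tokens


structural, pos_tags, tokens = layers(TREE)
# Rebuild the user input by re-attaching the final punctuation mark.
sentence = " ".join(tokens[:-1]) + tokens[-1]

print(structural)
print(pos_tags)
print(tokens)
print(sentence)
```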

15. Focus on noun phrases (NP).

16. Find the connecting prepositional phrases (PP).
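One way to picture these two steps is a plain tree walk that collects every NP and every PP. The sketch below is an illustration under assumptions: the sentence is hypothetical (the original is not in the transcript), and `TREE`, `text` and `find` are made-up names rather than anything from the deck.

```python
# Hypothetical parse of "Why do I get errors in reports on servers?",
# as nested (label, children) tuples; a leaf is (pos_tag, token).
TREE = ("SBARQ", [
    ("WHADVP", [("WRB", "Why")]),
    ("SQ", [
        ("VBP", "do"),
        ("NP", [("PRP", "I")]),
        ("VP", [
            ("VB", "get"),
            ("NP", [
                ("NP", [("NNS", "errors")]),
                ("PP", [
                    ("IN", "in"),
                    ("NP", [
                        ("NP", [("NNS", "reports")]),
                        ("PP", [("IN", "on"), ("NP", [("NNS", "servers")])]),
                    ]),
                ]),
            ]),
        ]),
    ]),
    (".", "?"),
])


def text(node):
    """Join the tokens under a node back into a string."""
    label, body = node
    if isinstance(body, str):
        return body
    return " ".join(text(child) for child in body)


def find(node, wanted):
    """Yield every subtree whose label equals `wanted`, outermost first."""
    label, body = node
    if label == wanted:
        yield node
    if not isinstance(body, str):
        for child in body:
            yield from find(child, wanted)


noun_phrases = [text(np) for np in find(TREE, "NP")]
prep_phrases = [text(pp) for pp in find(TREE, "PP")]

print(noun_phrases)
print(prep_phrases)
```

Nested phrases are reported as well as their parents, which is what lets the later steps see that "reports" sits inside "errors in reports on servers".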

17. Highlight segment of sentence to extract.

18. Perform extraction.

19. Perform extraction.

20. Create a semantic chain (collection of ≥ 2 triples).
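The extraction and chaining steps might look roughly like the following. This is a sketch under assumptions, not the method used in the deck: the sentence is hypothetical, the segment is taken to be the outermost NP that directly contains a PP, each triple is read off as (head of left NP, preposition, head of right NP), and all function names are invented for illustration.

```python
# Hypothetical parse of "Why do I get errors in reports on servers?",
# as nested (label, children) tuples; a leaf is (pos_tag, token).
TREE = ("SBARQ", [
    ("WHADVP", [("WRB", "Why")]),
    ("SQ", [
        ("VBP", "do"),
        ("NP", [("PRP", "I")]),
        ("VP", [
            ("VB", "get"),
            ("NP", [
                ("NP", [("NNS", "errors")]),
                ("PP", [
                    ("IN", "in"),
                    ("NP", [
                        ("NP", [("NNS", "reports")]),
                        ("PP", [("IN", "on"), ("NP", [("NNS", "servers")])]),
                    ]),
                ]),
            ]),
        ]),
    ]),
    (".", "?"),
])


def text(node):
    label, body = node
    return body if isinstance(body, str) else " ".join(text(c) for c in body)


def head(np):
    """Head words of an NP: follow the leading nested NP, if there is one."""
    first = np[1][0]
    return head(first) if first[0] == "NP" else text(np)


def segment(node):
    """Return the outermost NP that directly contains a PP, or None."""
    label, body = node
    if isinstance(body, str):
        return None
    if label == "NP" and any(child[0] == "PP" for child in body):
        return node
    for child in body:
        found = segment(child)
        if found is not None:
            return found
    return None


def triples(node):
    """Yield (subject, predicate, object) for each NP directly followed by a PP."""
    label, body = node
    if isinstance(body, str):
        return
    for left, right in zip(body, body[1:]):
        if left[0] == "NP" and right[0] == "PP":
            prep = next(text(c) for c in right[1] if c[0] == "IN")
            obj = next(c for c in right[1] if c[0] == "NP")
            yield (head(left), prep, head(obj))
    for child in body:
        yield from triples(child)


def is_chain(ts):
    """A chain has at least 2 triples, each object feeding the next subject."""
    return len(ts) >= 2 and all(a[2] == b[0] for a, b in zip(ts, ts[1:]))


extracted = segment(TREE)
chain = list(triples(extracted))
print(text(extracted))
print(chain, is_chain(chain))
```

Because the object of the first triple is the subject of the second, the two triples link into a chain that mirrors the nesting of the parse tree.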

21. Compare semantic chain to parse tree structure.

22. Compare semantic chain to parse tree structure.

23. Compare semantic chain to parse tree structure.

24. Compare semantic chain to parse tree structure.

25. Normalize semantic chain.

26. Add additional semantic context.

27. Add additional semantic context.

28. Add additional semantic context.

29. Add additional semantic context.

31. PART 2 The Parsing Process

Editor's Notes

#5: The first step is tokenizing the
#25: What you have here are 2 triples connected together; a semantic chain.
#29: Don’t look at this diagram with the misconception that an ontology is a taxonomy or directed tree. It’s not. It’s a cyclic network. We do seem to have Software as a root node with most relationships flowing up to the parent. However, in real life, the extracted semantic chain would be one small connection amid innumerable nodes, some in clusters, some in sequences, some apparently random, but all connected, sometimes with multiple connections between 2 nodes, and so on.
#31: … . Now, you’ve been a good audience. Thank you. Let’s look at some real code and a real process. < CLICK > (END PRESENTATION AND GO TO PART 2)
#33: < CLICK > The first step is to pre-process the input. Pre-processing means we might add or remove tokens, most often punctuation, but we could make other additions. Some degree of normalization might occur here – for example, an acronym that is spelled “I.B.M.” might be normalized to “IBM”, or “U.S.A” to “USA”. Pattern reduction is a type of normalization – it provides a higher degree of uniformity on user input and makes the job of parsing and downstream processing easier. There are simply fewer variations to account for. However, we generally want to keep pre-processing short and sweet, depending on the needs of our application. By pre-processing we do have a tendency to lose the “user-speak”; that is, how a user might choose to refer to an entity or employ nuanced constructions. Also, too much normalization can lead to inaccurate results in the parser. We don’t lose anything by changing “I.B.M.” to “IBM”, but if we changed the inflected verb “installed” to its base form “install” (also called the canonical form, normal form, or lemma), we would lose the fact that the installation occurred in the past. < CLICK > Performing lemmatization at this stage may be appropriate for some applications, but in the main, nuanced speech leads to more accurate parsing results, which in turn leads to higher precision in extracting information of interest. Lemmatization is typically performed in the stage that follows parsing, the post-processing stage. < CLICK > Post-processing is really an abstraction of many, many services – services that perform not only lemmatization (which is conceptually trivial), but semantic interpolation – the adding of additional meaning to the parse tree, as we saw on previous slides. < CLICK >
#34: However, at a high level, this is what happens. The input is pre-processed, parsed, and post-processed. < CLICK >
#35: Let’s add a little more context. The user provides input, the input is received, goes through the process we just talked about, and the insight (hopefully there is some) is provided back to the user. The important thing on this diagram is the “Intermediate Form”. How is the user input represented as it flows through this process? At its simplest, a data transfer object must exist that represents the initial input as a String, converts the String into an array of tokens, parses the tokens and stores the structured parse results, and has a mechanism for allowing the structured output to be enhanced (or simplified) through a number of services, and finally for additional context to be applied and brought to bear upon these results. The design for intermediate representation lies at the heart of every parsing strategy. There are multiple strategies available today. These may vary by architecture, design principle or needs of the application. A parsing strategy that only leverages part of speech tagging is not likely to require a mechanism for storing deep parse results and the additional complexity this incurs. On the other hand, an architecture that can allow a parsing process the simplicity of a few steps, or the complexity of several hundred steps, and be customized without compromise to original design principles is of the most value. Of the many architectures that exist, there are yet many that are this well designed. Ultimately the strategy you choose will be based on a variety of factors. I do identify this choice as being one of the most important considerations in the parsing process.
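The notes on pre-processing, parsing, post-processing and the intermediate form can be pictured with a small sketch. Everything named here (`IntermediateForm`, `preprocess`, `tokenize`, `parse_stub`, `lemmatize`, `LEMMAS`, `run`) is hypothetical and stands in for whatever the real design uses; in particular the parse step is only a placeholder where an actual parser would be called, and the lemma table holds a single example entry.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class IntermediateForm:
    """Data transfer object carried through every stage of the pipeline."""
    raw: str                                        # input exactly as typed
    normalized: str = ""                            # after pre-processing
    tokens: List[str] = field(default_factory=list)
    parse: Optional[tuple] = None                   # structured parse result
    context: dict = field(default_factory=dict)     # added by post-processing


Service = Callable[[IntermediateForm], IntermediateForm]


def preprocess(doc):
    # Pattern reduction: collapse dotted acronyms, e.g. "I.B.M." -> "IBM".
    # Note: a sentence-final period after such an acronym is dropped too.
    doc.normalized = re.sub(
        r"\b(?:[A-Z]\.){2,}",
        lambda m: m.group(0).replace(".", ""),
        doc.raw,
    )
    return doc


def tokenize(doc):
    # Words, or single punctuation marks, as separate tokens.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.normalized)
    return doc


def parse_stub(doc):
    # Placeholder only: a real pipeline would call a parser here.
    doc.parse = ("ROOT", [("TOK", tok) for tok in doc.tokens])
    return doc


# Tiny illustrative lemma table (an assumption, not a real lexicon).
LEMMAS = {"installed": "install"}


def lemmatize(doc):
    # Post-processing: record lemmas beside the tokens instead of
    # overwriting them, so tense information is not lost.
    doc.context["lemmas"] = {t: LEMMAS[t] for t in doc.tokens if t in LEMMAS}
    return doc


def run(raw, services):
    doc = IntermediateForm(raw)
    for service in services:
        doc = service(doc)
    return doc


PIPELINE = [preprocess, tokenize, parse_stub, lemmatize]
doc = run("I installed I.B.M. software in the U.S.A", PIPELINE)
print(doc.normalized)
print(doc.tokens)
print(doc.context)
```

Keeping the pipeline as a plain list of services means a deployment can run three steps or several hundred against the same object, which is the flexibility the note above argues for.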

