NLP and Graph Databases in Lumify
Charlie Greenbacker & Joe Kerner
Agenda
Introductions
Lumify Overview
Natural Language Processing
Graph Databases
photo: Columbia Pictures
About me: @greenbacker
Theories: popular tripe
Methods: sloppy
Conclusions: highly questionable
Best reason for
not finishing PhD
@ExploreAltamira
Lumify is an open source
big data analysis and
visualization platform
built by Altamira engineers
Key Lumify Concepts
Ontology: structure for organizing information (i.e., your data model)
Entities: any thing you want to represent (e.g., person, place, event)
Relationships: a link between two entities (e.g., leader-of, works-for, sibling-of)
Properties: data about an entity (e.g., first name, last name, date of birth)
Graph: collection of entities and the relationships between them
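The five concepts above fit together as a small property graph. The following is a minimal Python sketch of that idea, not Lumify's actual data model or API: the class names, method names, and sample records (`Ontology`, `Graph.relate`, "Ada Example", "Example Corp") are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """The data model: which concepts and relationship labels are allowed."""
    concepts: set
    relationship_labels: set


@dataclass
class Entity:
    """Any thing you want to represent (e.g., person, place, event)."""
    entity_id: str
    concept: str                                    # must be a concept in the ontology
    properties: dict = field(default_factory=dict)  # data about the entity


@dataclass
class Relationship:
    """A labeled link between two entities."""
    source_id: str
    label: str
    target_id: str


class Graph:
    """Collection of entities and the relationships between them."""

    def __init__(self, ontology):
        self.ontology = ontology
        self.entities = {}
        self.relationships = []

    def add_entity(self, entity):
        if entity.concept not in self.ontology.concepts:
            raise ValueError(f"unknown concept: {entity.concept}")
        self.entities[entity.entity_id] = entity

    def relate(self, source_id, label, target_id):
        if label not in self.ontology.relationship_labels:
            raise ValueError(f"unknown relationship: {label}")
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both entities must exist before they are linked")
        self.relationships.append(Relationship(source_id, label, target_id))

    def neighbors(self, entity_id):
        """Return (label, entity) pairs for every link that touches entity_id."""
        result = []
        for rel in self.relationships:
            if rel.source_id == entity_id:
                result.append((rel.label, self.entities[rel.target_id]))
            elif rel.target_id == entity_id:
                result.append((rel.label, self.entities[rel.source_id]))
        return result


# Hypothetical sample data, for illustration only.
ontology = Ontology(
    concepts={"person", "organization"},
    relationship_labels={"leader-of", "works-for", "sibling-of"},
)
graph = Graph(ontology)
graph.add_entity(Entity("p1", "person", {"first name": "Ada", "last name": "Example"}))
graph.add_entity(Entity("o1", "organization", {"name": "Example Corp"}))
graph.relate("p1", "works-for", "o1")
```

In this sketch the ontology acts as a gatekeeper: an entity or relationship whose type is not in the data model is rejected before it reaches the graph.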
Live Demo
Who can Lumify help?
Intelligence Analyst
Lumify helps analysts fuse structured and unstructured data from myriad sources into actionable intelligence.
Police Investigator
Law enforcement personnel can use Lumify to explore criminal networks, uncover hidden connections, and develop leads.
Financial Analyst
Lumify analyzes financial data and transaction records to help detect fraud and identify possible insider threats.
photo: Ken Teegardin (https://flic.kr/p/9rn9Yh)
Research Staff
Scientists, law firms, news organizations, and others can track their research in Lumify to unearth latent knowledge and discover critical new insights.
photo: UK National Archives (http://bit.ly/1n9dhR8)
Why Lumify?
Free and Open Source
• Distributed under the permissive Apache 2.0 license
• No restrictions on modifications
• No licensing or usage constraints
Built on Scalable Open Source Tech
Hadoop CDH 4
Accumulo
ElasticSearch
tesseract, CLAVIN, CMU Sphinx, OpenNLP, OpenCV, ffmpeg
Apache Storm
Secure Graph
custom code
Highly Secure
• Separate security restrictions at the entity, property, and relationship level
• Implemented in and enforced by Accumulo cell-level security
Joaquin Guzman Loera
DOB: 1957-04-04
POB: Badiraguato
Nationality: Mexican
Zarka de Mexico
Founded: 2010-01-11
Location: Mexico City
Employees: 121
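To illustrate what restrictions at the property level mean in practice, here is a minimal Python sketch of label-based filtering. It is not Accumulo's API: Accumulo visibility labels are boolean expressions that can combine tokens with AND and OR, whereas this sketch assumes the simplest case, where a reader must hold every label attached to a value. The `Cell` class, the `visible_cells` function, and the label names "analyst" and "supervisor" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One stored value plus the labels a reader must hold to see it."""
    element: str                       # entity (or relationship) the value belongs to
    name: str                          # property name
    value: str
    required: frozenset = frozenset()  # empty set: visible to every reader


def visible_cells(cells, authorizations):
    """Keep only the cells whose required labels are all held by the reader."""
    held = frozenset(authorizations)
    return [cell for cell in cells if cell.required <= held]


# Property values taken from the slide; the labels are hypothetical.
cells = [
    Cell("Joaquin Guzman Loera", "Nationality", "Mexican"),
    Cell("Joaquin Guzman Loera", "DOB", "1957-04-04", frozenset({"analyst"})),
    Cell("Zarka de Mexico", "Employees", "121", frozenset({"analyst", "supervisor"})),
]

for reader in (set(), {"analyst"}, {"analyst", "supervisor"}):
    names = [cell.name for cell in visible_cells(cells, reader)]
    print(sorted(reader), "->", names)
```

Because the check runs per value, two readers looking at the same entity can see different subsets of its properties, which is the behavior the slide describes.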
Supported
• Full-time development staff
• Custom development and customization services
• Commercial support offerings
AWS Compatible
• Day-to-day development done on Amazon infrastructure
• Primarily use EC2, VPC, S3, SES, CloudWatch
• Altamira is an AWS consulting partner
Natural Language Processing in Lumify
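Text Extraction
[Diagram: text extraction by input type. Inputs: video, text docs, structured data, images, audio. Images pass through OCR (tesseract) and audio through CMU Sphinx; CMU Sphinx and OCR (tesseract) appear a second time, apparently for video, alongside an "extractor" stage.]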
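The routing in the text extraction diagram can be sketched as a dispatch table keyed by input type. This is a hedged illustration only: the three functions below are hypothetical stand-ins that return placeholder strings rather than calling tesseract or CMU Sphinx, and sending video through both speech recognition and OCR is an assumption read off the diagram labels, not something the slide states.

```python
def read_text(source):
    """Hypothetical stand-in for reading a text document or structured record."""
    return f"text of {source}"


def run_ocr(source):
    """Hypothetical stand-in for an OCR engine such as tesseract."""
    return f"OCR text from {source}"


def run_speech_to_text(source):
    """Hypothetical stand-in for a speech recognizer such as CMU Sphinx."""
    return f"transcript of {source}"


# Which extraction steps run for each input type (the video entry is assumed).
EXTRACTORS = {
    "text": [read_text],
    "structured": [read_text],
    "image": [run_ocr],
    "audio": [run_speech_to_text],
    "video": [run_speech_to_text, run_ocr],
}


def extract_text(source, media_type):
    """Run every extraction step registered for media_type and collect the text."""
    try:
        steps = EXTRACTORS[media_type]
    except KeyError:
        raise ValueError(f"no extractor registered for {media_type!r}") from None
    return [step(source) for step in steps]


print(extract_text("clip.mp4", "video"))
```

Keeping the mapping in one table means a new input type or tool is added by editing a single entry rather than the dispatch logic.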
Text Enrichment
• Apache OpenNLP
  • Named Entity Recognition
  • Extracts names of entities from unstructured text
  • Persons, Orgs, & Locations
  • Highlighted in preview text
  • User must confirm/resolve
• CLAVIN
  • Geospatial Entity Resolution
  • Resolves extracted location names to gazetteer records
  • Solves Springfield problem
  • Disambiguates place names
  • Turns text docs into maps!
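The "Springfield problem" is that one place name can match many gazetteer records. The Python sketch below shows the shape of the task with a toy heuristic: prefer a candidate whose state is mentioned in the surrounding text, otherwise fall back to the most populous candidate. This is not CLAVIN's algorithm or API. The `resolve` function and the three-record gazetteer are hypothetical, and the coordinates and populations are rounded, illustrative values rather than real gazetteer data.

```python
# Toy gazetteer: rounded, illustrative values only.
GAZETTEER = [
    {"name": "Springfield", "state": "Illinois", "lat": 39.80, "lon": -89.64, "population": 114000},
    {"name": "Springfield", "state": "Massachusetts", "lat": 42.10, "lon": -72.59, "population": 155000},
    {"name": "Springfield", "state": "Missouri", "lat": 37.21, "lon": -93.29, "population": 169000},
]


def resolve(place_name, context, gazetteer=GAZETTEER):
    """Pick one gazetteer record for place_name, or None if nothing matches."""
    candidates = [rec for rec in gazetteer if rec["name"].lower() == place_name.lower()]
    if not candidates:
        return None
    # Heuristic 1: a state named in the surrounding text wins.
    context_lower = context.lower()
    for rec in candidates:
        if rec["state"].lower() in context_lower:
            return rec
    # Heuristic 2: otherwise assume the most populous candidate.
    return max(candidates, key=lambda rec: rec["population"])


match = resolve("Springfield", "The governor of Illinois spoke in Springfield.")
print(match["state"], match["lat"], match["lon"])
```

Once a name resolves to a record with coordinates, the document can be plotted on a map, which is what "turns text docs into maps" refers to.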
Enriched Text Documents
Machine-powered entity extraction and resolution, combined with human QA and supplementation, supports rich semantic analysis of raw text.
Drug Lord El Chapo Captured in Mexico
PUBLISHED DATE: 2014/02/22
SOURCE: Wikipedia
Although Guzman had long hidden successfully in remote areas of the
Sierra Madre mountains, the arrested members of his security team told
the military he had begun venturing out to Culiacan and the beach town of
Mazatlan. A week prior to his capture, Guzman and Zambada were
reported to have attended a family reunion in Sinaloa. The Mexican military
followed the bodyguards' tips to Guzman's ex-wife's house, but they had
trouble ramming the steel-reinforced front door, which allowed Guzman to
escape through a system of secret tunnels that connected six houses,
eventually moving south to Mazatlan. He planned to stay a few days in
Mazatlan to see his twin baby daughters before retreating to the
mountains.

On 22 February 2014, at around 6:40 a.m., Mexican authorities arrested
Guzman at a hotel in a beach front area in Mazatlan, Sinaloa, following an
operation by the Mexican Navy, with joint intelligence from the DEA and
Benefits to Users
Increases Discoverability: quickly find relevant data without reading
Helps Deal with Information Overload: machines process text faster than humans
Uncovers Hidden Connections: enables object-based analysis & investigations
Future NLP Integration
Support other NER tools: e.g., Stanford NER, SUTime, MITIE
Event/Relationship Extraction: e.g., OpenIE (formerly ReVerb)
Coreference Resolution: augmenting/extending GATE/ANNIE
Additional Text Analytics: e.g., frequency analysis, topic modeling, sentiment analysis
Multilingual Support: use non-English language models for NER, etc.
Graph Databases in Lumify
view part 2 of the presentation here:
github.com/altamiracorp/secure-graph-presentation
Questions?
more info: lumify.io


Natural Language Processing and Graph Databases in Lumify
