Lumify is an open source platform for big data analysis and visualization, designed to help organizations derive actionable insights from the large volumes of diverse data flowing through their enterprise. Utilizing both Hadoop and Storm, it ingests and integrates virtually any kind of data, from unstructured text documents and structured datasets, to images and video. Several open source analytic tools (including Tika, OpenNLP, CLAVIN, OpenCV, and ElasticSearch) are used to enrich the data, increase its discoverability, and automatically uncover hidden connections. All information is stored in a secure graph database implemented on top of Accumulo to support cell-level security of all data and metadata elements. A modern, browser-based user interface enables analysts to explore and manipulate their data, discovering subtle relationships and drawing critical new insights. In addition to full-text search, geospatial mapping, and multimedia processing, Lumify features a powerful graph visualization supporting sophisticated link analysis and complex knowledge representation.
Charlie Greenbacker, Director of Data Science at Altamira, will provide an overview of Lumify and discuss how natural language processing (NLP) tools are used to enrich the text content of ingested data and automatically discover connections with other bits of information. Joe Ferner, Senior Software Engineer at Altamira, will describe the creation of SecureGraph and how it supports authorizations, visibility strings, multivalued properties, and property metadata in a graph database.
Natural Language Processing and Graph Databases in Lumify
1. NLP and Graph Databases in Lumify
Charlie Greenbacker & Joe Ferner
6. Lumify is an open source
big data analysis and
visualization platform
built by Altamira engineers
7. Key Lumify Concepts
Ontology: structure for organizing information (i.e., your data model)
Entities: any thing you want to represent (e.g., person, place, event)
Relationships: a link between two entities (e.g., leader-of, works-for, sibling-of)
Properties: data about an entity (e.g., first name, last name, date of birth)
Graph: collection of entities and the relationships between them
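The five concepts above fit together as a small property graph. The following is a minimal sketch in Python, not Lumify's actual data model or API: the class names, the `ontology` dictionary layout, and the sample entities ("Jane Doe", "Example Corp") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    concept: str                                    # ontology concept, e.g. "person"
    properties: dict = field(default_factory=dict)  # data about the entity


@dataclass
class Relationship:
    source_id: str
    label: str                                      # e.g. "works-for"
    target_id: str


class Graph:
    """Collection of entities and the relationships between them."""

    def __init__(self, ontology):
        # Hypothetical ontology layout: relationship label -> (source concept, target concept)
        self.ontology = ontology
        self.entities = {}
        self.relationships = []

    def add_entity(self, entity):
        self.entities[entity.entity_id] = entity

    def relate(self, source_id, label, target_id):
        source, target = self.entities[source_id], self.entities[target_id]
        # Reject links that the data model does not describe
        if self.ontology.get(label) != (source.concept, target.concept):
            raise ValueError(
                f"ontology does not allow {source.concept} -{label}-> {target.concept}"
            )
        self.relationships.append(Relationship(source_id, label, target_id))

    def neighbors(self, entity_id):
        """Return (label, other entity id) pairs for every relationship touching entity_id."""
        pairs = []
        for rel in self.relationships:
            if rel.source_id == entity_id:
                pairs.append((rel.label, rel.target_id))
            elif rel.target_id == entity_id:
                pairs.append((rel.label, rel.source_id))
        return pairs


# Illustrative data only; these entities are not from the presentation.
ontology = {"works-for": ("person", "organization"), "sibling-of": ("person", "person")}
graph = Graph(ontology)
graph.add_entity(Entity("p1", "person", {"first name": "Jane", "last name": "Doe"}))
graph.add_entity(Entity("o1", "organization", {"name": "Example Corp"}))
graph.relate("p1", "works-for", "o1")
print(graph.neighbors("o1"))
```

Keeping the ontology separate from the graph means the same graph code can serve different data models; only the dictionary of allowed links changes.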
10. Lumify helps analysts
fuse structured and
unstructured data
from myriad sources
into actionable
intelligence.
Intelligence
Analyst
11. Law enforcement
personnel can use
Lumify to explore
criminal networks,
uncover hidden
connections, and
develop leads.
Police
Investigator
12. Lumify analyzes
financial data and
transaction records
to help detect fraud
and identify possible
insider threats.
Financial
Analyst
photo: Ken Teegardin (https://flic.kr/p/9rn9Yh)
13. Scientists, law firms,
news organizations,
and others can
track their research
in Lumify to unearth
latent knowledge
and discover critical
new insights.
Research
Staff
photo: UK National Archives (http://bit.ly/1n9dhR8)
15. • Distributed under the permissive Apache 2.0 license
• No restrictions on modifications
• No licensing or usage constraints
Free and
Open Source
16. Built on Scalable Open Source Tech
Hadoop CDH 4
Accumulo
ElasticSearch
tesseract, CLAVIN, CMU Sphinx, OpenNLP, OpenCV, ffmpeg
Apache Storm
Secure Graph
custom code
17. • Separate security restrictions at the entity, property, and relationship level
• Implemented in and enforced by Accumulo cell-level security
Highly Secure
Joaquin Guzman Loera
DOB: 1957-04-04
POB: Badiraguato
Nationality: Mexican
Zarka de Mexico
Founded: 2010-01-11
Location: Mexico City
Employees: 121
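To illustrate what separate restrictions at the entity and property level mean in practice, here is a minimal Python sketch. It is not SecureGraph's or Accumulo's API: the class names, the `view` function, and the "restricted" and "sealed" labels are hypothetical, and which properties carry which label is an assumption made for the example. It also simplifies visibility to "the reader must hold every required label", whereas Accumulo visibility expressions can combine labels with `&` and `|`.

```python
from dataclasses import dataclass, field


@dataclass
class SecuredValue:
    value: object
    required: frozenset = frozenset()  # labels a reader must ALL hold to see the value


@dataclass
class SecuredEntity:
    name: SecuredValue
    properties: dict = field(default_factory=dict)  # property name -> SecuredValue


def can_see(required, authorizations):
    # Simplified check: every required label must be among the reader's authorizations
    return required <= authorizations


def view(entity, authorizations):
    """Return only the parts of the entity this reader is authorized to see."""
    auths = frozenset(authorizations)
    if not can_see(entity.name.required, auths):
        return None  # the entity itself is hidden from this reader
    visible = {
        key: secured.value
        for key, secured in entity.properties.items()
        if can_see(secured.required, auths)
    }
    return {"name": entity.name.value, "properties": visible}


# Values come from the slide; the labels attached to them are hypothetical.
person = SecuredEntity(
    name=SecuredValue("Joaquin Guzman Loera"),
    properties={
        "Nationality": SecuredValue("Mexican"),
        "DOB": SecuredValue("1957-04-04", frozenset({"restricted"})),
    },
)
company = SecuredEntity(name=SecuredValue("Zarka de Mexico", frozenset({"sealed"})))

print(view(person, set()))
print(view(person, {"restricted"}))
print(view(company, {"restricted"}))
```

Two readers querying the same entity can therefore receive different answers, and a reader without the right label cannot tell that a hidden property or entity exists at all.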
19. • Day-to-day development done on Amazon infrastructure
• Primarily use EC2, VPC, S3, SES, CloudWatch
• Altamira is an AWS consulting partner
AWS
Compatible
22. Text Enrichment
• Apache OpenNLP
  • Named Entity Recognition
  • Extracts names of entities from unstructured text
  • Persons, Orgs, & Locations
  • Highlighted in preview text
  • User must confirm/resolve
• CLAVIN
  • Geospatial Entity Resolution
  • Resolves extracted location names to gazetteer records
  • Solves Springfield problem
  • Disambiguates place names
  • Turns text docs into maps!
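The "Springfield problem" is that one place name can match many gazetteer records. The Python sketch below shows the general idea of using document context to choose among candidates. It is a toy heuristic and not CLAVIN's algorithm: the `GAZETTEER` table and the `resolve` function are hypothetical, and the coordinates are rough approximations included only to show how a resolved name becomes a point on a map.

```python
# Hypothetical three-record gazetteer; coordinates are approximate.
GAZETTEER = {
    "springfield": [
        {"name": "Springfield", "region": "Illinois", "lat": 39.8, "lon": -89.6},
        {"name": "Springfield", "region": "Massachusetts", "lat": 42.1, "lon": -72.6},
        {"name": "Springfield", "region": "Missouri", "lat": 37.2, "lon": -93.3},
    ],
}


def resolve(place_name, document_text):
    """Pick the gazetteer record for place_name, using the surrounding text as a clue.

    Returns None when the name is unknown or still ambiguous, so that a
    person can confirm or resolve it by hand.
    """
    candidates = GAZETTEER.get(place_name.lower(), [])
    if len(candidates) == 1:
        return candidates[0]
    text = document_text.lower()
    # Keep only candidates whose region is also mentioned in the document
    matches = [c for c in candidates if c["region"].lower() in text]
    return matches[0] if len(matches) == 1 else None


record = resolve("Springfield", "The governor spoke in Springfield, Missouri on Monday.")
print(record)
print(resolve("Springfield", "He moved to Springfield."))
```

Returning `None` for an unresolved name, rather than guessing, leaves room for the human confirmation step that the slide describes for extracted entities.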
23. Machine-powered entity
extraction and resolution,
combined with human QA
and supplementation,
supports rich semantic
analysis of raw text.
Enriched
Text
Documents
Drug Lord El Chapo Captured in Mexico
PUBLISHED DATE: 2014/02/22
SOURCE: Wikipedia
Audit
Add Property
Although Guzman had long hidden successfully in remote areas of the
Sierra Madre mountains, the arrested members of his security team told
the military he had begun venturing out to Culiacan and the beach town of
Mazatlan. A week prior to his capture, Guzman and Zambada were
reported to have attended a family reunion in Sinaloa. The Mexican military
followed the bodyguards' tips to Guzman's ex-wife's house, but they had
trouble ramming the steel-reinforced front door, which allowed Guzman to
escape through a system of secret tunnels that connected six houses,
eventually moving south to Mazatlan. He planned to stay a few days in
Mazatlan to see his twin baby daughters before retreating to the
mountains.
On 22 February 2014, at around 6:40 a.m., Mexican authorities arrested
Guzman at a hotel in a beach front area in Mazatlan, Sinaloa, following an
operation by the Mexican Navy, with joint intelligence from the DEA and
24. Benefits to Users
Increases Discoverability: quickly find relevant data without reading
Helps Deal with Information Overload: machines process text faster than humans
Uncovers Hidden Connections: enables object-based analysis & investigations
25. Future NLP Integration
Support other NER tools: e.g., Stanford NER, SUTime, MITIE
Event/Relationship Extraction: e.g., OpenIE (formerly ReVerb)
Coreference Resolution: augmenting/extending GATE/ANNIE
Additional Text Analytics: e.g., frequency analysis, topic modeling, sentiment analysis
Multilingual Support: use non-English language models for NER, etc.
26. Graph Databases in Lumify
view part 2 of the presentation here:
github.com/altamiracorp/secure-graph-presentation