DATANOTE
DOCUMENT-ENTITY EXPLORATION PLATFORM

Datanote is a desktop app that extracts and
visualizes relationships between entities
cited in documents (.pdf, .doc, etc.) as well
as in other sources such as databases or
web pages.

"I’m a product
manager. I like
gardening, cinema
and sky diving."

"Cutaneous
administration of
unicornamycin might
transform the subject
into a unicorn."

Because named entities are
unique, they are a versatile
signal for characterizing
documents and for revealing
recurring patterns among the
entities themselves.

Fed with the right input, our
visual cortex can become a
powerful analysis system.
Graph visualization helps us
perceive relationships
between entities from the
micro to the macro level.
To exploit this effect, we
project entities onto a 2D
plane using co-citation scores
as a distance metric.
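
The scoring step above can be sketched as follows. This is
only an illustration in Python, not Datanote's actual code.
It assumes the co-citation score of two entities is the
number of documents citing both, and it swaps in a
Jaccard-style normalization to turn that score into a
distance. The function names and sample documents are
hypothetical. The 2D projection itself (for example a
force-directed or multidimensional-scaling layout fed with
these distances) is left out.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(docs):
    """Count documents per entity and per unordered entity pair."""
    pair_counts = Counter()
    entity_counts = Counter()
    for entities in docs:
        unique = sorted(set(entities))  # ignore repeats within a document
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, entity_counts


def cocitation_distance(a, b, pair_counts, entity_counts):
    """Assumed metric: 0.0 = always cited together, 1.0 = never."""
    if a == b:
        return 0.0
    both = pair_counts[tuple(sorted((a, b)))]
    either = entity_counts[a] + entity_counts[b] - both
    return 1.0 - both / either if either else 1.0


# Hypothetical input: the entities extracted from three documents.
docs = [
    ["Alice", "Acme", "Paris"],
    ["Alice", "Acme"],
    ["Bob", "Paris"],
]
pair_counts, entity_counts = cocitation_counts(docs)
```

With these sample documents, "Alice" and "Acme" end up at
distance 0.0 because they always appear together, while
"Alice" and "Bob" end up at distance 1.0 because they never
do.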

But different kinds of questions may
call for different ways to interact with
and explore the knowledge graph.
For this reason, multiple interfaces are
being developed in Datanote.

Warning: working prototype,
UI subject to change.

Your Industry
Which A are linked with B?
What if you built your own
extraction model using
your company data?

Human Resources
What entities are associated
with a candidate? Or a
school, a company, a skill?

Market Intelligence
What terms are mentioned
with my brand? My
competitor? And in their
job offers?

Fraud Detection
Who is mentioned in some
PDF reports? What are the
links between accounts or
phone numbers?

Datanote is a stand-alone application
built with Electron.
In the prototype, text extraction is
performed locally using Node
modules, and data is stored in an
embedded OrientDB database.

Datanote is designed with i18n
support in mind: not only for
display but also for entity
identification.
This approach makes it possible
to process mixed-language
sources such as web pages and
social media.
This design also allows humans
to improve the model by adding
new words.
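
One way to picture language-aware identification is a
dictionary that maps words in several languages to the same
entity. The sketch below is an assumption made for
illustration, written in Python, and not Datanote's design:
the Gazetteer class, its methods and the entity identifiers
are all hypothetical, and it only matches single words.

```python
import re


class Gazetteer:
    """Hypothetical multilingual word list: label -> (entity id, language)."""

    def __init__(self):
        self._index = {}

    def add(self, entity_id, lang, *labels):
        # A human (or a dataset) registers labels for an entity.
        for label in labels:
            self._index[label.casefold()] = (entity_id, lang)

    def find(self, text):
        # Return (entity id, language) for each known word, in text order.
        tokens = re.findall(r"\w+", text.casefold())
        return [self._index[t] for t in tokens if t in self._index]


gazetteer = Gazetteer()
gazetteer.add("hobby/gardening", "en", "gardening")
gazetteer.add("hobby/gardening", "fr", "jardinage")
gazetteer.add("hobby/cinema", "en", "cinema", "movies")
gazetteer.add("hobby/cinema", "fr", "cinéma")

# A mixed-language sentence resolves to language-independent entities.
mixed = gazetteer.find("J'aime le jardinage and cinema")

# Adding a new word later extends what can be recognised.
gazetteer.add("hobby/skydiving", "en", "skydiving")
```

Because every label points to a language-independent
identifier, a French and an English mention of the same
hobby land on the same node of the graph.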

For now, Datanote uses
its own datasets, some
of which are open source
at github.com/datagica.
Support for external
data sources and
models will probably
be requested, and so is
planned.

Using pre-defined lists of
words works for certain
cases, but what about
unknown data?
For complex entities,
Datanote uses pattern
matching models to
recognise human names,
phone numbers, IBANs,
addresses, emails, spoken
languages, etc.
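
As a rough sketch of pattern matching for two of these
entity types (emails and IBANs), written in Python: the
regular expressions below are simplified assumptions, not
Datanote's models, and the function names are hypothetical.
The mod-97 test is the standard IBAN checksum and is used
here to discard look-alike strings.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IBAN_RE = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)


def iban_is_valid(candidate):
    """Standard IBAN check: rearranged number modulo 97 must equal 1."""
    compact = candidate.replace(" ", "")
    rearranged = compact[4:] + compact[:4]
    # Letters become two-digit numbers: A=10 ... Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_entities(text):
    """Return (kind, value) pairs for emails, then checksum-valid IBANs."""
    found = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in IBAN_RE.finditer(text):
        if iban_is_valid(m.group()):
            found.append(("iban", m.group().replace(" ", "")))
    return found


sample = "Write to jane.doe@example.com or wire to GB82 WEST 1234 5698 7654 32."
entities = extract_entities(sample)
```

Unlike a word list, such patterns recognise values that were
never seen before, as long as they follow the expected
shape.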

Smarter extraction models? Our own
machine-learned models?
Complex knowledge graph
interrogation in Gremlin or natural
language?
Allow people to use their own "better"
models? Third-party cloud API models?

Support other DBs for data storage? Full-
featured search system? Chatbot API? Slack
integration?
A model or data source plugin marketplace?
With a commission system for us?
Jupyter extension for data scientists? Web
platform to publish read-only notebooks?
What should be done first?

On a more personal note:
project status

Datanote is a side project with no funding and
is thus progressing rather slowly, stopping at
times.
As I do not wish to see it disappear, I am in the
process of open-sourcing it bit by bit.
But maybe it could be monetized? What would
be the market and the business model then?
That is still an open question.

julian.bilcke@datagica.com
github.com/datagica
