This document summarizes a presentation about DisGeNET, a knowledge platform on human diseases and their genes. The presentation discusses four key points: 1) Fragmentation of information is a barrier to knowledge about disease mechanisms. 2) The high rate of data generation on gene-disease associations poses challenges for curation pipelines. 3) Data prioritization is needed to aid interpretation of genetic determinants of disease. 4) A large number of genes associated with some diseases may reflect phenotypic diversity.
Laura Furlong. Big Data in Biomedicine debate. Barcelona, Nov 11 2014
12.
- Nearly 70% of GDAs supported by only one publication!
- 900 GDAs supported by > 200 publications
- Average 2.8 publications/GDA
11/11/14 Laura I. Furlong
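The figures on this slide (share of GDAs backed by a single publication, number of heavily supported GDAs, mean publications per GDA) are simple aggregates over evidence records. The following is a minimal sketch of how such numbers could be computed, assuming each evidence record is a (gene, disease, publication ID) triple. The function name `publication_stats`, the record layout, and the toy identifiers are hypothetical illustrations, not part of DisGeNET or its API.

```python
from collections import defaultdict
from statistics import mean


def publication_stats(evidence, heavy_threshold=200):
    """Summarize how many distinct publications support each GDA.

    `evidence` is an iterable of (gene, disease, publication_id) triples.
    This layout is an assumption made for illustration only.
    """
    pubs_per_gda = defaultdict(set)
    for gene, disease, pub_id in evidence:
        # A set ignores the same publication reported twice for one GDA.
        pubs_per_gda[(gene, disease)].add(pub_id)

    counts = [len(pubs) for pubs in pubs_per_gda.values()]
    if not counts:
        raise ValueError("no evidence records given")

    return {
        "n_gdas": len(counts),
        "share_single_publication": sum(c == 1 for c in counts) / len(counts),
        "n_heavily_supported": sum(c > heavy_threshold for c in counts),
        "mean_publications_per_gda": mean(counts),
    }


# Toy records with made-up identifiers (not real data).
toy_evidence = [
    ("GENE_A", "DISEASE_X", 1),
    ("GENE_A", "DISEASE_X", 2),
    ("GENE_A", "DISEASE_X", 2),  # duplicate record, counted once
    ("GENE_B", "DISEASE_X", 3),
    ("GENE_C", "DISEASE_Y", 4),
]

# A low threshold is used here only so the toy data has a "heavy" GDA.
stats = publication_stats(toy_evidence, heavy_threshold=1)
print(stats)
```

Counting distinct publications per gene-disease pair, rather than raw records, keeps duplicated evidence from inflating the support for an association.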
15.
- This is not raw data (e.g. from NGS analysis); it is data already collated and filtered that has passed peer review before publication
- Text mined from abstracts:
  - Not mining full text, nor supplementary material
  - Not mining tables, figures
  - Focus on relations stated in sentences, not handling anaphora
- We are just looking into a small fraction of all the available data!!!
16. Key points
2. High rate of data generation on GDAs poses challenges to biocuration pipelines
- Need to find alternative strategies to expert curation
- "Wisdom of the crowds" approaches (e.g. crowdsourcing)
- ?
17. Key points
3. Data prioritization to support interpretation of data on the genetic determinants of human diseases
25. Key points
1. Fragmentation of information as a barrier to knowledge on the mechanisms of human diseases
2. High rate of data generation on GDAs poses challenges to biocuration pipelines
3. Data prioritization to support interpretation of data on the genetic determinants of human diseases
4. Large number of genes associated with some diseases might reflect phenotypic diversity