Invited presentation on using the Web as the platform for sharing and consuming biomedical data at Edinburgh May 12, 2010
2010 05 edinburgh
1. Using the Web as the Platform for
Sharing and Consuming
Biomedical Data
Jun Zhao Ph.D.
EPSRC Postdoctoral Fellow
Department of Zoology
University of Oxford
4. Use URIs as names for things
Use HTTP URIs so that people can look up
those names.
When someone looks up a URI, provide
useful information.
Include links to other URIs so that they can
discover more things.
Tim Berners-Lee, July 2006
http://www.w3.org/DesignIssues/LinkedData.html
5. What is Linked Data?
Search for the best public school in
Oxfordshire with tuition fee < ££££ and
find out the house prices in that
neighborhood
8. Uniform Resource Identifier
http://myedubase.db/school001, exEdu:school001
A URI is to Linked Data what a URL is to Web documents
Use namespaces to guarantee uniqueness
http://myedubase.db/school001
Namespace
http://edubase.org/school001
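The relationship between the short form exEdu:school001 and the full URI can be sketched as a prefix lookup. This is a minimal illustration, assuming that exEdu abbreviates the namespace http://myedubase.db/ (as the slide suggests); the helper names expand and shorten are hypothetical.

```python
# Hypothetical prefix table. Only the exEdu -> http://myedubase.db/ pairing
# is taken from the slide; the helper names are made up for illustration.
PREFIXES = {"exEdu": "http://myedubase.db/"}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'exEdu:school001' into a full URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name  # already a full URI, or an unknown prefix


def shorten(uri, prefixes=PREFIXES):
    """Abbreviate a full URI if it falls inside a known namespace."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace, so the URI is left untouched


full_uri = expand("exEdu:school001")
```

Because the namespace is part of the name, http://myedubase.db/school001 and http://edubase.org/school001 stay distinct even though both end in school001.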
9. Use URIs as names for things
Use HTTP URIs so that people can look up
those names.
When someone looks up a URI, provide
useful information.
Include links to other URIs so that they can
discover more things.
Tim Berners-Lee, July 2006
http://www.w3.org/DesignIssues/LinkedData.html
10. Resource Description Framework
A data model for the Semantic Web
RDF is a graph-based data model
Subject, Predicate, Object
<ox:school001> <ox:is located in> <ox:Oxford>
<ox:Oxford> <ox:is part of> <ox:Oxfordshire>
<ox:school001> <ox:has national A level ranking> 20
[Figure: the three statements drawn as graph fragments]
ox:school001 -- is located in --> ox:Oxford
ox:Oxford -- is part of --> ox:Oxfordshire
ox:school001 -- has national A level ranking --> 20
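One way to see the subject, predicate, object model is to hold the three statements as plain tuples and query them with wildcards. This is only a sketch: the underscore spelling of the predicates and the match helper are assumptions made for the example, not part of the slides.

```python
# The three statements from the slide as (subject, predicate, object) tuples.
# Assumption: predicates are written with underscores instead of spaces.
TRIPLES = {
    ("ox:school001", "ox:is_located_in", "ox:Oxford"),
    ("ox:Oxford", "ox:is_part_of", "ox:Oxfordshire"),
    ("ox:school001", "ox:has_national_A_level_ranking", 20),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    hits = (
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
    # Sort on the string form because objects mix strings and numbers.
    return sorted(hits, key=str)


about_school = match(TRIPLES, s="ox:school001")
part_of = match(TRIPLES, p="ox:is_part_of")
```

Each tuple is one edge of the graph: the subject and object are nodes, and the predicate labels the edge between them.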
11. Now we are on the Web
Everything is identified by a dereferenceable URI
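Dereferencing usually means an HTTP GET on the URI, often with an Accept header asking for an RDF serialization. The sketch below only builds such a request with the Python standard library and does not send it; the choice of text/turtle as the media type and the helper name rdf_request are assumptions, and the example URI is the one used earlier in the slides.

```python
from urllib.request import Request


def rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET request asking for RDF."""
    return Request(uri, headers={"Accept": media_type})


# Example URI from the slides; it is not expected to resolve.
req = rdf_request("http://myedubase.db/school001")
```

Passing the request to urllib.request.urlopen would perform the actual lookup against a server that publishes Linked Data.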
12. Everybody uses the same URI
[Figure: three separate graph fragments that use the same URIs]
ox:school001 -- is located in --> ox:Oxford
ox:Oxford -- is part of --> ox:Oxfordshire
ox:school001 -- has national A level ranking --> 20
13. Everybody uses the same URI
[Figure: the fragments merged into one graph on their shared URIs]
ox:school001 -- is located in --> ox:Oxford -- is part of --> ox:Oxfordshire
ox:school001 -- has national A level ranking --> 20
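When publishers agree on URIs, merging their data is a plain set union, because identical names denote the same node. The sketch below assumes a hypothetical split of the three statements across two sources; the slides do not say who publishes which statement.

```python
# Assumed split of the statements across two publishers (illustrative only).
geo_data = {
    ("ox:school001", "ox:is_located_in", "ox:Oxford"),
    ("ox:Oxford", "ox:is_part_of", "ox:Oxfordshire"),
}
ranking_data = {
    ("ox:school001", "ox:has_national_A_level_ranking", 20),
}

# Shared URIs make the merge trivial: no mapping step is needed.
merged = geo_data | ranking_data


def describe(graph, node):
    """Collect every predicate/object pair stated about one node."""
    return {p: o for s, p, o in graph if s == node}


school = describe(merged, "ox:school001")
```

After the union, ox:school001 carries both its location and its ranking, even though no single source stated both.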
14. Data links
[Figure: three datasets that use different URIs for the same things]
my:school001 -- is located in --> my:Oxford
uk:oxford -- is part of --> gov:Oxfordshire
ox:oxfhigh -- has national A level ranking --> 20
15. Data links
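[Figure: the same datasets connected by owl:sameAs links]
my:school001 -- is located in --> my:Oxford
uk:oxford -- is part of --> gov:Oxfordshire
ox:oxfhigh -- has national A level ranking --> 20
my:school001 -- owl:sameAs --> ox:oxfhigh
my:Oxford -- owl:sameAs --> uk:oxford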
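A consumer can exploit owl:sameAs links by rewriting every identifier to one representative per group before merging. The following is a rough sketch using a small union-find; the pairing of identifiers is read off the slide's figure and is an assumption, as are the function names.

```python
def canonical_map(same_as_pairs):
    """Map each identifier to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the alphabetically smaller name as the representative.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {x: find(x) for x in list(parent)}


def smush(triples, mapping):
    """Rewrite subjects and objects to their representatives."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples}


# Triples and sameAs pairs as read from the figure (assumed pairing).
triples = {
    ("my:school001", "is_located_in", "my:Oxford"),
    ("uk:oxford", "is_part_of", "gov:Oxfordshire"),
    ("ox:oxfhigh", "has_national_A_level_ranking", 20),
}
same_as = [("my:school001", "ox:oxfhigh"), ("my:Oxford", "uk:oxford")]

linked = smush(triples, canonical_map(same_as))
```

After the rewrite, the school's location, region, and ranking hang off a single pair of nodes, which is what the data links are meant to enable.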
16. Linking Open Drug Data (LODD)
A task force of the W3C Health Care and Life Sciences Interest
Group, started in October 2008
Enrich the Web of Data by publishing drug-related data as
Linked Data
Investigate the benefits of Linked Data for drug discovery and
biomedical research
~ 12 active participants, including researchers and pharmaceutical companies
18. Dataset | Content | Publishing tool | Triples
LinkedCT | Derived from ClinicalTrials.gov; more than 60,000 trials conducted in the US and other countries | D2R Server | 7,036,000
DrugBank | Nearly 5,000 FDA-approved small molecule and biotech drugs | D2R Server | 767,000
DailyMed | Published by National Library of Medicine (NLM); high quality packaging information on 4,300 marketed drugs | D2R Server | 164,300
RDF-TCM | 850 herbs, herb-gene and herb-disease associations | Pubby | 117,600
Diseasome | A network of disorders and disorder genes, obtained from Online Mendelian Inheritance in Man (OMIM) | D2R Server | 91,200
SIDER | Information on 930 marketed drugs and 1,700 related side effects | D2R Server | 192,500
Total (approx.) | | | 8,400,000
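The total on the slide is a rounded figure. As a quick check, the per-dataset triple counts listed above can be summed directly; the variable names are only for illustration.

```python
# Triple counts as listed on the slide.
TRIPLE_COUNTS = {
    "LinkedCT": 7_036_000,
    "DrugBank": 767_000,
    "DailyMed": 164_300,
    "RDF-TCM": 117_600,
    "Diseasome": 91_200,
    "SIDER": 192_500,
}

total = sum(TRIPLE_COUNTS.values())            # exact sum of the six rows
rounded = round(total, -5)                     # nearest hundred thousand
largest = max(TRIPLE_COUNTS, key=TRIPLE_COUNTS.get)
share = TRIPLE_COUNTS[largest] / total         # fraction held by the largest dataset

print(f"{total:,} triples, about {rounded:,}; {largest} holds {share:.0%}")
```

The exact sum is 8,368,600, which rounds to the 8,400,000 shown, and LinkedCT accounts for most of it.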
21. Consume linked data
Data warehousing
Convenient, but problematic for keeping data in sync.
Query federation
A query mediator to distribute queries to relevant
data sources and then integrate the query results
Less mature
Based on SPARQL endpoints rather than the native
Linked Data interfaces
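The mediator idea can be sketched with in-memory stand-ins for SPARQL endpoints: route each part of a query to the sources that can answer it, then combine the answers. Everything below is hypothetical, including the source names, the tuition and price values, and the helper functions; a real federation engine is considerably more involved.

```python
# In-memory stand-ins for remote endpoints. All values are invented.
SOURCES = {
    "geo": {("my:school100", "geo:is_located_in", "uk:oxford")},
    "edu": {("my:school100", "edu:tuition", 9000)},
    "house": {("uk:oxford", "house:average_house_price", 300000)},
}


def relevant_sources(predicate, sources):
    """Pick the sources that hold at least one triple with this predicate."""
    return sorted(
        name for name, triples in sources.items()
        if any(p == predicate for _, p, _ in triples)
    )


def federated_lookup(subject, predicates, sources):
    """Send each predicate to its relevant sources and merge the answers."""
    result = {}
    for predicate in predicates:
        for name in relevant_sources(predicate, sources):
            for s, p, o in sources[name]:
                if s == subject and p == predicate:
                    result[predicate] = o
    return result


answer = federated_lookup(
    "my:school100", ["geo:is_located_in", "edu:tuition"], SOURCES
)
```

Unlike a warehouse, nothing is copied ahead of time, so the answer reflects whatever the sources hold at query time.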
23. Link traversal-based queries
Combine link traversal with query execution
Evaluate part of a query
Look up URIs returned in the intermediate query
results
SQUIN.org
A generic Linked Data query engine
Can be used as a service: localhost:8080/SQUIN/
By Olaf Hartig from Humboldt-Universität zu Berlin
24. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
Find the schools in the same area as a given school, together with
their national A level rankings, annual tuition fees and the average
house price in that area.
25. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
URI looked up: http://.../school001
Queried data
26. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
Queried data
27. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
Queried data
........
<http://my/school1> geo:is_located_in <http://data.gov.uk/location/oxford>
........
28. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
?loc
http://.../oxford
Queried data
........
<http://my/school1> geo:is_located_in <http://data.gov.uk/location/oxford>
........
29. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
?loc
http://.../oxford
URI looked up: http://.../oxford
Queried data
31. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
?loc
http://.../oxford
Queried data
32. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
?loc
http://.../oxford
Queried data
........
<http://my/school100> geo:is_located_in <http://data.gov.uk/location/oxford>
<http://my/school101> geo:is_located_in <http://data.gov.uk/location/oxford>
........
33. Find my school
SELECT ?others ?ranking ?tuition ?price
WHERE {
  <http://my/school1> geo:is_located_in ?loc .
  ?others geo:is_located_in ?loc ;
          uk:national_A_level_ranking ?ranking ;
          edu:tuition ?tuition .
  ?loc house:average_house_price ?price .
}
?loc | ?others
http://.../oxford | http://.../school100
http://.../oxford | http://.../school101
Queried data
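The walkthrough on the "Find my school" slides can be approximated in a few lines: evaluate one triple pattern at a time against the data gathered so far, then look up every URI that appears in the intermediate bindings. This is a toy sketch, not SQUIN's implementation. The WEB dictionary stands in for HTTP lookups, and the rankings, tuition fees and house price in it are invented.

```python
OXFORD = "http://data.gov.uk/location/oxford"
S1, S100, S101 = "http://my/school1", "http://my/school100", "http://my/school101"

# Stand-in for the Web: URI -> triples returned when that URI is looked up.
# All literal values are invented for illustration.
WEB = {
    S1: {(S1, "geo:is_located_in", OXFORD)},
    OXFORD: {
        (S100, "geo:is_located_in", OXFORD),
        (S101, "geo:is_located_in", OXFORD),
        (OXFORD, "house:average_house_price", 300000),
    },
    S100: {(S100, "uk:national_A_level_ranking", 20), (S100, "edu:tuition", 9000)},
    S101: {(S101, "uk:national_A_level_ranking", 35), (S101, "edu:tuition", 12000)},
}


def extend(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, else None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return new


def traverse(start_uri, patterns, web):
    """Interleave pattern evaluation with URI lookups."""
    data = set(web.get(start_uri, set()))  # the "queried data"
    seen = {start_uri}
    bindings = [{}]
    for pattern in patterns:
        candidates = (extend(pattern, t, b) for b in bindings for t in data)
        bindings = [b for b in candidates if b is not None]
        # Look up every newly bound URI and grow the queried data.
        for value in {v for b in bindings for v in b.values()}:
            if isinstance(value, str) and value not in seen:
                seen.add(value)
                data |= web.get(value, set())
    return bindings


PATTERNS = [
    (S1, "geo:is_located_in", "?loc"),
    ("?others", "geo:is_located_in", "?loc"),
    ("?others", "uk:national_A_level_ranking", "?ranking"),
    ("?others", "edu:tuition", "?tuition"),
    ("?loc", "house:average_house_price", "?price"),
]

rows = sorted(
    (b["?others"], b["?ranking"], b["?tuition"], b["?price"])
    for b in traverse(S1, PATTERNS, WEB)
)
```

With this toy data, rows ends up holding one entry each for school100 and school101. The starting school also matches the location pattern, but it drops out because no ranking was found for it; only data reachable by following links contributes to the result.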
34. Consume LOD by link traversal
Pros.
No need to know all the data sources in advance
Don't need SPARQL endpoints for all datasets
Access the most up-to-date data
Cons.
Performance could be a problem
Requires a URI as the starting point, so it does not support all
kinds of queries
35. Search for alternative medicines
SELECT DISTINCT ?diseaseLabel ?altMedicineLabel
WHERE {
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01273>
drugbank:possibleDiseaseTarget ?disease.
?disease owl:sameAs ?sameDisease.
?altMedicine tcm:treatment ?sameDisease.
?altMedicine rdf:type tcm:Medicine.
?sameDisease rdfs:label ?diseaseLabel.
?altMedicine rdfs:label ?altMedicineLabel.
}
What natural alternatives can be used instead of the drug Varenicline?
# of query results: 7
# of retrieved graphs: 28
# of accessed servers: 6
avg. execution time: 0 min 46 sec
37. [Figure: the interlinked LODD datasets: Diseasome, DrugBank, LinkedCT, DailyMed, SIDER, RDF-TCM]
39. The Web of Data
Allows distributed data publication
Provides a scalable platform for data sharing
Creates a web-scale data space
Offers a new opportunity for data integration and
aggregation
40. Open issues with LOD
Best practices for data publication
Changes of data
Data synchronization
Broken links
Data linking
Mapping of individuals
Mapping of concepts
Provenance and trust
Large-scale data access and reasoning