Using the Web as the Platform for
    Sharing and Consuming
        Biomedical Data


           Jun Zhao Ph.D.
        EPSRC Postdoctoral Fellow
          Department of Zoology
           University of Oxford
What is Linked Data?
2010 05 edinburgh
   Use URIs as names for things
   Use HTTP URIs so that people can look up
    those names.
   When someone looks up a URI, provide
    useful information.
   Include links to other URIs so that they can
    discover more things.
                              Tim Berners-Lee, July 2006
                 http://www.w3.org/DesignIssues/LinkedData.html
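A minimal sketch of what "looking up" an HTTP URI can mean in practice: an HTTP GET that asks for an RDF representation through the Accept header. The URI is the illustrative school identifier used later in this deck, the particular media types listed are an assumption, and the request is only built here, not sent.

```python
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request asking for RDF about `uri`."""
    # Content negotiation: prefer RDF serializations, fall back to HTML.
    # The media types and weights below are an assumption for illustration.
    accept = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"
    return Request(uri, headers={"Accept": accept}, method="GET")


# Illustrative URI taken from the slides; it is not a real service.
request = build_lookup_request("http://myedubase.db/school001")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# To actually perform the lookup one would call urllib.request.urlopen(request).
```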
What is Linked Data?
 Search for the best public school in
Oxfordshire with tuition fees < ££££ and
    find out the house prices in that
             neighbourhood
Linked Data in a Nutshell (Technical Aspects)
Uniform Resource Identifier
   http://myedubase.db/school001, exEdu:school001
   A URI is to Linked Data what a URL is to Web documents
   Use namespaces to guarantee uniqueness
       http://myedubase.db/school001
             Namespace

       http://edubase.org/school001
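As a small illustration of how namespaces keep names unique, the sketch below expands a prefixed name into a full URI. The binding of the prefix exEdu to http://myedubase.db/ is inferred from the example on this slide, and the function name `expand` is hypothetical.

```python
# Prefix table. The exEdu binding mirrors the slide's example and is illustrative only.
PREFIXES = {"exEdu": "http://myedubase.db/"}


def expand(name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'exEdu:school001' into a full URI."""
    if name.startswith(("http://", "https://")):
        return name  # already a full URI
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


full = expand("exEdu:school001")
print(full)
# Same local name under a different namespace is a different identifier.
same = full == "http://edubase.org/school001"
print(same)
```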
Resource Description Framework
   A data model for the Semantic Web
   RDF is a graph-based data model
   Subject, Predicate, Object
   <ox:school001> <ox:is located in> <ox:Oxford>
   <ox:Oxford> <ox:is part of> <ox:Oxfordshire>
   <ox:school001> <ox:has national A level ranking> 20
   [Figure: the triples above drawn as two graph fragments: ox:school001 is located in ox:Oxford, which is part of ox:Oxfordshire; ox:school001 has national A level ranking 20]
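The subject, predicate, object model can be sketched with plain tuples. This is only an illustration of the data model, not an RDF library: the underscored predicate names are an assumption (the slide writes them with spaces), and `match` is a hypothetical helper.

```python
# Each triple is (subject, predicate, object); the object may be a literal such as 20.
triples = [
    ("ox:school001", "ox:is_located_in", "ox:Oxford"),
    ("ox:Oxford", "ox:is_part_of", "ox:Oxfordshire"),
    ("ox:school001", "ox:has_national_A_level_ranking", 20),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Follow two edges of the graph: school -> city -> county.
location = match(triples, s="ox:school001", p="ox:is_located_in")[0][2]
region = match(triples, s=location, p="ox:is_part_of")[0][2]
print(location, region)
```

Because the object of one triple can be the subject of another, the list of triples forms a graph that can be walked edge by edge.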
Now we are on the Web

Everything is identified by a dereferenceable URI
Everybody uses the same URI
[Figure: two frames. First, the two graph fragments about ox:school001 and ox:Oxford shown side by side. Then, because both fragments use the same URIs, they merge into one graph: ox:school001 is located in ox:Oxford, which is part of ox:Oxfordshire, and ox:school001 has national A level ranking 20]
Data links
[Figure: two frames. First, three graph fragments published under different namespaces: my:school001 is located in my:Oxford; uk:oxford is part of gov:Oxfordshire; ox:oxfhigh has national A level ranking 20. Then, two owl:sameAs links are added to connect equivalent nodes across the fragments]
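One way to see what owl:sameAs links buy us is to merge the fragments by treating linked names as one node. The sketch below does this with a small union-find. The two sameAs pairs are an assumption read off the figure (the transcript does not say which nodes are linked), and the underscored predicate names are illustrative.

```python
def canonical_names(same_as_pairs):
    """Union-find over owl:sameAs pairs; returns a function giving one representative per name."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the alphabetically smaller name
    return find


# The three fragments from the figure.
fragments = [
    ("my:school001", "is_located_in", "my:Oxford"),
    ("uk:oxford", "is_part_of", "gov:Oxfordshire"),
    ("ox:oxfhigh", "has_national_A_level_ranking", 20),
]
# ASSUMPTION: which nodes the two owl:sameAs links connect.
same_as = [("my:Oxford", "uk:oxford"), ("my:school001", "ox:oxfhigh")]

find = canonical_names(same_as)
merged = {(find(s), p, find(o) if isinstance(o, str) else o) for s, p, o in fragments}
for triple in sorted(merged, key=str):
    print(triple)
```

After merging, the ranking published under ox:oxfhigh and the county published under uk:oxford are both reachable from my:school001.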
Linking Open Drug Data (LODD)
   A task force of the W3C Health Care and Life Sciences Interest
    Group, started in October 2008
   Enrich the Web of Data by publishing drug-related data as
    Linked Data
   Investigate the benefits of Linked Data for drug discovery and
    biomedical research
   ~ 12 active participants, including researchers and pharmaceutical companies
Dataset     Outgoing links
LinkedCT    220,569
DrugBank    59,661
DailyMed    38,220
RDF-TCM     3,438
Diseasome   31,065
SIDER       19,281
Dataset     Content                                          Publishing tool   Triples
LinkedCT    Derived from ClinicalTrials.gov; more than       D2R Server        7,036,000
            60,000 trials conducted in the US and other
            countries
DrugBank    Nearly 5,000 FDA-approved small molecule and     D2R Server        767,000
            biotech drugs
DailyMed    Published by the National Library of Medicine    D2R Server        164,300
            (NLM); high-quality packaging information on
            4,300 marketed drugs
RDF-TCM     850 herbs, herb-gene and herb-disease            Pubby             117,600
            associations
Diseasome   A network of disorders and disorder genes,       D2R Server        91,200
            obtained from Online Mendelian Inheritance in
            Man (OMIM)
SIDER       Information on 930 marketed drugs and 1,700      D2R Server        192,500
            related side effects
Total (approx.)                                                                8,400,000
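As a quick check on the two tables above, the per-dataset figures can be summed directly. The numbers are copied from the tables; the variable names are illustrative.

```python
# Triple counts and outgoing-link counts as listed in the tables above.
triples = {
    "LinkedCT": 7_036_000, "DrugBank": 767_000, "DailyMed": 164_300,
    "RDF-TCM": 117_600, "Diseasome": 91_200, "SIDER": 192_500,
}
outgoing_links = {
    "LinkedCT": 220_569, "DrugBank": 59_661, "DailyMed": 38_220,
    "RDF-TCM": 3_438, "Diseasome": 31_065, "SIDER": 19_281,
}

total_triples = sum(triples.values())
total_links = sum(outgoing_links.values())
rounded_total = round(total_triples, -5)  # nearest hundred thousand

print(f"triples: {total_triples:,} (rounded: {rounded_total:,})")
print(f"outgoing links: {total_links:,}")
```

The exact sum of the triple column is 8,368,600, which matches the table's total of 8,400,000 once rounded to the nearest hundred thousand.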
Consuming linked data
Consume linked data
   Data warehousing
       Convenient, but keeping the data in sync is problematic
   Query federation
       A query mediator distributes queries to the relevant
        data sources and then integrates the query results
       Less mature
       Based on SPARQL endpoints rather than the native
        Linked Data interfaces
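The query federation idea can be sketched with a toy mediator. This is a simplification under loud assumptions: the two "endpoints" are in-memory lists rather than real SPARQL endpoints, all sample data (URIs, tuition, house price) is made up for illustration, and source selection is done by predicate only.

```python
# ASSUMPTION: hypothetical endpoints with made-up sample data, modelled as triple lists.
SOURCES = {
    "schools": [("ex:school100", "geo:is_located_in", "ex:oxford"),
                ("ex:school100", "edu:tuition", 12000)],
    "housing": [("ex:oxford", "house:average_house_price", 350000)],
}


def ask(source, pattern):
    """Stand-in for sending one triple pattern to one endpoint; None is a wildcard."""
    s, p, o = pattern
    return [t for t in SOURCES[source]
            if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]


def relevant_sources(pattern):
    """Source selection: keep the sources that use the pattern's predicate."""
    return [name for name, data in SOURCES.items()
            if any(t[1] == pattern[1] for t in data)]


def federated(pattern):
    """Distribute the pattern to the relevant sources and collect the results."""
    results = []
    for name in relevant_sources(pattern):
        results.extend(ask(name, pattern))
    return results


# The mediator integrates partial results with a join on the shared location.
located = federated((None, "geo:is_located_in", None))
prices = federated((None, "house:average_house_price", None))
joined = [(school, place, price)
          for school, _, place in located
          for area, _, price in prices if area == place]
print(joined)
```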
Consuming linked data by following data links
Link traversal-based queries
   Combine link traversal with query execution
       Evaluate part of a query
       Look up URIs returned in the intermediate query
        results
   SQUIN.org
       A generic Linked Data query engine
       Can be used as a service: localhost:8080/SQUIN/
       By Olaf Hartig from Humboldt-Universität zu Berlin
Find my school
 SELECT ?others ?ranking ?tuition ?price
 WHERE {
   <http://my/school1> geo:is_located_in ?loc .
   ?others      geo:is_located_in ?loc ;
                uk:national_A_level_ranking ?ranking ;
                edu:tuition      ?tuition .
   ?loc         house:average_house_price ?price .
 }

Find the schools located in the same area as a given school,
together with their national A level rankings, annual tuition
fees and the average house price in that area.
Find my school: query execution by link traversal
[Animation across nine slides, each repeating the query above over a "Queried data" area. The recoverable steps are:]
   Look up the starting URI http://.../school001
   The retrieved data is added to the queried data, including
       <http://my/school1> geo:is_located_in <http://data.gov.uk/location/oxford>
   Evaluating the first triple pattern binds ?loc to http://.../oxford
   Look up http://.../oxford
   The retrieved data is added to the queried data, including
       <http://my/school100> geo:is_located_in <http://data.gov.uk/location/oxford>
       <http://my/school101> geo:is_located_in <http://data.gov.uk/location/oxford>
   Evaluating the second triple pattern yields the bindings
       ?loc                 ?others
       http://.../oxford    http://.../school100
       http://.../oxford    http://.../school101
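The walkthrough above can be imitated with a small in-memory sketch of link traversal-based execution. It is not SQUIN's implementation: the `WEB` dictionary stands in for HTTP lookups, only the first two triple patterns of the query are evaluated, and all helper names are hypothetical. The data is the data shown on the slides.

```python
OXFORD = "http://data.gov.uk/location/oxford"
# Stand-in for the Web: what each URI returns when it is looked up.
WEB = {
    "http://my/school1": [("http://my/school1", "geo:is_located_in", OXFORD)],
    OXFORD: [("http://my/school100", "geo:is_located_in", OXFORD),
             ("http://my/school101", "geo:is_located_in", OXFORD)],
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def lookup(uri, data, seen):
    """Simulate dereferencing a URI and adding the result to the queried data."""
    if isinstance(uri, str) and uri.startswith("http://") and uri not in seen:
        seen.add(uri)
        data.update(WEB.get(uri, []))


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns):
    data, seen = set(), set()
    for pattern in patterns:            # seed with the URIs named in the query
        for term in pattern:
            if not is_var(term):
                lookup(term, data, seen)
    solutions = [{}]
    for pattern in patterns:            # evaluate one part of the query at a time
        extended = []
        for binding in solutions:
            for triple in sorted(data):
                new = unify(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        for binding in extended:        # follow links found in intermediate results
            for value in binding.values():
                lookup(value, data, seen)
        solutions = extended
    return solutions


solutions = evaluate([("http://my/school1", "geo:is_located_in", "?loc"),
                      ("?others", "geo:is_located_in", "?loc")])
for row in solutions:
    print(row["?loc"], row["?others"])
```

In this sketch the starting school also matches the second pattern, since it is located in the same area, so it appears alongside school100 and school101 in the result.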
Consume LOD by link traversal
   Pros
       No need to know all the data sources in advance
       No need for SPARQL endpoints for all datasets
       Access to the most up-to-date data
   Cons
       Performance could be a problem
       Requires a URI as the starting point, so it does not support
        all kinds of queries
Search for alternative medicines
SELECT DISTINCT ?diseaseLabel ?altMedicineLabel
WHERE {
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01273>
      drugbank:possibleDiseaseTarget ?disease.
?disease owl:sameAs ?sameDisease.
?altMedicine tcm:treatment ?sameDisease.
?altMedicine rdf:type tcm:Medicine.
?sameDisease rdfs:label ?diseaseLabel.
?altMedicine rdfs:label ?altMedicineLabel.
}
What natural alternatives can be used instead of the drug Varenicline?
   # of query results       7
   # of retrieved graphs    28
   # of accessed servers    6
   avg. execution time      0 min 46 sec
[Figure: the interlinked LODD datasets: DrugBank, Diseasome, LinkedCT, DailyMed, SIDER, RDF-TCM]
The Web of Data
   Allows distributed data publication
   Provides a scalable platform for data sharing
   Creates a web-scale data space
   Offers a new opportunity for data integration and
    aggregation
Open issues with LOD
   Best practices for data publication
   Changes of data
       Data synchronization
       Broken links
   Data linking
       Mapping of individuals
       Mapping of concepts
   Provenance and trust
   Large-scale data access and reasoning
Acknowledgement
   W3C HCLS
   Olaf Hartig
