MCW Driving Biological Project
                             Simon Twigger, PhD




Monday, September 27, 2010
Rat Genome Database




What's the problem?
    Large-scale repositories with unused or inaccessible information

    How can these databases be made more useful?

    How can we help researchers find and use this information to connect genes to disease?



Rat researchers ask...
    What tissue is this gene expressed in?
    What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
    Are any of these genes associated with my phenotype?
    Has this gene been seen in the brain?
    What rat expression studies have been done on mammary cancer (aka breast neoplasms, breast cancer, cancer of the breast, breast carcinoma...)?

What's the strategy?
    Focus on GEO (microarray)

    Use the NCBO Annotator to mark up text, review the annotations, and then use them for tools and visualization

    Combine annotations with biological data to derive new insights

    [Diagram: GEO Records -> Create Annotation Jobs & Queue Up -> Q-Out (RabbitMQ) -> 1..n Annot. Workers -> Index text at OBA -> Parse Results -> Put results into queue for save -> Q-In (RabbitMQ) -> Results saved to GMiner database]

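The job-queue strategy above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's code: Python's in-process `queue.Queue` stands in for the RabbitMQ queues Q-Out and Q-In, a plain dict stands in for the GMiner database, and `annotate` is a hypothetical substring matcher in place of a real call to the NCBO Annotator web service. All function names and record IDs are illustrative only.

```python
import queue
import threading


# Hypothetical stand-in for the NCBO Annotator call ("Index text at OBA").
# The real annotator is a remote web service; this stub only does a
# case-insensitive substring match against a small vocabulary.
def annotate(text, vocabulary):
    """Return the sorted vocabulary terms that occur in the text."""
    lowered = text.lower()
    return sorted(term for term in vocabulary if term in lowered)


def worker(q_out, q_in, vocabulary):
    """One annotation worker: take jobs from Q-Out, put parsed results on Q-In."""
    while True:
        job = q_out.get()
        if job is None:  # stop sentinel
            break
        record_id, text = job
        q_in.put((record_id, annotate(text, vocabulary)))


def run_pipeline(records, vocabulary, n_workers=3):
    """Annotate every record with n_workers threads and collect the results."""
    q_out, q_in = queue.Queue(), queue.Queue()
    # Create annotation jobs and queue them up.
    for record_id, text in records.items():
        q_out.put((record_id, text))
    # One sentinel per worker, queued after the jobs: with FIFO order every
    # job is dequeued before any worker receives a sentinel.
    for _ in range(n_workers):
        q_out.put(None)
    threads = [
        threading.Thread(target=worker, args=(q_out, q_in, vocabulary))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Stand-in for "results saved to GMiner database": drain Q-In into a dict.
    saved = {}
    while not q_in.empty():
        record_id, terms = q_in.get()
        saved[record_id] = terms
    return saved
```

For example, `run_pipeline({"GSE_A": "Rat kidney samples from a hypertensive strain"}, {"kidney", "hypertensive"})` returns `{"GSE_A": ["hypertensive", "kidney"]}`. Putting the queues in a broker such as RabbitMQ, as the diagram does, is what lets the 1..n workers run as separate processes or machines; the in-process queue here only mimics that flow.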



Current Ontologies




     http://bioportal.bioontology.org/

Progress




Linking annotations to data
    Genes: Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
    + annotations, giving statements such as:
        Hbb is_expressed_in rat kidney
        Tm2d1 is_expressed_in rat kidney

    Platforms: Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
    62,000 samples x ca. 25,000 genes/sample = ca. 1.5B data points

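One way to read the "+" step is as a join: text annotations say which tissue a sample came from, the expression data say which genes are detected in that sample, and combining the two yields `is_expressed_in` statements. The sketch below assumes one tissue term per sample and a hypothetical fixed expression threshold for calling a gene "expressed"; the function name, sample IDs, and values are made up for illustration and are not GMiner data.

```python
def expressed_in(sample_tissue, expression, threshold):
    """Derive (gene, "is_expressed_in", tissue) statements.

    sample_tissue: sample ID -> tissue term taken from the text annotations
    expression:    sample ID -> {gene symbol: expression value}
    threshold:     minimum value counted as "expressed" (an assumption)
    """
    triples = set()  # a set, so a gene seen in several samples of one tissue is stated once
    for sample, tissue in sample_tissue.items():
        for gene, value in expression.get(sample, {}).items():
            if value >= threshold:
                triples.add((gene, "is_expressed_in", tissue))
    return sorted(triples)


# Hypothetical inputs for illustration only.
samples = {"S1": "rat kidney", "S2": "rat liver"}
values = {"S1": {"Hbb": 950.0, "Tm2d1": 410.0, "Alb": 12.0},
          "S2": {"Alb": 1200.0}}
for statement in expressed_in(samples, values, threshold=100.0):
    print(*statement)
```

With these made-up inputs the loop prints `Alb is_expressed_in rat liver`, `Hbb is_expressed_in rat kidney` and `Tm2d1 is_expressed_in rat kidney`. A real analysis would need a per-platform notion of "expressed" rather than one global cutoff.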
Probeset results on GMiner
    Probeset L08490cds_at for Gabra1 - gamma-aminobutyric acid (GABA) A receptor, alpha 1
    Human ortholog: Hs GABRA1

QTL
    [Figure: a QTL for a hypertensive phenotype (Strain 1 != Strain 2) containing several genes (G), each linked to Phenotype (Hypertension), Pathway, Anatomy (Kidney), and GO Component, Function, and Process.]

QTL Gene Highlighter
    [Figure: genes (G) within a QTL are highlighted using Disease/Phenotype data in AllegroGraph, which combines data from GMiner, RGD, OBO ontologies, etc.]

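The highlighter idea can be illustrated with a small in-memory version. In the deck the data live in AllegroGraph and would be queried there (for example with SPARQL); here, as an assumption for illustration, the triples are a plain Python list, the predicate names (`located_in`, `is_expressed_in`, `has_phenotype`) and the QTL and gene names are hypothetical, and genes are simply ranked by how many distinct phenotype-related terms they are linked to.

```python
def highlight_qtl_genes(triples, qtl, related_terms):
    """Rank the genes located in `qtl` by how many related terms they link to.

    triples:       iterable of (subject, predicate, object) strings
    related_terms: ontology terms judged relevant to the QTL's phenotype
    """
    triples = list(triples)
    genes = {s for s, p, o in triples if p == "located_in" and o == qtl}
    scores = {}
    for gene in genes:
        hits = {o for s, p, o in triples
                if s == gene and p != "located_in" and o in related_terms}
        scores[gene] = len(hits)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical triples mirroring the "G G G" in the QTL diagram.
example = [
    ("G1", "located_in", "QTL_X"), ("G2", "located_in", "QTL_X"),
    ("G3", "located_in", "QTL_X"), ("G4", "located_in", "QTL_Y"),
    ("G1", "is_expressed_in", "kidney"), ("G1", "has_phenotype", "hypertension"),
    ("G2", "is_expressed_in", "brain"),
    ("G3", "is_expressed_in", "kidney"),
    ("G4", "has_phenotype", "hypertension"),
]
print(highlight_qtl_genes(example, "QTL_X", {"kidney", "hypertension"}))
```

This prints `[('G1', 2), ('G3', 1), ('G2', 0)]`: G4 is ignored because it lies outside QTL_X. Counting distinct hits is only one possible scoring choice; weighting by evidence or ontology distance would be equally plausible.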
RDF/OWL sources
    Cell Ontology
    http://www.berkeleybop.org/ontologies/obo-all/cell/cell.owl

    Mouse Adult Gross Anatomy
    http://www.berkeleybop.org/ontologies/obo-all/adult_mouse_anatomy/adult_mouse_anatomy.owl

    Mammalian Phenotype
    http://www.berkeleybop.org/ontologies/obo-all/mammalian_phenotype/mammalian_phenotype.owl

    GO Function
    http://www.berkeleybop.org/ontologies/obo-all/molecular_function/molecular_function.owl

    GO Process
    http://www.berkeleybop.org/ontologies/obo-all/biological_process/biological_process.owl

    GO Component
    http://www.berkeleybop.org/ontologies/obo-all/cellular_component/cellular_component.owl

Rat Genome Database
    Wide variety of data types, both genomic and physiological, many with corresponding ontologies

RGD->RDF
    Existing RGD object types and their mappings to the Sequence Ontology (SO)

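A minimal RDFizer-style sketch of typing RGD objects with SO classes and serializing them as N-Triples follows. It is written in Python rather than the Ruby used by the project's RDFizer, and it assumes a placeholder base URI (`http://example.org/rgd/`), since choosing RGD URIs and a PURL is still listed as a next step. It uses the OBO PURL form for SO terms, with SO:0000704 (gene) and SO:0000771 (QTL); check these against the current SO release before relying on them. Object symbols stand in for real RGD identifiers.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SO = "http://purl.obolibrary.org/obo/SO_"

# Mapping of RGD object types to Sequence Ontology classes.
RGD_TYPE_TO_SO = {"gene": SO + "0000704", "qtl": SO + "0000771"}

# Placeholder base URI (assumption): RGD URIs/PURL are not chosen yet.
BASE = "http://example.org/rgd/"


def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(objects):
    """Serialize RGD-like objects ({"type", "symbol"}) as N-Triples lines."""
    lines = []
    for obj in objects:
        # Percent-encode the symbol so the subject IRI stays valid.
        subject = f"<{BASE}{obj['type']}/{quote(obj['symbol'], safe='')}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{RGD_TYPE_TO_SO[obj['type']]}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(obj["symbol"])}" .')
    return "\n".join(lines)


print(to_ntriples([{"type": "gene", "symbol": "Hbb"}]))
```

The output is two triples for `<http://example.org/rgd/gene/Hbb>`: an `rdf:type` link to `SO_0000704` and an `rdfs:label` of `"Hbb"`. N-Triples is used here because it is line-based and easy to bulk-load into a triple store; any RDF serialization would do.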
RGD Gene

RGD QTL

QTL Highlighter
    Rails source code will be available on GitHub
    RDFizer (Ruby): http://github.com/simont/MCW-RDF

Next Steps
    Register a PURL for RGD

    Create an RGD core object ontology (OWL/RDF)

    Select appropriate URIs for RGD data

    Ontology annotations: how best to represent them in the triple store?

    Export GMiner data to RDF and load it into the triple store

    Document and refine biological use cases related to candidate gene selection/evaluation

    Identify additional data required for candidate gene selection, RDFize as appropriate, and load into the triple store

    Connections to other RDF collections/Linked Open Data (LOD), etc.?
