The Ruby UCSC API:
accessing the UCSC Genome Database using Ruby

Hiroyuki Mishima (1), Jan Aerts (2), Toshiaki Katayama (3),
Raoul J.P. Bonnal (4), Koh-ichiro Yoshiura (1)

1) Nagasaki University, Japan
2) Leuven University, Belgium
3) DBCLS, ROIS, Japan
4) Istituto Nazionale Genetica Molecolare, Italy

20th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2012)
July 15-17, 2012, Long Beach, CA, USA
Background:
The University of California, Santa Cruz (UCSC) genome database is among the most used
sources of genomic annotation for human and other organisms. The database offers an excellent
web-based graphical user interface (the UCSC genome browser) and several means for
programmatic queries. However, a simple application programming interface (API) in a scripting
language aimed at biologists was not yet available. Here, we present the Ruby
UCSC API, a library to access the UCSC genome database using Ruby.
Results:
The API is designed as a BioRuby plug-in (Biogem) and built on the ActiveRecord 3 framework
for object-relational mapping, which makes writing SQL statements unnecessary. The current
version of the API supports databases of all organisms in the UCSC genome database, including
human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.
The API uses the bin index, if available, when querying for genomic intervals. The API also
supports genomic sequence queries using locally downloaded *.2bit files, which are not stored
in the official MySQL database. The API is implemented in pure Ruby and is therefore available
in different environments and with different Ruby interpreters (including JRuby).
Conclusions:
Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby
UCSC API will help biologists query the UCSC genome database programmatically. The
API is available through the RubyGems system. Source code and documentation are available
at https://github.com/misshie/bioruby-ucsc-api/
The UCSC genome database
• The UCSC genome database is among the most used
  sources of genomic annotation for human and
  other organisms.
• It offers an excellent web-based graphical user interface
  (the UCSC genome browser) and several means
  for programmatic queries.
• However, a simple application programming interface
  (API) in a scripting language aimed at
  biologists was not yet available.
• Supporting a large number of tables (>40,000) is
  still challenging.
Ruby UCSC API
• A Ruby library to access
  the UCSC genome
  database.
• Designed as a Biogem
  (a BioRuby plug-in).
• Built on the ActiveRecord
  3 framework for
  object-relational mapping.
• Written in pure Ruby,
  supporting MRI Ruby
  1.9/1.8 and JRuby.

[Figure: Design structure of the Ruby UCSC API]
Dynamic Table Class Definition

• The UCSC database is optimized to serve the genome
  browser, resulting in a very large number of tables
   • > 41,840 tables as MySQL *.MYD files
• Database components are updated frequently.
• The Ruby UCSC API adopts dynamic class definition to
  handle the many table classes.
• When a table class is referred to for the first time, the API
  prefetches the fields of the table to detect the table type and
  defines the appropriate table class. This lazy
  evaluation of class definitions also makes API initialization
  much faster.
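
The lazy class definition described above can be sketched in plain Ruby with `const_missing`. This is only an illustration of the technique, not the API's actual code: the `LazyTables` module, the `Table` base class, the `bin_indexed?` method, and the hard-coded `SCHEMA` hash (standing in for the field list a real implementation would fetch from the database) are all hypothetical names.

```ruby
# Hypothetical sketch: define a table class only when it is first referenced.
module LazyTables
  # Stand-in for the field names a real implementation would prefetch
  # from the database on first reference (assumed values).
  SCHEMA = {
    "Snp131"         => %w[bin chrom chromStart chromEnd name],
    "KnownToEnsembl" => %w[name value]
  }.freeze

  # Minimal base class; each generated subclass carries its own field list.
  class Table
    class << self
      attr_accessor :fields

      # The table type is detected from its fields: a "bin" column
      # means interval queries can use the bin index.
      def bin_indexed?
        fields.include?("bin")
      end
    end
  end

  # Ruby calls this hook when LazyTables::SomeName is not defined yet.
  def self.const_missing(name)
    fields = SCHEMA[name.to_s]
    return super unless fields      # unknown table: raise the usual NameError

    table_class = Class.new(Table)
    table_class.fields = fields
    const_set(name, table_class)    # later references find the constant directly
  end
end

puts LazyTables.constants.include?(:Snp131) # false: nothing defined at load time
puts LazyTables::Snp131.bin_indexed?        # true: class is created on first use
puts LazyTables.constants.include?(:Snp131) # true: class is now cached
```

Because nothing is defined until it is referenced, start-up cost does not grow with the number of tables, which is the benefit the slide attributes to lazy evaluation.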
Availability and Installation
  Installation via RubyGems

        $ gem install bio-ucsc-api

  GitHub
  https://github.com/misshie/bioruby-ucsc-api
  Support Forum
  http://rubyucscapi.userecho.com/
  RubyGems.org
  https://rubygems.org/gems/bio-ucsc-api
Sample Code and Features
    require 'bio-ucsc'
    # connect to the hg19 database
    Bio::Ucsc::Hg19.connect
    # look up a SNP record by its rs identifier
    result =
      Bio::Ucsc::Hg19::Snp131.
      find_by_name("rs56289060")
    puts result.chrom # => "chr1"
  • Supports all organisms and at least the newest
    assemblies
  • Supports UCSC's official MySQL server and local
    mirror MySQL servers
  • Uses ActiveRecord's object-relational mapping
    region = "chr17:7,579,614-7,579,700"
    condition =
      Bio::Ucsc::Hg19::Snp131.
      with_interval(region).select(:name)
    # print the SQL statement generated by the relation
    puts condition.to_sql


    SELECT name FROM `snp131`
    WHERE (chrom = 'chr17' AND bin in (642,80,9,1,0)
     AND ( (chromStart BETWEEN 7579613 AND 7579700)
        OR (chromEnd BETWEEN 7579613 AND 7579700)
        OR (chromStart <= 7579613 AND
            chromEnd >= 7579700) ));

• Relations generate complex SQL statements.
• The bin index, if available, is used to accelerate queries.
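
The bin list in the generated SQL above, (642,80,9,1,0), can be reproduced with the standard UCSC binning scheme (five levels, smallest bins of 128 kb, each level eight times larger). The sketch below is an independent illustration, not code taken from the Ruby UCSC API; the helper names `parse_region` and `overlapping_bins` are hypothetical.

```ruby
# Standard UCSC binning scheme: offsets of the first bin at each level,
# from the finest level (128 kb bins) to the single whole-chromosome bin.
BIN_OFFSETS     = [585, 73, 9, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # 2**17 = 128 kb, size of the smallest bins
BIN_NEXT_SHIFT  = 3  # each level up is 2**3 = 8 times larger

# Convert a 1-based, inclusive region string into a chromosome name and
# 0-based, half-open coordinates (the convention of chromStart/chromEnd).
def parse_region(region)
  chrom, range = region.delete(",").split(":")
  first, last = range.split("-").map { |s| Integer(s) }
  [chrom, first - 1, last]
end

# List every bin, at every level, that can hold a feature overlapping
# the 0-based, half-open interval [chrom_start, chrom_end).
def overlapping_bins(chrom_start, chrom_end)
  start_bin = chrom_start >> BIN_FIRST_SHIFT
  end_bin   = (chrom_end - 1) >> BIN_FIRST_SHIFT
  bins = []
  BIN_OFFSETS.each do |offset|
    (start_bin..end_bin).each { |b| bins << offset + b }
    start_bin >>= BIN_NEXT_SHIFT
    end_bin   >>= BIN_NEXT_SHIFT
  end
  bins
end

_chrom, chrom_start, chrom_end = parse_region("chr17:7,579,614-7,579,700")
puts overlapping_bins(chrom_start, chrom_end).inspect # prints [642, 80, 9, 1, 0]
```

Restricting the query to these few bin values lets MySQL use an index on (chrom, bin) before evaluating the coordinate comparisons, which is why the bin index accelerates interval queries.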
  # declaration of the table association
  Bio::Ucsc::Hg19::KnownGene.class_eval do
    has_one :knownToEnsembl, {:primary_key => :name,
                              :foreign_key => :name}
  end
  # reference to an associated field
  puts Bio::Ucsc::Hg19::KnownGene.first.name
    # => "uc001aaa.3"
  puts Bio::Ucsc::Hg19::KnownGene.first.knownToEnsembl.value
    # => "ENST00000456328"


• The user can define table associations.
• Associated tables can be accessed like fields of the
  table.
  # load a locally stored sequence file
  # and extract a partial sequence
  seq = Bio::Ucsc::File::Twobit.open("hg19.2bit")
  puts seq.subseq("chr1:9990-10009")
    # => "NNNNNNNNNNNTAACCCTAA"

• In the UCSC genome database, genomic sequences are
  not stored in the MySQL databases but in *.2bit files.
• Reference sequence objects are generated by the
  File::Twobit.open class method, and sequences
  can be retrieved with the File::Twobit#subseq
  method.
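
As background for the sequence query above, the sketch below shows how bases are packed in the 2bit format: two bits per base (T=0, C=1, A=2, G=3), four bases per byte, first base in the most significant bits, with runs of N recorded separately as blocks. It is a simplified illustration under stated assumptions, not the API's reader: a real *.2bit file also has a header, a sequence index, and mask blocks, all ignored here, and the `decode_twobit` helper and the toy byte string are hypothetical.

```ruby
# 2-bit codes in the order used by the 2bit format.
TWOBIT_BASES = %w[T C A G].freeze

# Decode `length` bases starting at 0-based offset `start` from a packed
# byte string. `n_blocks` is a list of [block_start, block_size] pairs
# marking runs of unknown bases, which are reported as "N".
def decode_twobit(packed, start, length, n_blocks = [])
  seq = (start...(start + length)).map do |pos|
    byte  = packed.getbyte(pos / 4)   # four bases per byte
    shift = 6 - 2 * (pos % 4)         # first base sits in the top two bits
    TWOBIT_BASES[(byte >> shift) & 0b11]
  end
  n_blocks.each do |block_start, block_size|
    first = [block_start, start].max
    last  = [block_start + block_size, start + length].min
    (first...last).each { |pos| seq[pos - start] = "N" }
  end
  seq.join
end

# Toy data (assumed): 20 bases, where the first 11 lie inside an N block
# and the rest encode TAACCCTAA.
packed = [0x00, 0x00, 0x00, 0xA5, 0x4A].pack("C*")
puts decode_twobit(packed, 0, 20, [[0, 11]]) # prints NNNNNNNNNNNTAACCCTAA
```

Because each base occupies a fixed two bits, a reader can seek directly to any offset, which makes partial sequence extraction from a local file fast.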
Supported Databases
clade/organism     databases
human              Hg19, Hg18
mammals            chimp (PanTro3), orangutan (PonAbe2), rhesus (RheMac2), marmoset (CalJac3),
                   mouse (Mm9), rat (Rn4), guinea pig (CavPor3), rabbit (OryCun2), cat (FelCat4),
                   panda (AilMel1), dog (CanFam2), horse (EquCab2), pig (SusScr2), sheep
                   (OviAri1), cow (BosTau4), elephant (LoxAfr3), opossum (MonDom5), platypus
                   (OrnAna1)
vertebrates        chicken (GalGal3), zebra finch (TaeGut1), lizard (AnoCar2), X. tropicalis
                   (XenTro2), zebrafish (DanRer7), tetraodon (TetNig2), fugu (Fr2), stickleback
                   (GasAcu1), medaka (OryLat2), lamprey (PetMar1)
deuterostomes      lancelet (BraFlo1), sea squirt (Ci2), sea urchin (StrPur2)
insects            D.melanogaster (Dm3), D.simulans (DroSim1), D.sechellia (DroSec1), D.yakuba
                   (DroYak2), D.erecta (DroEre1), D.ananassae (DroAna2), D.pseudoobscura (Dp3),
                   D.persimilis (DroPer1), D.virilis (DroVir2), D.mojavensis (DroMoj2), D.grimshawi
                   (DroGri1), Anopheles mosquito (AnoGam1), honey bee (ApiMel2)
nematodes          C.elegans (Ce6), C.brenneri (CaePb3), C.briggsae (Cb3), C.remanei (CaeRem3),
                   C.japonica (CaeJap1), P.pacificus (PriPac1)
others             sea hare (AplCal1), yeast (SacCer2)
common databases   Go, HgFixed, Proteome, UniProt, VisiGene
Current Limitations

• Table associations are not defined automatically.
• For some tables, including subsets of the
  ENCODE tables, the actual data are not stored in
  the MySQL database itself but as
  references to BigWig, BigBed, and BAM files. The
  Ruby UCSC API does not support these files
  yet. For BAM files, another Biogem, "bio-samtools", provides
  support.
Conclusions

 • UCSC's official executables and C libraries are
   the most comprehensive and fastest API for the
   UCSC genome database.
 • However, APIs for scripting languages still have
   significant advantages for users, whose
   concern is not only runtime speed but also the
   total time from programming to results.
 • The Ruby UCSC API can therefore have a
   significant impact in the field.
