The Ruby UCSC API is a library that provides access to the UCSC Genome Database using the Ruby programming language. It is designed as a BioRuby plug-in and uses the ActiveRecord framework to query genomic data without writing SQL statements. The API supports over 40,000 tables across many organisms and facilitates programmatic querying of genomic data for biologists using Ruby.
The Ruby UCSC API @ISMB2012
1. The Ruby UCSC API: accessing the UCSC Genome Database using Ruby
Hiroyuki Mishima (1), Jan Aerts (2), Toshiaki Katayama (3), Raoul J.P. Bonnal (4), Koh-ichiro Yoshiura (1)
1) Nagasaki University, Japan
2) Leuven University, Belgium
3) DBCLS, ROIS, Japan
4) Istituto Nazionale Genetica Molecolare, Italy
20th Annual International Conference on Intelligent Systems for Molecular Biology
July 15-17, 2012, Long Beach, CA, USA
2. Background:
The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. However, a simple application programming interface (API) in a scripting language aimed at the biologist was not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby.
Results:
The API is designed as a BioRuby plug-in (Biogem) and built on the ActiveRecord 3 framework for object-relational mapping, making it unnecessary to write SQL statements. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.
The API uses the bin index, if available, when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files, because genomic sequences are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby).
Conclusions:
Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will help biologists query the UCSC genome database programmatically. The API is available through the RubyGems system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/
3. The UCSC genome database
- The UCSC genome database is among the most used sources of genomic annotation in human and other organisms.
- Excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries.
- However, a simple application programming interface (API) in a scripting language aimed at the biologist was not yet available.
- Supporting a large number of tables (>40,000) is still challenging.
4. Ruby UCSC API
- A Ruby library to access the UCSC genome database.
- Designed as a Biogem (a BioRuby plug-in).
- Built on the ActiveRecord 3 framework for object-relational mapping.
- Written in pure Ruby, supporting MRI Ruby 1.9/1.8 and JRuby.
[Figure: Design structure of the Ruby UCSC API]
5. Dynamic Table Class Definition
- The UCSC database is optimized to serve the genome browser, resulting in a very large number of tables.
  - More than 41,840 tables as MySQL *.MYD files.
  - Database components are updated frequently.
- The Ruby UCSC API adopts dynamic class definition to handle the many table classes.
- When a table class is referenced for the first time, the API prefetches the fields of the table to detect the table type and defines the appropriate table class. This lazy evaluation of class definitions also makes API initialization much faster.
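The lazy definition described on this slide can be sketched in plain Ruby. This is a toy illustration, not the library's actual implementation: the module name `LazyTables`, the `SCHEMA` hash (standing in for a field lookup on the MySQL server), and the `bin_indexed?` helper are all hypothetical, and `Struct` replaces ActiveRecord so the example runs on its own.

```ruby
# Toy illustration of defining table classes on first reference.
module LazyTables
  # Hypothetical field lists; a real client would fetch these from MySQL.
  SCHEMA = {
    "Snp131"    => %w[bin chrom chromStart chromEnd name],
    "KnownGene" => %w[name chrom strand txStart txEnd]
  }.freeze

  # Ruby calls const_missing when an undefined constant is referenced.
  def self.const_missing(const_name)
    fields = SCHEMA[const_name.to_s]
    return super if fields.nil? # unknown table: usual NameError

    table_class = Struct.new(*fields.map(&:to_sym)) do
      # "Table type" detection: a bin column means interval
      # queries can use the bin index.
      def self.bin_indexed?
        members.include?(:bin)
      end
    end
    # Store the class so later references skip const_missing.
    const_set(const_name, table_class)
  end
end

puts LazyTables.const_defined?(:Snp131)  # false (nothing defined yet)
puts LazyTables::Snp131.bin_indexed?     # true  (class defined on demand)
puts LazyTables.const_defined?(:Snp131)  # true
puts LazyTables::KnownGene.bin_indexed?  # false
```

Because nothing is defined until a table is actually named, start-up cost does not grow with the number of tables, which is the benefit the slide attributes to lazy evaluation.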
6. Availability and Installation
Installation via RubyGems
$ gem install bio-ucsc-api
GitHub
https://github.com/misshie/bioruby-ucsc-api
Support Forum
http://rubyucscapi.userecho.com/
RubyGems.org
https://rubygems.org/gems/bio-ucsc-api
7. Sample Code and Features
require 'bio-ucsc'
# connect to the human hg19 database
Bio::Ucsc::Hg19.connect
# look up one SNP by its rs identifier
result = Bio::Ucsc::Hg19::Snp131.find_by_name("rs56289060")
puts result.chrom # => "chr1"
- Supports all organisms, with at least the newest assemblies.
- Supports UCSC's official MySQL server and local mirror MySQL servers.
- Uses ActiveRecord's object-relational mapping.
8. region = "chr17:7,579,614-7,579,700"
# build a relation restricted to the interval; no query is sent yet
condition = Bio::Ucsc::Hg19::Snp131.with_interval(region).select(:name)
puts condition.to_sql
# prints:
SELECT name FROM `snp131`
WHERE (chrom = 'chr17' AND bin in (642,80,9,1,0)
AND ( (chromStart BETWEEN 7579613 AND 7579700)
OR (chromEnd BETWEEN 7579613 AND 7579700)
OR (chromStart <= 7579613 AND
chromEND >= 7579700) ));
- Complex SQL statements are generated using relations.
- The bin index, if available, is used to accelerate queries.
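The bin list and the zero-based start coordinate in the SQL above can be reproduced with a short sketch. It assumes the standard five-level UCSC binning scheme (finest bins of 128 kb, each coarser level eight times larger). The names `parse_region` and `overlapping_bins` are hypothetical helpers for illustration, not methods of the Ruby UCSC API.

```ruby
# First bin number of each level, finest level first (standard UCSC scheme).
BIN_OFFSETS = [585, 73, 9, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # 2**17 bases = 128 kb, the finest bin size
BIN_NEXT_SHIFT = 3   # each coarser level covers 8 times more

# Converts "chr17:7,579,614-7,579,700" (one-based, inclusive) to
# [chrom, start, end] in the zero-based, half-open form used by UCSC tables.
def parse_region(region)
  chrom, range = region.delete(",").split(":")
  first, last = range.split("-").map { |s| Integer(s) }
  [chrom, first - 1, last]
end

# Lists every bin that may hold a feature overlapping [chrom_start, chrom_end).
def overlapping_bins(chrom_start, chrom_end)
  first = chrom_start >> BIN_FIRST_SHIFT
  last = (chrom_end - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    bins = ((offset + first)..(offset + last)).to_a
    first >>= BIN_NEXT_SHIFT
    last >>= BIN_NEXT_SHIFT
    bins
  end
end

chrom, chrom_start, chrom_end = parse_region("chr17:7,579,614-7,579,700")
puts "#{chrom} #{chrom_start} #{chrom_end}"            # chr17 7579613 7579700
puts overlapping_bins(chrom_start, chrom_end).inspect  # [642, 80, 9, 1, 0]
```

Restricting the query to these five bins lets MySQL skip most rows on the chromosome before it evaluates the start and end comparisons.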
9. # declaration of the table association
# (Ucsc:: resolves after `include Bio`; otherwise write Bio::Ucsc::...)
Ucsc::Hg19::KnownGene.class_eval do
  has_one :knownToEnsembl, {:primary_key => :name,
                            :foreign_key => :name}
end
# reference to an associated field
puts Ucsc::Hg19::KnownGene.first.name
# => "uc001aaa3"
puts Ucsc::Hg19::KnownGene.first.knownToEnsembl.value
# => "ENST00000456328"
- The user can define table associations.
- Associated tables can be accessed like fields of the table.
10. # load a locally stored sequence file and extract a partial sequence
seq = Ucsc::File::Twobit.open("hg19.2bit")
puts seq.subseq("chr1:9990-10009")
# => "NNNNNNNNNNNTAACCCTAA"
- In the UCSC genome database, genomic sequences are not stored in the MySQL databases but in *.2bit files.
- Reference sequence objects are generated by the File::Twobit.open class method, and sequences can be retrieved by the File::Twobit#subseq method.
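To give a feel for what a *.2bit reader has to do, here is a minimal decoding sketch. It assumes the published .2bit packing: four bases per byte, first base in the high bits, with T=0, C=1, A=2, G=3, and runs of N recorded separately as (start, size) blocks. The function `decode_2bit` and the hand-packed bytes are illustrative only. They are not read from hg19.2bit and are not the library's code.

```ruby
TWOBIT_BASES = %w[T C A G].freeze # 2-bit codes 0, 1, 2, 3

# Decodes `length` bases from packed bytes, then overlays N blocks,
# each given as [zero_based_start, size].
def decode_2bit(bytes, length, n_blocks = [])
  bases = bytes.flat_map do |byte|
    [6, 4, 2, 0].map { |shift| TWOBIT_BASES[(byte >> shift) & 0b11] }
  end.first(length)
  n_blocks.each do |start, size|
    (start...(start + size)).each { |i| bases[i] = "N" if i < length }
  end
  bases.join
end

# Toy data: 20 bases whose first 11 positions are masked as N.
packed = [0x00, 0x00, 0x00, 0xA5, 0x4A]
puts decode_2bit(packed, 20, [[0, 11]]) # NNNNNNNNNNNTAACCCTAA
```

The toy input mimics the slide's output: the region chr1:9990-10009 is one-based and inclusive, so it spans 20 bases, 11 of which are N.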
12. Current Limitations
- Table associations are not defined automatically.
- For some tables, including subsets of the ENCODE tables, the actual data are not stored in the MySQL database itself but as references to BigWig, BigBed and BAM files. The Ruby UCSC API does not support these files yet. Another Biogem, "bio-samtools", supports BAM file handling.
13. Conclusions
- UCSC's official executables and C libraries are the most comprehensive and fastest API for the UCSC genome database.
- However, APIs for scripting languages still have significant advantages, because users care not only about runtime speed but also about the total time from writing a program to obtaining results.
- The Ruby UCSC API can therefore have a significant impact in the field.