Lightning talk for TGAC - AllBio: Open Science & Reproducibility Best Practice Workshop (first 4 slides). Requested talk at the end of EU CodeFest (all slides).
AllBio and EU CodeFest 2014
1. AllBio / EU CodeFest
Bruno Vieira | @bmpvieira
PhD Student @
Bioinformatics and Population Genomics
Supervisor: Yannick Wurm | @yannick__
Before:
bmpvieira.com/allbio14
© 2014 Bruno Vieira CC-BY 4.0
2. Some problems I faced during my research:
- Difficulty getting relevant descriptions and datasets from NCBI API using bio* libs
- For web projects, needed to implement the same functionality on browser and server
- Difficulty writing scalable, reproducible and complex bioinformatic pipelines
3. Bionode.io - Modular and universal bioinformatics
Pipeable UNIX command line tools and JavaScript / Node.js APIs for bioinformatic analysis workflows on the server and browser.
Collaborates with:
BioJS - Represent biological data on the web
Dat - Build data pipelines
Provides a streaming interface between every file format and data storage backend. "git for data"
dat-data.com | @maxogden | @mafintosh
5. Difficulty getting relevant description and datasets from NCBI API using bio* libs
Python example
import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = "mail@bmpvieira.com"

# Search the assembly database, then fetch a summary for each hit
esearch_handle = Entrez.esearch(db="assembly", term="Achromyrmex")
esearch_record = Entrez.read(esearch_handle)
for uid in esearch_record['IdList']:
    esummary_handle = Entrez.esummary(db="assembly", id=uid)
    esummary_record = Entrez.read(esummary_handle)
    document_summary_set = esummary_record['DocumentSummarySet']
    document = document_summary_set['DocumentSummary'][0]
    # 'Meta' holds an XML fragment without a single root, so wrap it
    metadata_xml = document['Meta']
    metadata = ET.fromstring('<root>' + metadata_xml + '</root>')
    # Print the text of each entry under the second child element
    for entry in metadata[1]:
        print(entry.text)
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG
Solution:
bionode-ncbi
6. Need to reimplement the same code on browser and server.
Solution: JavaScript everywhere
Afra
SequenceServer
GeneValidator
BioJS
Biodalliance
is converting parsers to
Bionode
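As a minimal sketch of what "JavaScript everywhere" can look like in practice, the snippet below defines a small FASTA parser that relies only on core language features, so the same file could be loaded by Node.js or by a browser. The function name `parseFasta` and the record shape are assumptions made for illustration; they are not taken from Bionode or BioJS.

```javascript
// Hypothetical helper (not a Bionode or BioJS API): parse FASTA text into records.
// It uses no Node.js-only or browser-only features.
function parseFasta (text) {
  const records = []
  let current = null
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim()
    if (line === '') continue
    if (line.startsWith('>')) {
      // Header line: the id is the first whitespace-delimited token
      const header = line.slice(1)
      current = { id: header.split(/\s+/)[0], description: header, sequence: '' }
      records.push(current)
    } else if (current !== null) {
      // Sequence line: append to the record opened by the last header
      current.sequence += line
    }
  }
  return records
}

// Export for Node.js (CommonJS) when available; in a browser the function
// can be used directly or bundled.
if (typeof module !== 'undefined' && module.exports) {
  module.exports = { parseFasta }
}

const example = '>seq1 test sequence\nACGT\nTTGA\n>seq2\nGGCC\n'
console.log(parseFasta(example))
```

Keeping parsing logic free of platform APIs is one way to avoid writing it twice; only the input/output layer (files on the server, uploads or fetches in the browser) then differs.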
7. Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
Solution: Node.js Streams everywhere
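var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')

// Pass-through object streams used as branching points
// (fork2 is declared but not used in this excerpt)
var fork1 = through.obj()
var fork2 = through.obj()

// `dat` (the store receiving the results) is assumed to be set up elsewhere
ncbi
  .search('sra', 'Solenopsis invicta')
  .pipe(fork1)
  .pipe(dat.reads)

// Branch 1: look up the BioSample of each SRA result
fork1
  .pipe(tool.extractProperty('expxml.Biosample.id'))
  .pipe(ncbi.search('biosample'))
  .pipe(dat.samples)

// Branch 2: follow links from SRA to PubMed
fork1
  .pipe(tool.extractProperty('uid'))
  .pipe(ncbi.link('sra', 'pubmed'))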
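The forking pattern above can be tried without any network access or third-party modules. The following is a rough sketch using only the Node.js core `stream` module: the records are made up, and `extractProperty` and `collect` are hypothetical stand-ins written for this example (they are not the `tool-stream` or Dat APIs).

```javascript
const { Readable, PassThrough, Transform, Writable } = require('stream')

// Stand-in for a stream of search results; these records are invented
const source = Readable.from([
  { uid: '1', biosample: 'SAMN-A' },
  { uid: '2', biosample: 'SAMN-B' }
])

// Hypothetical helper: emit a single property of each incoming object
function extractProperty (name) {
  return new Transform({
    objectMode: true,
    transform (obj, _enc, done) { done(null, obj[name]) }
  })
}

// Hypothetical sink: store every received value in the given array
function collect (target) {
  return new Writable({
    objectMode: true,
    write (value, _enc, done) { target.push(value); done() }
  })
}

// One pass-through stream acts as the fork; each branch sees every object
const fork = new PassThrough({ objectMode: true })
const uids = []
const samples = []
const uidSink = collect(uids)
const sampleSink = collect(samples)

source.pipe(fork)
fork.pipe(extractProperty('uid')).pipe(uidSink)
fork.pipe(extractProperty('biosample')).pipe(sampleSink)

uidSink.on('finish', () => console.log('uids:', uids))
sampleSink.on('finish', () => console.log('samples:', samples))
```

Because a readable stream can be piped to several destinations, each branch can apply its own transforms while the source is read only once.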
14. Why Node.js / JavaScript applies well to Bioinformatics
- Streams
- Easy to write CLI wrappers for Streams
- Reusable, small and tested modules
- Same language everywhere (JavaScript)
- Package Manager that works (NPM)
- Huge number of modules (93327, 199/day)
- Use other JS projects (Dat, BioJS, NoFlo)
- Possible to write Desktop GUI apps in JS
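To illustrate the point about CLI wrappers for streams, here is a small sketch of a transform stream that reverse-complements each input line, plus a few lines that expose it on the command line. Everything in it is an assumption for illustration: the names `reverseComplement` and `createRevCompStream`, the `--stdin` flag and the file name `revcomp.js` are hypothetical and not part of Bionode.

```javascript
const { Transform } = require('stream')

const COMPLEMENT = { A: 'T', C: 'G', G: 'C', T: 'A', N: 'N' }

// Reverse-complement a DNA string; unknown characters become 'N'
function reverseComplement (seq) {
  return seq.toUpperCase().split('').reverse()
    .map(base => COMPLEMENT[base] || 'N').join('')
}

// Transform stream: emits the reverse complement of every input line
function createRevCompStream () {
  let leftover = ''
  return new Transform({
    transform (chunk, _enc, done) {
      const lines = (leftover + chunk.toString()).split('\n')
      leftover = lines.pop() // keep the last, possibly incomplete, line
      for (const line of lines) this.push(reverseComplement(line) + '\n')
      done()
    },
    flush (done) {
      // Emit whatever remains when the input ends without a newline
      if (leftover !== '') this.push(reverseComplement(leftover) + '\n')
      done()
    }
  })
}

// CLI wrapper: only active when the (hypothetical) --stdin flag is passed
if (process.argv.includes('--stdin')) {
  process.stdin.pipe(createRevCompStream()).pipe(process.stdout)
}
```

Saved as `revcomp.js`, this could be used in a UNIX pipeline as `echo ACCGT | node revcomp.js --stdin`, while other JavaScript code can reuse `createRevCompStream()` directly.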
16. Package Manager that works
npm install bionode
npm install bionode -g
npm test
npm start
npm run test-browser
npm run build-docs
npm init
npm publish
Not only for JavaScript, C/C++ too:
Node.js style C/C++ modules
Native C/C++ running in Google V8