AllBio EU CodeFest
Bruno Vieira @bmpvieira
PhD Student @ Bioinformatics and Population Genomics
Supervisor: Yannick Wurm @yannick__
Before:
bmpvieira.com/allbio14
© 2014 Bruno Vieira CC-BY 4.0
Some problems I faced during my research:
- Difficulty getting relevant descriptions and datasets from the NCBI API using bio* libs
- For web projects, needing to implement the same functionality in the browser and on the server
- Difficulty writing scalable, reproducible and complex bioinformatics pipelines
Bionode.io - Modular and universal bioinformatics
Pipeable UNIX command line tools and JavaScript / Node.js APIs for bioinformatic analysis workflows on the server and in the browser.
Collaborates with:
BioJS - Represent biological data on the web
Dat - Build data pipelines. Provides a streaming interface between every file format and data storage backend: "git for data"
dat-data.com @maxogden @mafintosh
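The "pipeable" idea above can be sketched with nothing but Node's built-in `stream` module. This sketch assumes that tools exchange newline-delimited JSON (one object per line), so object streams can cross a UNIX pipe. `ndjsonParse`, `ndjsonStringify` and `roundTrip` are hypothetical names and are not part of Bionode's API.

```javascript
const { Transform, Readable, Writable } = require('stream');
const { pipeline } = require('stream/promises');
const { StringDecoder } = require('string_decoder');

// Hypothetical stage: turn newline-delimited JSON text into one object per line.
function ndjsonParse() {
  const decoder = new StringDecoder('utf8');
  let buffered = '';
  const pushLine = (stream, line) => {
    if (line.trim()) stream.push(JSON.parse(line)); // skip blank lines
  };
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _enc, done) {
      buffered += decoder.write(chunk);
      const lines = buffered.split('\n');
      buffered = lines.pop(); // a trailing partial line waits for the next chunk
      try {
        lines.forEach((line) => pushLine(this, line));
        done();
      } catch (err) {
        done(err); // malformed JSON fails the whole pipeline
      }
    },
    flush(done) {
      try {
        pushLine(this, buffered + decoder.end());
        done();
      } catch (err) {
        done(err);
      }
    },
  });
}

// Hypothetical stage: serialize each object back to one line of JSON text.
function ndjsonStringify() {
  return new Transform({
    writableObjectMode: true,
    transform(obj, _enc, done) {
      done(null, JSON.stringify(obj) + '\n');
    },
  });
}

// Run text chunks through parse -> stringify and return the resulting text.
async function roundTrip(chunks) {
  const out = [];
  await pipeline(
    Readable.from(chunks),
    ndjsonParse(),
    ndjsonStringify(),
    new Writable({
      write(chunk, _enc, done) {
        out.push(chunk.toString());
        done();
      },
    }),
  );
  return out.join('');
}
```

In a command-line tool, `process.stdin` and `process.stdout` would replace the in-memory source and sink. The stages would then compose with `|` like any UNIX filter, even when a JSON line is split across input chunks.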
bionode.io (online shell)
Examples
BASH
bionode-ncbi urls assembly Solenopsis invicta | grep genomic.fna
http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG/GCA_000188075.1_Si_gnG_genomic.fna.gz
bionode-ncbi download sra arthropoda | bionode-sra
bionode-ncbi download gff bacteria
JavaScript
var ncbi = require('bionode-ncbi')
ncbi.urls('assembly', 'Solenopsis invicta', gotData)
function gotData(urls) {
  var genome = urls[0].genomic.fna
  download(genome) // download() is a placeholder, not part of bionode-ncbi
}
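The `| grep genomic.fna` step in the BASH example has a direct counterpart inside a Node.js object stream. `grepStream` below is a hypothetical helper and is not part of Bionode. It keeps only the items whose text matches a regular expression (for objects, it tests their JSON text), much as `grep` keeps matching NDJSON lines. The first URL is the one shown above. The second is an invented entry that should be filtered out.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical grep-like stage: pass through only items whose text matches `pattern`.
function grepStream(pattern) {
  return new Transform({
    objectMode: true,
    transform(item, _enc, done) {
      const text = typeof item === 'string' ? item : JSON.stringify(item);
      if (pattern.test(text)) done(null, item);
      else done();
    },
  });
}

// Collect everything a readable stream emits into an array.
async function toArray(stream) {
  const items = [];
  for await (const item of stream) items.push(item);
  return items;
}

const base = 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG/GCA_000188075.1_Si_gnG';
const urls = [
  `${base}_genomic.fna.gz`,
  `${base}_hypothetical.txt`, // invented entry, should not match
];

// No `g` flag: RegExp.prototype.test with `g` keeps state between calls.
const matches = toArray(Readable.from(urls).pipe(grepStream(/genomic\.fna/)));
```

`matches` is a promise that resolves to an array holding only the `_genomic.fna.gz` URL.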
Difficulty getting relevant descriptions and datasets from the NCBI API using bio* libs
Python example
import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = "mail@bmpvieira.com"  # NCBI asks for a contact email
# Find assembly UIDs matching the search term
esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
esearch_record = Entrez.read(esearch_handle)
for uid in esearch_record['IdList']:
    # Fetch the document summary for each assembly
    esummary_handle = Entrez.esummary(db="assembly", id=uid)
    esummary_record = Entrez.read(esummary_handle)
    documentSummarySet = esummary_record['DocumentSummarySet']
    document = documentSummarySet['DocumentSummary'][0]
    # 'Meta' is an XML fragment with no single root element, so wrap it
    metadata_XML = document['Meta']
    metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
    for entry in metadata[1]:
        print(entry.text)
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG
Solution: bionode-ncbi
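bionode-ncbi wraps NCBI's E-utilities, which is the same service that Biopython's `Entrez` module calls. The helpers below are a rough sketch of what such a wrapper does, not bionode-ncbi's actual code. They build `esearch`/`esummary` request URLs and pull UIDs out of an `esearch` response requested with `retmode=json`. `eutilsUrl`, `idsFromEsearch` and `searchIds` are hypothetical names. `searchIds` assumes a runtime with a global `fetch`, such as Node.js 18 or later.

```javascript
const EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

// Build an E-utilities URL, e.g. .../esearch.fcgi?db=assembly&term=...
function eutilsUrl(utility, params) {
  const query = new URLSearchParams(params).toString();
  return `${EUTILS}${utility}.fcgi?${query}`;
}

// esearch with retmode=json answers with { esearchresult: { idlist: [...] } }.
function idsFromEsearch(json) {
  return (json.esearchresult && json.esearchresult.idlist) || [];
}

// Search a database and return the matching UIDs (makes a network request).
async function searchIds(db, term) {
  const res = await fetch(eutilsUrl('esearch', { db, term, retmode: 'json' }));
  if (!res.ok) throw new Error(`esearch failed: HTTP ${res.status}`);
  return idsFromEsearch(await res.json());
}
```

For example, `searchIds('assembly', 'Solenopsis invicta').then(console.log)` would print the assembly UIDs. The same helper can build an `esummary` URL for each UID.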
Need to reimplement the same code in the browser and on the server.
Solution: JavaScript everywhere
Afra
SequenceServer
GeneValidator
BioJS
Biodalliance is converting its parsers to Bionode
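Here is one way to make "JavaScript everywhere" concrete. A parser written in plain JavaScript, with no Node-only APIs such as `fs`, can be loaded by a script tag in the browser or by `require` in Node.js. The FASTA parser below is a minimal, hypothetical example of such a module and is not code from any of the projects listed. It does not handle FASTA edge cases such as comment lines.

```javascript
// Parse FASTA text into [{ id, seq }] records using only standard JavaScript,
// so the same file can run in a browser or in Node.js.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line) continue; // skip blank lines
    if (line.startsWith('>')) {
      // The ID is the first word of the header line.
      current = { id: line.slice(1).split(/\s+/)[0], seq: '' };
      records.push(current);
    } else if (current) {
      current.seq += line; // join wrapped sequence lines
    }
  }
  return records;
}

// Export for Node.js (CommonJS); in a browser script the function is simply global.
if (typeof module !== 'undefined' && module.exports) {
  module.exports = { parseFasta };
}
```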
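Difficulty writing scalable, reproducible and complex bioinformatics pipelines.
Solution: Node.js Streams everywhere
var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')
// `dat` is assumed to be an open Dat dataset exposing writable
// `reads` and `samples` streams (not defined in this snippet)
var fork1 = through.obj() // pass-through object stream used to fan out results

ncbi
  .search('sra', 'Solenopsis invicta')
  .pipe(fork1)
  .pipe(dat.reads)

// SRA results -> BioSample IDs -> BioSample records
fork1
  .pipe(tool.extractProperty('expxml.Biosample.id'))
  .pipe(ncbi.search('biosample'))
  .pipe(dat.samples)

// SRA results -> UIDs -> linked PubMed records
fork1
  .pipe(tool.extractProperty('uid'))
  .pipe(ncbi.link('sra', 'pubmed'))
  .on('data', console.log) // print each linked record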
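The fork pattern above can be tried without NCBI or Dat. In this sketch, `extractProperty` is a hypothetical stand-in written for illustration, not tool-stream's implementation. The in-memory records only mimic the shape implied by the `expxml.Biosample.id` and `uid` paths on the slide. One `PassThrough` stream is piped to three consumers, and each consumer receives every record.

```javascript
const { Readable, PassThrough, Transform, Writable } = require('stream');
const { finished } = require('stream/promises');

// Hypothetical stand-in for a property extractor: follows a dotted path such as
// 'expxml.Biosample.id' and emits the value, dropping objects that lack it.
function extractProperty(path) {
  const keys = path.split('.');
  return new Transform({
    objectMode: true,
    transform(obj, _enc, done) {
      let value = obj;
      for (const key of keys) value = value == null ? undefined : value[key];
      // Pushing null would end an object stream, so missing values are skipped.
      if (value == null) done();
      else done(null, value);
    },
  });
}

// Writable that appends every object it receives to `sink`.
function collect(sink) {
  return new Writable({
    objectMode: true,
    write(obj, _enc, done) {
      sink.push(obj);
      done();
    },
  });
}

// One source, fanned out through a pass-through stream to three branches.
async function forkDemo(records) {
  const all = [];
  const sampleIds = [];
  const uids = [];
  const fork1 = new PassThrough({ objectMode: true });
  const sinks = [collect(all), collect(sampleIds), collect(uids)];

  Readable.from(records).pipe(fork1);
  fork1.pipe(sinks[0]);
  fork1.pipe(extractProperty('expxml.Biosample.id')).pipe(sinks[1]);
  fork1.pipe(extractProperty('uid')).pipe(sinks[2]);

  await Promise.all(sinks.map((s) => finished(s)));
  return { all, sampleIds, uids };
}
```

All three `pipe` calls are attached before data starts flowing, so every branch receives every record. `pipe` also applies backpressure per destination, which means a slow branch will hold up the shared source.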
Benefit from other JS projects: Dat, BioJS, NoFlo
Reusable, small and tested modules
Some users and contributors:
Dat
Biodalliance
BioJS
Yeo Lab (UC San Diego): Michael Lovci, Olga Botvinnik
Afra
GeneValidator
Soon: DNADigest
Thanks! 
Acknowledgements: 
@yannick__ 
@maxogden 
@mafintosh 
@alanmrice 
@dasmoth 
@biodevops
Why Node.js / JavaScript applies well to bioinformatics:
- Streams
- Easy to write CLI wrappers for Streams
- Reusable, small and tested modules
- Same language everywhere (JavaScript)
- A package manager that works (npm)
- Huge number of modules (93,327; 199 new per day)
- Use other JS projects (Dat, BioJS, NoFlo)
- Possible to write desktop GUI apps in JS
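The point "Easy to write CLI wrappers for Streams" can be sketched with Node's standard library alone. The same transform serves a library caller that passes in-memory streams and a command-line tool that passes `process.stdin`/`process.stdout`. `lineTransform`, `reverseComplement` and `runCli` are hypothetical names, and the example assumes one DNA sequence per input line.

```javascript
const { Transform } = require('stream');
const { pipeline } = require('stream/promises');
const { StringDecoder } = require('string_decoder');

const COMPLEMENT = { A: 'T', T: 'A', C: 'G', G: 'C', N: 'N' };

// Reverse-complement a DNA sequence; unknown characters become N.
function reverseComplement(seq) {
  return [...seq.toUpperCase()].reverse().map((b) => COMPLEMENT[b] || 'N').join('');
}

// Apply `fn` to every complete line of text flowing through the stream.
function lineTransform(fn) {
  const decoder = new StringDecoder('utf8');
  let buffered = '';
  const emit = (stream, lines) => lines.forEach((l) => stream.push(fn(l) + '\n'));
  return new Transform({
    transform(chunk, _enc, done) {
      buffered += decoder.write(chunk);
      const lines = buffered.split('\n');
      buffered = lines.pop(); // keep the unfinished last line for later
      emit(this, lines);
      done();
    },
    flush(done) {
      buffered += decoder.end();
      if (buffered) emit(this, [buffered]);
      done();
    },
  });
}

// Wire any input/output streams through the transform; a CLI uses stdin/stdout.
function runCli(input = process.stdin, output = process.stdout) {
  return pipeline(input, lineTransform(reverseComplement), output);
}
```

Saved as a file ending with `runCli().catch((err) => { console.error(err); process.exitCode = 1; });`, it could be used as `cat seqs.txt | node revcomp.js` (both file names are illustrative).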
Module counts
Package Manager that works 
npm install bionode 
npm install bionode -g 
npm test 
npm start 
npm run test-browser 
npm run build-docs 
npm init 
npm publish 
Not only for JavaScript, C/C++ too: 
Node.js style C/C++ modules 
Native C/C++ running in Google V8
