MPDB - Integrated system for storage and analysis of metabolomic data
Design and implementation of the data acquisition and analysis pipeline
Alexander Raskind, SFRES MTU
Omics data availability
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
Transcriptomics data:
ArrayExpress: 3,670 experiments, 109,666 hybridizations
http://www.ebi.ac.uk/microarray-as/aer/
Proteomics data:
PRIDE: 3,537 experiments, 645,869 identified proteins
http://www.ebi.ac.uk/microarray-as/aer/
Metabolomics data:
MMCD: 20,306 compounds
http://mmcd.nmrfam.wisc.edu/
Human Metabolome Database: 2,500 compounds
http://www.hmdb.ca/
Shifting research paradigm
Targeted analysis vs. high-throughput analysis (image sources: genome.uiowa.edu, http://www.shimadzu.com)
Populus as model system
• Wide ecological range
• Small genome relative to other trees
• Relatively easy transformation and cloning
• Belongs to Salicaceae (willow family); produces large amounts of phenolic compounds that may influence carbon sequestration
Project rationale
• Affordable equipment generates a limited amount of metabolomic data of modest quality
• Proper information storage and maximal extraction of useful information are essential
• A free, open source laboratory information system tailored to the metabolomics workflow would benefit a large scientific community
System requirements
• Easy access to large arrays of analytical results and biological metadata
• Tools for data analysis
• Addition of analysis modules
• Accommodation of other types of analytical data
• USER FRIENDLY
Analysis workflow
Major analytical problems
• Chemical complexity of the sample
  o Human metabolome: 2,500 metabolites; plants have many more
• Wide dynamic range of response
  o The difference between the most and least abundant components may be more than 10,000-fold
• Biological variation
• Matrix effects
  o Interactions between sample components shift retention time and detection sensitivity relative to pure compounds
• Instrument effects
  o Shifting retention time (column wear and maintenance)
  o Changes in sensitivity
Data analysis pipeline
• Raw data cleanup, peak detection, deconvolution and quantification
• Compound identification (library search)
• Export of analysis results and biological metadata to the database
• Peak alignment and normalization
• Final data analysis
System Outline
[Diagram: GC/MS or LC/MS raw data; Analyzer-Pro; result (XML format); MP-align; MPDB; biological information; data analysis; stages labeled offline and online]
Compound identification
• NIST 2002 database for GC/MS (MS only, ~140,000 entries)
• In-house database of essential metabolites (MS and retention time, ~200 entries)
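A library search of the kind listed above can be sketched as follows. This is only an illustration, not the search used by Analyzer-Pro or MPDB: the cosine score, the `rt_tolerance` and `min_score` thresholds, the dictionary layout, and the compound names and spectra are all assumptions made for the example. Entries without a retention time stand in for an MS-only library; entries with one mimic an in-house library that stores both MS and retention time.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(peak, library, rt_tolerance=0.2, min_score=0.8):
    """Return (name, score) for the best-scoring library entry, or None."""
    best = None
    for entry in library:
        rt = entry.get("rt")
        # MS-only entries (rt is None) skip the retention time check.
        if rt is not None and abs(rt - peak["rt"]) > rt_tolerance:
            continue
        score = cosine_similarity(peak["spectrum"], entry["spectrum"])
        if score >= min_score and (best is None or score > best[1]):
            best = (entry["name"], score)
    return best


# Made-up library: A and B share a spectrum and differ only in retention time.
library = [
    {"name": "compound A", "rt": 10.0, "spectrum": {73: 100.0, 147: 40.0, 217: 20.0}},
    {"name": "compound B", "rt": 12.5, "spectrum": {73: 100.0, 147: 40.0, 217: 20.0}},
    {"name": "compound C", "rt": None, "spectrum": {57: 100.0, 71: 60.0}},
]
peak = {"rt": 12.45, "spectrum": {73: 95.0, 147: 42.0, 217: 18.0}}
hit = search_library(peak, library)
print(hit)
```

In this toy case the spectrum alone cannot separate compounds A and B; the retention time check is what selects compound B.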
Why we need alignment
Single batch vs. multiple batches
Spectra similarity
Alignment algorithm
Peak list → RI → MS → Group → Consistency → Aligned groups
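The slide gives only the stage names of the alignment (peak list, RI, MS, group, consistency, aligned groups). The sketch below is one plausible reading of those stages, not MP-align's actual algorithm: peaks are grouped greedily when their retention index (RI) lies within a window of the group's first peak and their mass spectra are similar, and groups seen in too few samples are dropped as inconsistent. The `ri_window`, `min_similarity` and `min_samples` parameters and the peak dictionary layout are hypothetical.

```python
import math


def spectral_similarity(spec_a, spec_b):
    """Cosine similarity between two {m/z: intensity} spectra."""
    dot = sum(spec_a[mz] * spec_b[mz] for mz in set(spec_a) & set(spec_b))
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_peaks(peaks, ri_window=5.0, min_similarity=0.8, min_samples=2):
    """Group peaks from several samples; each peak has 'sample', 'ri', 'spectrum'."""
    groups = []
    for peak in sorted(peaks, key=lambda p: p["ri"]):
        for group in groups:
            ref = group[0]  # the first peak of a group serves as its reference
            already_has_sample = any(m["sample"] == peak["sample"] for m in group)
            if (not already_has_sample
                    and abs(peak["ri"] - ref["ri"]) <= ri_window          # RI stage
                    and spectral_similarity(peak["spectrum"],
                                            ref["spectrum"]) >= min_similarity):  # MS stage
                group.append(peak)
                break
        else:
            groups.append([peak])  # no matching group: start a new one
    # Consistency stage: keep groups observed in at least min_samples samples.
    return [g for g in groups if len({m["sample"] for m in g}) >= min_samples]


# Made-up peaks: one compound drifting in RI across three samples,
# one co-eluting peak with a different spectrum, and one unrelated peak.
peaks = [
    {"sample": "s1", "ri": 1500.0, "spectrum": {73: 100.0, 147: 50.0}},
    {"sample": "s2", "ri": 1502.0, "spectrum": {73: 98.0, 147: 52.0}},
    {"sample": "s3", "ri": 1503.5, "spectrum": {73: 100.0, 147: 47.0}},
    {"sample": "s1", "ri": 1501.0, "spectrum": {57: 100.0, 71: 80.0}},
    {"sample": "s2", "ri": 1650.0, "spectrum": {91: 100.0, 65: 30.0}},
]
aligned = align_peaks(peaks)
for group in aligned:
    print([(m["sample"], m["ri"]) for m in group])
```

Using the first peak as the group reference keeps the sketch short; a real implementation would more likely compare against a running group average so that slow RI drift across batches does not split a group.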
Signal normalization
Raw data vs. data normalized to TIC (total ion current)
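Normalization to TIC divides each signal by the total signal of its sample, so that samples injected or detected at different overall intensities become comparable. The sketch below assumes, for simplicity, that the total is the sum of the quantified peak areas in the sample; the slides do not say how MPDB computes the total, and the group names and areas are invented.

```python
def normalize_to_total(sample_peaks):
    """Scale each peak area by the summed area of the sample.

    sample_peaks maps a compound group name to its raw peak area.
    """
    total = sum(sample_peaks.values())
    if total == 0:
        raise ValueError("sample has no signal to normalize against")
    return {name: area / total for name, area in sample_peaks.items()}


# Two made-up samples with the same composition; run_b has half the raw signal.
run_a = {"group_1": 200.0, "group_2": 600.0, "group_3": 200.0}
run_b = {"group_1": 100.0, "group_2": 300.0, "group_3": 100.0}

norm_a = normalize_to_total(run_a)
norm_b = normalize_to_total(run_b)
print(norm_a)
print(norm_b)
```

After normalization the two runs give the same relative abundances, which is the effect the raw versus normalized comparison is meant to show.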
User interface - tasks
• Data entry
• New analysis
• Review analysis
• Quality control
• Help
Data set definition
Sample groups review and annotation
Alignment results
Data export
Data sorting and filtering
Data assessment and analysis
• Data for individual compound groups
• Data for individual samples and compounds
• Principal component analysis
• Clustering of samples and compounds
• Graphical maps of compound ratios
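The slides list principal component analysis among the analysis tools without describing how MPDB computes it. As a rough illustration only, the sketch below extracts the first principal component of a small samples-by-compounds table with power iteration on the covariance matrix, using nothing beyond the standard library. The data values are invented, and a production system would normally rely on a numerical library instead.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of PC1 for a samples x variables table."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Start from the unit vector of the highest-variance variable.
    start = max(range(m), key=lambda j: cov[j][j])
    vec = [1.0 if j == start else 0.0 for j in range(m)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance in the data
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return vec, scores


# Made-up normalized abundances: rows are samples, columns are compound groups.
table = [
    [1.0, 5.0, 2.0],   # control 1
    [1.2, 5.1, 2.1],   # control 2
    [3.0, 2.0, 2.0],   # treated 1
    [3.1, 1.9, 2.2],   # treated 2
]
loadings, scores = first_principal_component(table)
print(loadings)
print(scores)
```

The sign of a principal component is arbitrary, so only the relative positions of the scores matter: here the two control samples fall on one side of zero and the two treated samples on the other.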
Individual compound group data
Mass spectral data for the group
Individual sample and peak details
PCA
Clustering
Compound ratios
Quality control
Sample analysis: effects of nitrogen stress on Populus leaf metabolism
• Plants grown hydroponically
• N-stress for 8 weeks
• Samples taken from leaves at different developmental stages (lamina and midvein)
• Metabolites fractionated by SPE
• Hydrophilic fractions additionally analyzed at 1:20 dilution
• Fractions were also subjected to glucosidase hydrolysis and LPE
• 3-5 biological and 1-2 technical replicates
Leaf hydrophilic fraction
• Up-regulated by N-stress:
  o Galacturonic acid (X7), D-Arabinonate
  o Turanose, Syringin
  o Ribose(?), methyl-Galactoside, 3-Hydroxy-3-methylglutaric acid (HMGA), D-(-)-3-Phosphoglyceric acid
Leaf hydrophilic fraction
• Down-regulated by N-stress:
  o Most free amino acids and polyamines are below detection level or strongly reduced; also some sugars and polyols (not clearly identified)
  o Small organic acids (fumaric, succinic, threonic, citric, malic, oxaloacetic)
  o Sugar phosphates (glucose, fructose)
  o Xylose, melibiose, cellobiose
Acknowledgements
• Prof. Scott Harding
• Prof. Chung-Jui Tsai
• Dr. Changyu Hu
• Prof. Meir Edelman (WIS)
