Talk given at the 9th International Conference on Chemical Structures (ICCS 2011), June 9th 2011, on how chemoinformatics can improve metabolomics and metabolite identification.
2. Metabolome
Metabolome: all low-molecular-weight molecules (metabolites) in cells, body fluids, tissues, etc.
Metabolomics: the quantitative and qualitative analysis of all metabolites in samples of cells, body fluids, tissues, etc.
June 9th 2011
ICCS 2011
Julio E. Peironcely
3. Metabolomics
Genome -> Transcripts -> Proteins -> Metabolites -> Phenotype
4. Metabolomics
Workflow: biological question -> experimental design -> sampling -> sample preparation -> data acquisition -> data pre-processing -> data analysis -> biological interpretation
Outputs along the way: protocol, samples, raw data, list of peaks/metabolites, relevant biomolecules/connectivities, models & biomolecules
5. LC-MS and Metabolite Identification
RPLC-microTOF (Bruker): LC separation followed by MS detection
peak = m/z @ retention time (rt) = metabolite
Kloet et al., Metabolomics 2011
6. When all you have is a mass and a window
MS -> mass + window -> list of elemental compositions -> ChemSpider, KEGG, HMDB, PubChem -> list of molecules
Measured mass + mass window = multiple elemental compositions
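To make "mass + window = many ECs" concrete, here is a minimal brute-force sketch that enumerates CHNOS compositions within a ppm window. It assumes the measured value has already been converted to a neutral monoisotopic mass (adducts and protons ignored), uses hypothetical element limits, and applies the common ring-plus-double-bond (RDBE) sanity check; the function names are illustrative, not from any tool shown in the talk.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}

# Hypothetical element limits, suitable for small metabolites only.
DEFAULT_MAX = {"C": 30, "H": 60, "N": 5, "O": 10, "S": 2}


def formula_mass(counts):
    """Neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in counts.items())


def to_string(counts):
    """Write a formula in C, H, N, O, S order, omitting zero counts and 1s."""
    return "".join(el + (str(n) if n > 1 else "") for el, n in counts.items() if n)


def candidate_compositions(mass, ppm=5.0, max_atoms=DEFAULT_MAX):
    """All CcHhNnOoSs compositions within +/- ppm of a neutral monoisotopic mass."""
    tol = mass * ppm * 1e-6
    hits = []
    for c, n, o, s in product(range(max_atoms["C"] + 1), range(max_atoms["N"] + 1),
                              range(max_atoms["O"] + 1), range(max_atoms["S"] + 1)):
        rest = mass - (c * MONO["C"] + n * MONO["N"] + o * MONO["O"] + s * MONO["S"])
        # For ppm-sized windows only the nearest hydrogen count can fit.
        h = round(rest / MONO["H"])
        if not 0 <= h <= max_atoms["H"]:
            continue
        # Heuristic: RDBE must be a non-negative integer for a neutral,
        # even-electron molecule.
        rdbe = c - h / 2 + n / 2 + 1
        if rdbe < 0 or rdbe != int(rdbe):
            continue
        counts = {"C": c, "H": h, "N": n, "O": o, "S": s}
        if abs(formula_mass(counts) - mass) <= tol:
            hits.append(counts)
    return hits


if __name__ == "__main__":
    print([to_string(f) for f in candidate_compositions(75.03203, ppm=5)])
```

For glycine's neutral mass the list contains C2H5NO2; widening the window or moving to larger masses makes the list grow, which is the point of the slide.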
7. When you have MS^n
MS^n -> mass + window + fragments -> 1 EC -> ChemSpider, KEGG, HMDB, PubChem -> list of molecules
Measured mass + mass window + fragments = single elemental composition
Use the expert system
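One common way fragments narrow the candidate list is a subformula check: a parent EC survives only if every measured fragment mass matches some subformula of it. The sketch below assumes that rule (it is not necessarily what the expert system does), treats all masses as neutral, and uses hypothetical function names.

```python
from itertools import product

# Monoisotopic masses (u); fragments are treated as neutral for simplicity.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}


def mass_of(counts):
    """Monoisotopic mass of {element: count}."""
    return sum(MONO[el] * n for el, n in counts.items())


def subformulas(parent):
    """Yield every element-count vector contained in the parent formula."""
    elements = list(parent)
    for combo in product(*(range(parent[el] + 1) for el in elements)):
        yield dict(zip(elements, combo))


def explains(parent, fragment_mass, ppm=10.0):
    """True if some subformula of `parent` matches the fragment mass within ppm."""
    tol = fragment_mass * ppm * 1e-6
    return any(abs(mass_of(sub) - fragment_mass) <= tol for sub in subformulas(parent))


def filter_by_fragments(candidates, fragment_masses, ppm=10.0):
    """Keep candidate parent compositions that explain every fragment mass."""
    return [c for c in candidates
            if all(explains(c, m, ppm) for m in fragment_masses)]


if __name__ == "__main__":
    glycine = {"C": 2, "H": 5, "N": 1, "O": 2}     # C2H5NO2, nominal mass 75
    other = {"C": 3, "H": 9, "N": 1, "O": 1}       # C3H9NO, also nominal mass 75
    fragment = mass_of({"C": 1, "H": 2, "O": 2})   # CH2O2, ~46.0055
    print(filter_by_fragments([glycine, other], [fragment]))
```

Both candidates share nominal mass 75, but only C2H5NO2 contains a CH2O2 subformula, so a single fragment already leaves one EC.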
8. The expert system
Dr. Ronnie van Doorn, Dr. Albert Tas
9. Bottlenecks in Metabolite Identification
Many metabolites detected by LC-MS are not identified
High-resolution MS can yield a single EC, but one EC still corresponds to many structures
No tools for automatic identification
Identification by the expert takes a long time
10. Challenges
Analytical methods
Software
Databases
11. Our approach
Experimental data -> processing data into MS^n trees (e.g. parent ion 1; fragments 1.1, 1.2; sub-fragments 1.1.1, 1.2.1, 1.2.2) -> find similar trees in a database
Similar tree present in the database: identity assignment
Absent: elemental formula + fragments -> structure generator (generate structures) -> filter structures by metabolite-likeness
16. MEF: spectral to fragmentation tree
Components: XCMS, CDK, CML, MEF
Rojas-Cherto et al. Bioinformatics (submitted)
17. MEF: spectral to fragmentation tree
Full-scan MS: parent ion peak (node 1); parent ion mass -> elemental composition (EC)
MS2: fragments 1.1, 1.2, 1.3
MS3: fragments 1.1.1, 1.2.1, 1.2.2, 1.2.3, 1.3.1, 1.3.2
Spectral tree -> fragmentation tree: noise peaks are discarded, fragments are annotated with ECs, and each edge from a parent to a fragment is a reaction (neutral loss)
Rojas-Cherto et al. Bioinformatics (submitted)
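The fragmentation-tree idea above can be sketched as a small data structure: each node carries the EC of its ion, and each edge's neutral loss is the elemental difference between parent and fragment. This is a hypothetical illustration, not MEF's data model or its CML output; the phenylalanine [M+H]+ tree (m/z 166 -> 120 -> 103) is used only as an example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One peak in a fragmentation tree, annotated with its ion formula."""
    label: str                      # e.g. "1", "1.1", "1.1.1"
    formula: Counter                # e.g. Counter(C=9, H=12, N=1, O=2)
    children: list = field(default_factory=list)


def fmt(counts):
    """Formula string with C and H first, then other elements alphabetically."""
    rest = sorted(el for el in counts if el not in ("C", "H"))
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "")
                   for el in ["C", "H"] + rest if counts.get(el, 0) > 0)


def neutral_loss(parent, child):
    """Elemental difference parent - child; a fragment must be a subformula."""
    diff = Counter(parent.formula)
    diff.subtract(child.formula)
    if any(v < 0 for v in diff.values()):
        raise ValueError(f"{child.label} is not a subformula of {parent.label}")
    return +diff  # unary + drops zero counts


def edges(node):
    """Depth-first (parent label, child label, neutral loss) triples."""
    for child in node.children:
        yield node.label, child.label, fmt(neutral_loss(node, child))
        yield from edges(child)


if __name__ == "__main__":
    n111 = Node("1.1.1", Counter(C=8, H=7))
    n11 = Node("1.1", Counter(C=8, H=10, N=1), [n111])
    root = Node("1", Counter(C=9, H=12, N=1, O=2), [n11])
    for edge in edges(root):
        print(edge)
```

Here the two edges come out as losses of CH2O2 and H3N (ammonia), which is the kind of annotation the fragmentation tree adds on top of the raw spectral tree.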
19. Fragmentation tree fingerprints
MS^n fragmentation trees (parent 1; fragments 1.1 to 1.3; sub-fragments 1.1.1 to 1.3.2) are encoded as fingerprints to find similar trees
Poster 59
Miguel Rojas-Cherto, Leiden University
20. Fragmentation tree fingerprints
Results: unknown, most similar trees, MCS (maximum common substructure)
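The slides do not say which features go into a fragmentation tree fingerprint (see Poster 59 for the actual method). As an assumption, the sketch below uses fragment ECs, neutral losses and parent-to-fragment paths as set features and ranks database trees by Tanimoto similarity; every name here is hypothetical.

```python
def fingerprint(edges):
    """Set of features from (parent_ec, child_ec, loss_ec) edges (assumed features)."""
    fp = set()
    for parent, child, loss in edges:
        fp.add("frag:" + child)
        fp.add("loss:" + loss)
        fp.add("path:" + parent + ">" + child)
    return frozenset(fp)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(query, database, k=3):
    """Rank database trees by similarity to the query tree (best first)."""
    qfp = fingerprint(query)
    scored = [(tanimoto(qfp, fingerprint(tree)), name) for name, tree in database.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


if __name__ == "__main__":
    database = {
        "phenylalanine": [("C9H12NO2", "C8H10N", "CH2O2"), ("C8H10N", "C8H7", "H3N")],
        "tyrosine": [("C9H12NO3", "C8H10NO", "CH2O2"), ("C8H10NO", "C8H7O", "H3N")],
    }
    unknown = [("C9H12NO2", "C8H10N", "CH2O2"), ("C8H10N", "C8H7", "H3N")]
    print(most_similar(unknown, database))
```

With these features the unknown matches phenylalanine exactly, while tyrosine still scores above zero because it shares the same two neutral losses; the shared losses are the kind of signal an MCS step could then turn into a common substructure.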
22. Structure Generator
Input: elemental formula + fragments
Generate structures, keeping a molecule only if it is produced by canonical augmentation (CDK, Nauty)
Output: all non-duplicated molecules
Poster 38
In collaboration with Jean-Loup Faulon, Evry University
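The goal on this slide is an output free of duplicate (isomorphic) structures. The generator achieves this with canonical augmentation and nauty; the sketch below swaps in a much simpler technique, post-hoc deduplication with a brute-force canonical form, only to show what "non-duplicated" means. It is exponential in atom count, uses implicit hydrogens, and its names are hypothetical.

```python
from itertools import permutations


def canonical_form(atoms, bonds):
    """Brute-force canonical form of a small molecular graph.

    atoms: list of element symbols (hydrogens implicit)
    bonds: iterable of (i, j, bond_order) with 0-based atom indices
    Returns the lexicographically smallest (labels, bonds) pair over all
    relabelings, so isomorphic graphs give identical keys. O(n!): tiny graphs only.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):          # perm[old_index] = new_index
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        relabeled = sorted((min(perm[i], perm[j]), max(perm[i], perm[j]), order)
                           for i, j, order in bonds)
        key = (tuple(labels), tuple(relabeled))
        if best is None or key < best:
            best = key
    return best


def deduplicate(structures):
    """Keep the first structure of each isomorphism class."""
    seen, unique = set(), []
    for atoms, bonds in structures:
        key = canonical_form(atoms, bonds)
        if key not in seen:
            seen.add(key)
            unique.append((atoms, bonds))
    return unique


if __name__ == "__main__":
    ethanol_a = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    ethanol_b = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])   # same molecule, renumbered
    dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
    print(len(deduplicate([ethanol_a, ethanol_b, dimethyl_ether])))  # 2
```

Canonical augmentation avoids ever producing the duplicate instead of filtering it afterwards, which is what makes exhaustive generation feasible at the scale shown in the results.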
23. Structure Generator Results (same # of molecules as MOLGEN)
Metabolite | Elemental composition | # output molecules | with 1 fragment
Glycine | C2H5NO2 | 84 | 6
Phenylalanine | C9H11NO2 | 277,810,163 | 4,037,499
Malic acid | C4H6O5 | 8,070 | 1,601
D-Cysteine | C3H7NO2S | 3,838 | 100
p-Cresol sulfate | C7H8O3S | 10,203,389 | 19,940
With 2 and 3 fragments: 93,137; 948; 584; 278
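The effect of adding a single fragment can be read off the table as a reduction factor (output molecules without fragments divided by output molecules with 1 fragment). The short sketch below just does that arithmetic on the slide's numbers; the names are illustrative.

```python
# Structure counts from the slide: (no fragment constraint, 1 fragment).
COUNTS = {
    "Glycine (C2H5NO2)": (84, 6),
    "Phenylalanine (C9H11NO2)": (277_810_163, 4_037_499),
    "Malic acid (C4H6O5)": (8_070, 1_601),
    "D-Cysteine (C3H7NO2S)": (3_838, 100),
    "p-Cresol sulfate (C7H8O3S)": (10_203_389, 19_940),
}


def reduction_factor(before, after):
    """How many times smaller the candidate list becomes."""
    return before / after


if __name__ == "__main__":
    for name, (before, after) in COUNTS.items():
        print(f"{name}: {before:,} -> {after:,} "
              f"({reduction_factor(before, after):.1f}x fewer)")
```

One fragment shrinks the glycine list 14-fold and the p-cresol sulfate list roughly 512-fold, though phenylalanine still leaves about four million candidates, which is why further filtering is needed.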
25. Metabolite-likeness
Data: HMDB (8K) vs. ZINC (21M), followed by standardization and diversity selection
Descriptors: atom counts, physicochemical descriptors, MDL public keys, FCFP_4, ECFP_4
Training set: 532 + 532; test set: 6.4K + 6.4K
Classifiers: SVM, RF, BC, evaluated with 5-fold CV, to score metabolite-likeness
In collaboration with Andreas Bender, Cambridge Univ.
26. Metabolite-likeness
Model | Sensitivity | Specificity | AUC
1st: RF, MDLPublicKeys | 99.84% | 87.52% | 99.20%
2nd: RF, ECFP_4 | 99.77% | 86.36% | 99%
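For reference, the three metrics in the table can be computed as follows. This is a plain-Python sketch on toy data, assuming label 1 = metabolite (HMDB) and 0 = non-metabolite (ZINC); it is not the evaluation code used in the study, and AUC is computed with the Mann-Whitney formulation.

```python
def confusion(y_true, y_pred):
    """Counts (tp, tn, fp, fn) with 1 = metabolite, 0 = non-metabolite."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true metabolites classified as metabolites."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-metabolites classified as non-metabolites."""
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative (Mann-Whitney statistic; ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]      # hypothetical classifier outputs
    y_pred = [int(s >= 0.5) for s in scores]
    tp, tn, fp, fn = confusion(y_true, y_pred)
    print(sensitivity(tp, fn), specificity(tn, fp), roc_auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold (0.5 here), while AUC summarizes ranking quality over all thresholds, which is why a model can show a very high AUC alongside a noticeably lower specificity, as in the table.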
27. Metabolite-likeness, external validation
External validation set: random selection from HMDB, DrugBank and ChEMBL, followed by standardization
Metabolite-likeness scored with the RF MDLPublicKeys and RF ECFP_4 models
28. Metabolite-likeness, external validation
30. www.MetiTree.nl
Group Database
Upload and organize your own trees
Visualize trees
Find similar trees
(to be added: structure generator, metabolite-likeness)
31. Conclusions
Chemoinformatics plays a crucial role in the metabolite identification pipeline
Now is the time to challenge this pipeline with real cases
The expert is still needed
32. Acknowledgements
TNO Quality of Life: Leon Coulier, Albert Tas
Leiden University: Miguel Rojas-Cherto, Piotr Kasper, Michael van Vliet, Theo Reijmers, Rob Vreeken, Ronnie van Doorn, Thomas Hankemeier
Wageningen University/PRI: Egon Willighagen, Justin van der Hooft, Ric de Vos, Jacques Vervoort
University of Cambridge: Andreas Bender
Evry University: Jean-Loup Faulon, Davide Fichera
HMP, University of Alberta: David Wishart, Ying (Edison) Dong