The document discusses metabolite identification from mass spectrometry data. It describes how a structure generator produces candidate metabolite structures from an elemental composition: the generator adds bonds in all possible ways and uses isomorphism and canonical labeling to discard duplicate structures that belong to the same isomorphic class. The resulting list of candidate structures is then analyzed further and filtered against experimental data. The slides also present metabolite-likeness classifiers, trained on HMDB and ZINC, that help narrow this candidate list.
Structure generation, metabolite space, and metabolite likeness
2. Metabolomics: the quantitative and qualitative analysis of all metabolites in samples of cells, body fluids, tissues, etc.
Julio E. Peironcely
3. Metabolomics
Workflow: Biological question → Experimental design → Sampling → Sample preparation → Data acquisition → Data pre-processing → Data analysis → Biological interpretation
Products along the way: protocol, samples, raw data, list of peaks/biomolecules, relevant metabolites/biomolecules & connectivities, models
12. Structure Generator
Elemental Formula + Fragments → Generate → Candidate Structures
In collaboration with Jean-Loup Faulon, Evry University
13. Structure Generator
Elemental Formula + Fragments → Generate → Keep molecules if canonical augmentation → Candidate Structures
14. Structure Generator: Adding bonds
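Slide 14 describes growing structures by adding bonds. As a rough illustration only, the Python sketch below enumerates every set of single bonds among a few atoms that can be built one bond at a time without exceeding any atom's valence. It assumes single bonds only and ignores hydrogens, bond orders and the fragments of slide 12; the names `augment` and `is_connected` are hypothetical, and this is not the generator's actual code.

```python
from itertools import combinations


def augment(n_atoms, valences):
    """Yield every set of single bonds (tuples of atom-index pairs) that can be
    built by adding one bond at a time without exceeding any atom's valence.

    Bonds are only added in increasing pair order, so each labeled bond set is
    yielded exactly once (relabelings of the same structure still appear).
    """
    pairs = list(combinations(range(n_atoms), 2))

    def grow(bonds, degree, start):
        yield bonds
        for k in range(start, len(pairs)):
            i, j = pairs[k]
            # Only add a bond if both atoms still have a free valence.
            if degree[i] < valences[i] and degree[j] < valences[j]:
                new_degree = list(degree)
                new_degree[i] += 1
                new_degree[j] += 1
                yield from grow(bonds + ((i, j),), new_degree, k + 1)

    yield from grow((), [0] * n_atoms, 0)


def is_connected(n_atoms, bonds):
    """Return True if the bonds join all atoms into a single molecule."""
    neighbours = {a: set() for a in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for b in neighbours[stack.pop()] - seen:
            seen.add(b)
            stack.append(b)
    return len(seen) == n_atoms


if __name__ == "__main__":
    # Four atoms of valence 4 (a toy carbon skeleton, hydrogens ignored).
    structures = [b for b in augment(4, [4, 4, 4, 4]) if is_connected(4, b)]
    print(len(structures))  # prints 38
```

With four atoms of valence 4 every pair may be bonded, so the sketch yields all 64 labeled bond sets, 38 of which are connected. Many of these are relabelings of one another, which is the redundancy the isomorphism step on the next slides removes.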
15. Structure Generator: Isomorphism
[Figure: labeled graphs on vertices 1 to 4, grouped into two isomorphic classes: "triangle + 1 edge" and "3-edge chain".]
16. Structure Generator: Isomorphism
[Figure: the same two isomorphic classes, "triangle + 1 edge" and "3-edge chain", with some graphs highlighted in orange.]
Output ONLY the orange graphs.
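Slides 15 and 16 keep only one graph per isomorphic class. One simple way to do this, shown below as a brute-force sketch, is to compute a canonical form (here, the lexicographically smallest sorted edge list over all vertex relabelings) and keep the first graph seen for each form. This is a toy under stated assumptions: it costs n! relabelings per graph, the function names are hypothetical, and the generator on slide 13 instead relies on canonical augmentation, which avoids producing duplicates during generation rather than filtering them afterwards.

```python
from itertools import combinations, permutations


def canonical_form(n, edges):
    """Brute-force canonical form: the smallest sorted edge list obtainable by
    relabeling the n vertices. Two graphs are isomorphic iff their forms match."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[i], perm[j]))) for i, j in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def unique_up_to_isomorphism(n, graphs):
    """Keep only the first graph seen from each isomorphic class."""
    representatives = {}
    for edges in graphs:
        representatives.setdefault(canonical_form(n, edges), tuple(edges))
    return list(representatives.values())


if __name__ == "__main__":
    pairs = list(combinations(range(4), 2))
    # All 15 labeled 4-vertex graphs with 4 edges reduce to two classes.
    for rep in unique_up_to_isomorphism(4, combinations(pairs, 4)):
        print(rep)
```

On four vertices, the 15 labeled graphs with four edges collapse to two classes (the "triangle + 1 edge" class and the 4-cycle), and the 20 labeled graphs with three edges collapse to three (the "3-edge chain", the 3-star, and a triangle plus an isolated vertex).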
26. Elemental Composition → Structure Generation → Molecules → Metabolite Likeness
27. Elemental Composition → Structure Generation → Molecules → Metabolite Likeness → Metabolites
28. What do metabolites look like?
Understanding and Classifying Metabolite Space and Metabolite-Likeness
Julio E. Peironcely et al., PLoS One (in press)
36. Metabolite-likeness: Representation + Classification
Data: HMDB (8K) vs. ZINC (21M)
Representations: Atom Counts, Physicochemical descriptors, MDL Public Keys, FCFP_4, ECFP_4
Classifiers: Support Vector Machines (SVM), Random Forest (RF), Naïve Bayes (NB)
37. Metabolite-likeness
HMDB (8K) + ZINC (21M) → Standardization → Diversity Selection
Representations: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4
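The slides do not say which diversity-selection method was used. MaxMin picking on Tanimoto distances between fingerprints is one common choice, and the sketch below shows it purely as an assumption; fingerprints are modeled as Python sets of on-bit indices, and the names `tanimoto` and `maxmin_select` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_select(fingerprints, k, seed_index=0):
    """Greedy MaxMin diversity picking: start from one molecule, then repeatedly
    add the molecule whose nearest already-picked neighbour is farthest away
    (Tanimoto distance = 1 - similarity)."""
    picked = [seed_index]
    # Distance from every molecule to its nearest picked molecule so far.
    nearest = [1.0 - tanimoto(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(picked) < min(k, len(fingerprints)):
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        nxt = max(candidates, key=lambda i: nearest[i])
        picked.append(nxt)
        # Update each molecule's distance to its nearest picked neighbour.
        for i, fp in enumerate(fingerprints):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, fingerprints[nxt]))
    return picked
```

For example, `maxmin_select([{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {1, 2}], 3)` returns `[0, 2, 3]`: the dissimilar `{7, 8, 9}` is picked before the near-duplicates of the seed. Picking is greedy and quadratic in the pool size, so a real run on a library the size of ZINC would need a faster implementation.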
38. Metabolite-likeness
HMDB (8K) + ZINC (21M) → Standardization → Diversity Selection
Representations: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4
Training set: 532 + 532; Test set: 6.4K + 6.4K
39. Metabolite-likeness
HMDB (8K) + ZINC (21M) → Standardization → Diversity Selection
Representations: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4
Training set: 532 + 532; Test set: 6.4K + 6.4K
Classifiers (5-fold CV): SVM, RF, BC
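Slides 39 and 40 evaluate each classifier with 5-fold cross-validation on the 532 + 532 training set. The slides do not say how the folds were built; the sketch below assumes stratified folds (each fold keeps the class balance), uses only the Python standard library, and its function names are hypothetical.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split example indices into k folds that each keep the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin over the folds.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    for fold in stratified_kfold(labels, k, seed):
        test = set(fold)
        train = [i for i in range(len(labels)) if i not in test]
        yield train, sorted(test)


if __name__ == "__main__":
    # Assumed labeling: 1 = metabolite (HMDB), 0 = non-metabolite (ZINC).
    labels = [1] * 532 + [0] * 532
    for train, test in cv_splits(labels):
        print(len(train), len(test))
```

With 532 examples per class, each fold receives 106 or 107 examples of each class, so every model is trained on about 850 molecules and tested on the remaining fifth.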
40. Metabolite-likeness
HMDB (8K) + ZINC (21M) → Standardization → Diversity Selection
Training set: 532 + 532; Test set: 6.4K + 6.4K
3 classifiers (SVM, RF, BC) × 5 descriptions, 5-fold CV → Metabolite likeness
41. Metabolite-likeness: results
Best = RF + MDL Public Keys: Sensitivity 99.84%, Specificity 87.52%, AUC 99.20%
Bad = BC + P_desc (physicochemical descriptors): Sensitivity 42.51%, Specificity 86.56%, AUC 61.57%
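Slide 41 reports sensitivity, specificity and AUC. Their definitions can be written out as follows, assuming metabolites (HMDB) are the positive class and that each classifier returns a score per molecule; this is a generic sketch with hypothetical function names, not the evaluation code used in the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels: 1 = positive (assumed: metabolite), 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 1, 0, 0, 0]` and scores `[0.9, 0.8, 0.4, 0.6, 0.3, 0.1]`, thresholding at 0.5 gives sensitivity = specificity = 2/3, while the AUC is 8/9 because 8 of the 9 positive/negative pairs are ranked correctly. This is why a model can pair a modest specificity with a high AUC, as the best model on slide 41 does.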
48. Molecule: Phenylalanine (Minimized_Energy 0.1100, ALogP -1.605, Index 5142)
49. Molecule: C9H11NO2 (Minimized_Energy 0.1100, ALogP -1.605, Index 5142)
Structure Generation: 277 M structures
50. Molecule: C9H11NO2 (Minimized_Energy 0.1100, ALogP -1.605, Index 5142)
Structure Generation: 99%, 44%, 41 K
51. Molecule: C9H11NO2 (Minimized_Energy 0.1100, ALogP -1.605, Index 5142)
Filter: E < 10
Structure Generation: 40%, 8K
52. Molecule: C9H11NO2 (Minimized_Energy 0.1100, ALogP -1.605, Index 5142)
Filters: E < 10, ALogP < -1
Structure Generation: 76%, 31
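Slides 49 to 52 narrow the structures generated for C9H11NO2 with constraints such as E < 10 and ALogP < -1. The sketch below shows that kind of constraint filtering on a toy candidate list and reports where the known answer ranks. The `Candidate` fields, the metabolite-likeness score used for ranking, and every value except phenylalanine's energy and ALogP (taken from slide 48) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    energy: float     # minimized energy, as in the slides' Minimized_Energy column
    alogp: float      # predicted logP (ALogP)
    likeness: float   # hypothetical metabolite-likeness score used for ranking


def filter_and_rank(candidates, target, max_energy=10.0, max_alogp=-1.0):
    """Keep candidates with energy < max_energy and ALogP < max_alogp, sort them
    by likeness (highest first) and return (kept, 1-based rank of target or None)."""
    kept = [c for c in candidates if c.energy < max_energy and c.alogp < max_alogp]
    kept.sort(key=lambda c: c.likeness, reverse=True)
    rank = next((i + 1 for i, c in enumerate(kept) if c.name == target), None)
    return kept, rank


if __name__ == "__main__":
    toy = [
        Candidate("phenylalanine", 0.1100, -1.605, 0.95),  # energy and ALogP from slide 48
        Candidate("isomer_a", 25.0, -2.0, 0.99),  # fails the energy filter
        Candidate("isomer_b", 3.2, 0.8, 0.97),    # fails the ALogP filter
        Candidate("isomer_c", 1.5, -1.2, 0.97),   # passes both filters
    ]
    kept, rank = filter_and_rank(toy, "phenylalanine")
    print([c.name for c in kept], rank)  # ['isomer_c', 'phenylalanine'] 2
```

Each extra constraint shrinks the candidate list, which is the effect the slides show as the count drops from 277 M to 31 structures.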
53. Conclusions
Metabolite-likeness prediction is good, but its interpretation is not.
Local models are needed.
Structure generator + metabolite-likeness + other constraints = improved metabolite identification
54. Acknowledgements
Leiden University: Miguel Rojas-Cherto, Piotr Kasper, Michael van Vliet, Theo Reijmers, Rob Vreeken, Ronnie van Doorn, Thomas Hankemeier
University of Cambridge: Andreas Bender
Evry University: Jean-Loup Faulon, Davide Fichera
TNO Quality of Life: Leon Coulier, Albert Tas
HMP University of Alberta: David Wishart, Ying (Edison) Dong