Structure Generation,
Metabolite Space, and
Metabolite-Likeness

Julio E. Peironcely  @peyron
Juliopeironcely.com
PhD student at Leiden University and TNO
Metabolomics

   The quantitative and qualitative analysis of all metabolites in samples of cells, body fluids, tissues, etc.


                  Julio E. Peironcely
Metabolomics

   Biological question → Experimental design → Sampling → Sample preparation → Data acquisition → Data pre-processing → Data analysis → Biological interpretation

   Products: Protocol → Samples → Raw data → List of peaks/biomolecules (Metabolites) → Relevant biomolecules/connectivities & Models

De-novo identification

We have
   Elemental Composition
   Fragments (sometimes)
   Experimental Information

We want
   A list of candidate structures
   As short as possible
   With the correct structure in the list

We need
   Structure Generator
   Keep only metabolites
   Use experimental information to filter molecules

Elemental Composition → Structure Generation → Molecules

Structure Generator
   Elemental Formula + Fragments → Generate → Keep Molecules if Canonical Augmentation → Candidate Structures
In collaboration with Jean-Loup Faulon, Evry University

Structure Generator: Adding bonds

Structure Generator: Isomorphism
   [Figure: labeled graphs on 4 vertices grouped into two isomorphic classes, "triangle + 1 edge" and "3-edge chain"; some graphs are drawn in orange.]
   Output ONLY orange graphs

Structure Generator: Canonical Labeling
   [Figure: the labeled graphs of both isomorphic classes are passed through a canonizer (Nauty).]
   Canonical labeling, "triangle + 1 edge": (1,2) (1,3) (1,4) (2,3)
   Canonical labeling, "3-edge chain": (1,2) (1,3) (2,4)

Only 1 canonical labeling in each isomorphic class
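
To make the idea concrete, here is a minimal brute-force sketch in Python. It is not how Nauty works (Nauty uses partition refinement and search-tree pruning and scales far better); it simply tries every relabeling of a small graph and keeps the lexicographically smallest sorted edge list, which is one valid canonical labeling. The function name `canonical_form` is hypothetical. For the two 4-vertex classes above, this minimum rule happens to give the same edge lists shown on the canonical labeling slide.

```python
from itertools import permutations


def canonical_form(n, edges):
    """Brute-force canonical labeling of a simple undirected graph.

    Vertices are 1..n and edges are (u, v) pairs. Every relabeling is tried
    and the lexicographically smallest sorted edge list is returned, so all
    graphs in one isomorphic class map to the same tuple. Cost grows as n!.
    """
    best = None
    for perm in permutations(range(1, n + 1)):
        # perm[v - 1] is the new label given to vertex v
        relabeled = sorted(
            tuple(sorted((perm[u - 1], perm[v - 1]))) for u, v in edges
        )
        if best is None or relabeled < best:
            best = relabeled
    return tuple(best)


# Two labelings of "triangle + 1 edge" and one of the "3-edge chain".
print(canonical_form(4, [(1, 2), (1, 3), (2, 3), (1, 4)]))  # ((1, 2), (1, 3), (1, 4), (2, 3))
print(canonical_form(4, [(2, 3), (3, 4), (2, 4), (1, 4)]))  # ((1, 2), (1, 3), (1, 4), (2, 3))
print(canonical_form(4, [(1, 2), (2, 3), (3, 4)]))  # ((1, 2), (1, 3), (2, 4))
```

Because both labelings of the first class return the same tuple, comparing canonical forms is enough to detect isomorphic duplicates.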
Use canonizer to remove duplicates after each extension
   [Figure: a generation tree on 5 vertices that starts from (1,2) and adds one edge per step: (1,2)(1,3); then (1,2)(1,3)(1,4) and (1,2)(1,3)(2,3); then (1,2)(1,3)(1,4)(3,4), (1,2)(1,3)(1,4)(2,3), (1,2)(1,3)(1,4)(4,5) and (1,2)(1,3)(2,3)(2,4). One graph is marked X.]
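
The duplicate-removal strategy can be sketched in a few lines, reusing a brute-force canonizer (redefined so the block runs on its own). Each level holds one canonical edge list per isomorphic class; every graph is extended by one absent edge, and a child is kept only if its canonical form has not been seen at that level. The function names are hypothetical, and the graphs are unlabeled (no atom types or valences), so this illustrates the bookkeeping rather than the chemistry.

```python
from itertools import combinations, permutations


def canonical_form(n, edges):
    """Brute-force canonizer: smallest sorted edge list over all relabelings."""
    best = None
    for perm in permutations(range(1, n + 1)):
        relabeled = sorted(
            tuple(sorted((perm[u - 1], perm[v - 1]))) for u, v in edges
        )
        if best is None or relabeled < best:
            best = relabeled
    return tuple(best)


def generate_with_dedup(n):
    """Grow graphs one edge at a time; drop duplicates with the canonizer."""
    all_pairs = list(combinations(range(1, n + 1), 2))
    levels = [[()]]  # levels[k] holds canonical edge lists with k edges
    for _ in all_pairs:
        seen = set()
        next_level = []
        for graph in levels[-1]:
            for pair in all_pairs:
                if pair in graph:
                    continue  # edge already present
                child = canonical_form(n, list(graph) + [pair])
                if child not in seen:  # first copy of this isomorphic class
                    seen.add(child)
                    next_level.append(child)
        levels.append(next_level)
    return levels


print([len(level) for level in generate_with_dedup(4)])  # [1, 1, 2, 3, 2, 1, 1]
```

On 4 vertices this finds the 11 non-isomorphic graphs, grouped by edge count. Note that `seen` must hold every graph of a level, which is the memory cost canonical augmentation is designed to avoid.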
Canonical Augmentation

   A canonical object, augmented in a canonical way, produces a canonical object.

Check For Canonical Augmentation

   Keep an object only if a canonical deletion takes you back to its canonical father.

Accept only canonically augmented graphs
   [Figure: a generation tree on 5 vertices growing from (1,2) to four-edge graphs; two graphs are marked X.]
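
The acceptance rule ("keep an object if a canonical deletion takes you to the canonical father") can be sketched the same way. Assumptions, all hypothetical: the canonical deletion removes the edge that becomes the last edge of the brute-force canonical labeling, and children of one parent are deduplicated locally instead of through the parent's automorphism group as in McKay's canonical augmentation. No global set of generated graphs is kept; each isomorphic class is accepted only from its single canonical father.

```python
from itertools import combinations, permutations


def canonical_labeling(n, edges):
    """Brute force: smallest sorted edge list and one permutation reaching it."""
    best, best_perm = None, None
    for perm in permutations(range(1, n + 1)):
        relabeled = sorted(
            tuple(sorted((perm[u - 1], perm[v - 1]))) for u, v in edges
        )
        if best is None or relabeled < best:
            best, best_perm = relabeled, perm
    return tuple(best), best_perm


def canonical_deletion(n, edges):
    """Delete the edge that maps to the last edge of the canonical labeling."""
    canon, perm = canonical_labeling(n, edges)
    for u, v in edges:
        if tuple(sorted((perm[u - 1], perm[v - 1]))) == canon[-1]:
            return [e for e in edges if e != (u, v)]
    raise ValueError("graph has no edges")


def generate_by_canonical_augmentation(n):
    all_pairs = list(combinations(range(1, n + 1), 2))
    levels = [[()]]  # each parent is stored as its own canonical edge list
    for _ in all_pairs:
        next_level = []
        for parent in levels[-1]:
            siblings = set()  # local duplicate check among this parent's children
            for pair in all_pairs:
                if pair in parent:
                    continue
                child = list(parent) + [pair]
                father, _ = canonical_labeling(n, canonical_deletion(n, child))
                if father != parent:
                    continue  # not canonically augmented: another parent owns it
                child_canon, _ = canonical_labeling(n, child)
                if child_canon not in siblings:
                    siblings.add(child_canon)
                    next_level.append(child_canon)
        levels.append(next_level)
    return levels


print([len(level) for level in generate_by_canonical_augmentation(4)])  # [1, 1, 2, 3, 2, 1, 1]
```

The counts match a generator that deduplicates globally with the canonizer, but here rejected children are discarded immediately instead of being compared against everything generated so far.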
Structure Generator Results (MOLGEN: same # of molecules)

                          Glycine    Phenylalanine   Malic acid   D-Cysteine   p-Cresol sulfate
Elemental composition     C2H5NO2    C9H11NO2        C4H6O5       C3H7NO2S     C7H8O3S
# Output molecules        84         277,810,163     8,070        3,838        10,203,389
1 Fragment                6          4,037,499       1,601        100          19,940
                                     93,137                                    948
2 Fragments                          584
3 Fragments                          278
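
The fragment rows show how known fragments shrink the candidate list. The generator takes fragments as input during generation; as a simpler, hedged illustration of the constraint itself, the sketch below instead filters already generated structures after the fact. The heavy-atom skeletons are three real C2H5NO2 isomers (glycine, methyl carbamate, nitroethane) drawn without bond orders or hydrogens; the fragment is a hypothetical example, and `contains_fragment` is an illustrative name, not part of the generator.

```python
from itertools import permutations


def contains_fragment(atoms, bonds, frag_atoms, frag_bonds):
    """Brute-force check that the labeled fragment occurs as a subgraph of the
    molecule: some injective atom mapping must preserve element labels and bonds.
    Bond orders and hydrogens are ignored in this sketch."""
    bond_set = {frozenset(b) for b in bonds}
    for image in permutations(range(len(atoms)), len(frag_atoms)):
        if any(atoms[image[i]] != frag_atoms[i] for i in range(len(frag_atoms))):
            continue  # element labels do not match
        if all(frozenset((image[u], image[v])) in bond_set for u, v in frag_bonds):
            return True
    return False


# Heavy-atom skeletons of three C2H5NO2 isomers (0-based atom indices).
candidates = {
    "glycine": (["N", "C", "C", "O", "O"], [(0, 1), (1, 2), (2, 3), (2, 4)]),
    "methyl carbamate": (["C", "O", "C", "O", "N"], [(0, 1), (1, 2), (2, 3), (2, 4)]),
    "nitroethane": (["C", "C", "N", "O", "O"], [(0, 1), (1, 2), (2, 3), (2, 4)]),
}

# Hypothetical fragment: a carbon bonded to another carbon and to two oxygens.
fragment_atoms = ["C", "C", "O", "O"]
fragment_bonds = [(0, 1), (1, 2), (1, 3)]

kept = [name for name, (atoms, bonds) in candidates.items()
        if contains_fragment(atoms, bonds, fragment_atoms, fragment_bonds)]
print(kept)  # ['glycine']
```

Only glycine has a carbon bonded to both a carbon and two oxygens, so one fragment cuts the three isomers down to one.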



Lots of candidate structures
We are looking for metabolites
Elemental Composition → Structure Generation → Molecules → Metabolite Likeness → Metabolites

What do metabolites look like?
Understanding and Classifying Metabolite Space and Metabolite-Likeness
Julio E. Peironcely et al. PLoS One (in press)
HMDB: 8K          ZINC: 21M

Metabolites vs. non-metabolites
   Water solubility, molecular weight (MW), number of C atoms, structural complexity, polar surface area (PSA)

PCA: metabolites and non-metabolites are not so different
Decision Tree

Metabolite-likeness: Representation + Classification
   Data: HMDB (8K) and ZINC (21M)
   Representations: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4
   Classifiers: Support Vector Machines (SVM), Random Forest (RF), Naïve Bayes (NB)

Metabolite-likeness
   HMDB (8K) + ZINC (21M) → Standardization → Diversity Selection
   Training Set: 532 + 532        Test Set: 6.4K + 6.4K
   3 classifiers (SVM, RF, BC) × 5 descriptions, 5-fold CV → Metabolite likeness
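
The slides do not say which diversity-selection method was used. A common choice for picking a diverse subset from a large library is greedy MaxMin picking on Tanimoto distances between fingerprints; the sketch below shows that technique as an assumption, with fingerprints represented as Python sets of on-bit indices (toy values, not real MDL keys).

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def maxmin_pick(fps, k, first=0):
    """Greedy MaxMin diversity picking (distance = 1 - Tanimoto).

    Starts from fps[first] and repeatedly adds the item whose closest
    already-picked item is farthest away.
    """
    picked = [first]
    # min_dist[i]: distance from item i to its nearest picked item
    min_dist = [1.0 - tanimoto(fp, fps[first]) for fp in fps]
    while len(picked) < min(k, len(fps)):
        remaining = [i for i in range(len(fps)) if i not in picked]
        nxt = max(remaining, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[nxt]))
    return picked


# Toy fingerprints: two near-duplicate pairs and one singleton.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}, {20}]
print(maxmin_pick(fps, 3))  # [0, 2, 4]
```

Near-duplicates (items 1 and 3) are skipped because each sits close to an item already picked, which is the point of selecting for diversity before building a training set.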
Metabolite-likeness results
   Best = RF with MDL Public Keys: Sensitivity 99.84%, Specificity 87.52%, AUC 99.20%
   Bad = BC with P_desc (physicochemical descriptors): Sensitivity 42.51%, Specificity 86.56%, AUC 61.57%
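
For reference, sensitivity, specificity, and AUC can be computed as below, assuming metabolites are the positive class (label 1) and non-metabolites the negative class (label 0). The AUC is computed as the probability that a randomly chosen metabolite scores higher than a randomly chosen non-metabolite (ties count one half), which equals the area under the ROC curve. The labels, scores, and the 0.5 threshold are toy values, not results from the talk.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Label 1 = metabolite."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (not data from the talk).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), round(spec, 3), round(auc(y_true, scores), 3))  # 0.667 0.667 0.889
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes the ranking over all thresholds, which is why a model can pair a moderate specificity with a high AUC.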
Metabolite-likeness, external validation
   HMDB external validation set + DrugBank and ChEMBL (random selection) → Standardization → Metabolite likeness

Met-likeness + structure generation (malic acid): 8K structures
   [Figure values: 100%, 57%, 77%]

Met-likeness + structure generation (methylhistamine): 260K structures
   [Figure values: 71%, 46%]

What else do we know about our molecules?

   Molecule        Minimized_Energy   ALogP    Index
   Phenylalanine   0.1100             -1.605   5142

Filtering the structures generated for C9H11NO2 (phenylalanine: Minimized_Energy 0.1100, ALogP -1.605, Index 5142)
   Structure Generation: 277 M structures
   Reduced set: 41 K structures (figure values: 99%, 44%)
   E < 10: 8K structures (figure value: 40%)
   E < 10 and ALogP < -1: 31 structures (figure value: 76%)
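
The cascade above applies property constraints one after another. A minimal sketch of that bookkeeping follows, assuming that "E" refers to the Minimized_Energy column; the `Candidate` fields mirror the slide's Minimized_Energy and ALogP columns, phenylalanine's values are taken from the slide, and the other three candidates are made-up toy values. Thresholds follow the slide (E < 10, ALogP < -1).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    minimized_energy: float  # Minimized_Energy column on the slide
    alogp: float  # ALogP column on the slide


def apply_filters(candidates, filters):
    """Apply (label, predicate) filters in order; record survivors after each step."""
    survivors = list(candidates)
    history = [("generated", len(survivors))]
    for label, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        history.append((label, len(survivors)))
    return survivors, history


candidates = [
    Candidate("phenylalanine", 0.1100, -1.605),  # values from the slide
    Candidate("candidate_A", 25.3, -2.1),  # toy values from here on
    Candidate("candidate_B", 4.2, 0.8),
    Candidate("candidate_C", 7.9, -1.2),
]
filters = [
    ("E < 10", lambda c: c.minimized_energy < 10),
    ("ALogP < -1", lambda c: c.alogp < -1),
]
survivors, history = apply_filters(candidates, filters)
print(history)  # [('generated', 4), ('E < 10', 3), ('ALogP < -1', 2)]
print([c.name for c in survivors])  # ['phenylalanine', 'candidate_C']
```

Keeping the per-step counts makes it easy to check that each constraint shortens the list while the correct structure (here phenylalanine) survives.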
Conclusions
   Met-likeness prediction is good; its interpretation is not
   Local models are needed
   Structure generator + met-likeness + other constraints = improved metabolite identification

Acknowledgements

   Leiden University: Miguel Rojas-Cherto, Piotr Kasper, Michael van Vliet, Theo Reijmers, Rob Vreeken, Ronnie van Doorn, Thomas Hankemeier
   TNO Quality of Life: Leon Coulier, Albert Tas
   University of Cambridge: Andreas Bender
   Evry University: Jean-Loup Faulon, Davide Fichera
   HMP, University of Alberta: David Wishart, Ying (Edison) Dong