Tautomers - Advanced Databases for in-silico Screening?
Frank Oellien, 18th CIC Workshop 2004, Boppard
Overview
• Motivation for tautomers in screening data sets
• Tautomer enumeration approach
• Workflow
• Examples for "enhanced" Database Search
• Results for "enhanced" Database Search
• Summary and Future Tasks
Biological Relevance of Tautomers

Targets: Bacteria, Arthropods, and Parasites
Variation in physiological pH is expected - not static values!
Ambiguities in Proteins and Ligands
Proteins (X-Ray structures)
• Flexibility (e.g. Gln, Asn)
• Ionisation states (e.g. Glu, Asp)
• Tautomerism (e.g. His)

Ligands (Compound Libraries)
• Conformations
• Ionisation states
• Stereo centers
• Tautomers
Software: Virtual Screening
Types of Virtual Screening Software
• High-throughput docking of ligands into protein X-ray structures (Gold, FlexX)
• DB for pharmacophore search (Catalyst, Unity)

Current VS software applications address:
• Conformations: yes
• Ionisation states: no
• Stereo centers: yes
• Tautomers: no (exception FlexX 2004)
Biological Relevance of Tautomers
Tautomeric states of ligands can be relevant for biological interactions
• Derivatives of tetrazole, triazole, thiazole, pyrazole, iminopyrimidine, ...
• Brandstetter et al., MMP-8 inhibitors, J. Biol. Chem. 276, 2001, 17405-12.
• Pospisil et al., Ligands of herpesviral thymidine kinases, Helv. Chim. Acta 85, 2002, 3237-50.
Software: Tautomer Generation
Tautomer Generation Applications
• Agent 2.0 (ETH Zurich, Switzerland)
• QUACPAC 1.1 (OpenEye)
• StereoPlex (Tripos)
Limitations of these tools:
• no extension by means of user-defined rules
• no tautomer-sensitive duplicate check
Aim: Easily extensible and scriptable software that allows the integration and automation of tautomer generation in our existing screening workflow.
Solution: CACTVS, a chemical data management system
Tautomer enumeration
• C core library, Tcl command layer
• Main command:
    ens transform $eh $tlist <direction> <reactionmode> <flags>
        <overlapmode> <excludelist> <maxtautomers> <timeout>

tlist: Transformation definition in SMIRKS line notation, e.g.
    [#1:1][O:2][C:3]#[N:4]>>[O:2]=[C:3]=[N:4][#1:1]

• preferred tautomer forms
• tautomer-sensitive duplicate check
• 21 pre-defined rules (up to 1,11-H-shifts)
• user-defined tautomer sets
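The idea behind a tautomer-sensitive duplicate check can be sketched in a few lines of Python. This is not the CACTVS implementation; it is a minimal illustration that assumes the tautomer sets have already been enumerated elsewhere and are given as consistently canonicalised SMILES strings. The compound IDs and the helper names `tautomer_key` and `group_duplicates` are hypothetical.

```python
from collections import defaultdict


def tautomer_key(tautomers):
    """Build an order-independent key from a compound's enumerated tautomers."""
    return tuple(sorted(set(tautomers)))


def group_duplicates(enumerated):
    """Group compound IDs whose enumerated tautomer sets are identical.

    `enumerated` maps a compound ID to its list of tautomer SMILES.
    Assumption: the SMILES are canonicalised the same way for every compound.
    """
    groups = defaultdict(list)
    for compound_id, tautomers in enumerated.items():
        groups[tautomer_key(tautomers)].append(compound_id)
    # Only groups with more than one member are duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical input: 2-hydroxypyridine and 2-pyridone registered as
# separate compounds, plus benzene, which has a single form.
enumerated = {
    "cmpd-A": ["Oc1ccccn1", "O=c1cccc[nH]1"],
    "cmpd-B": ["O=c1cccc[nH]1", "Oc1ccccn1"],
    "cmpd-C": ["c1ccccc1"],
}
duplicates = group_duplicates(enumerated)
print(duplicates)
```

A plain structure comparison would treat cmpd-A and cmpd-B as different compounds; comparing the whole tautomer set flags them as one.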
Examples
[Figure: tautomeric structures numbered 1 to 15; the structure drawings are not recoverable from the transcript.]
Only 2 transformation rules are needed (1,3- and 1,5-H-shift)
Database Expansion
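[Figure: bar chart of compound counts (axis 0 to 1,400,000) without and with tautomers for the supplier databases Maybridge, Specs, TimTec, VitasM, Asinex Platinum, Asinex Gold and ChemDiv. Expansion factors shown on the bars: x 3.4, x 3.5, x 3.2, x 3.6, x 3.6, x 3.0 and x 2.7; their assignment to individual suppliers is not recoverable from the transcript.]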
Tautomer Enumeration - Benchmark
Platform: SGI Fuel R14000 / 600 MHz, 1 GB RAM
Performance depends on
• nature of the compounds
• number of tautomers

Supplier DB               Compounds/min   Multiplier
Maybridge Screening       > 150           2.5
Asinex Platinum           > 250           2.9
VitasM (in-house Stock)   > 560           3
Tripos Leadscreen         > 1400          2.2
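To get a feel for what these benchmark figures mean in practice, the arithmetic can be sketched as follows. The throughput and multiplier values are the ones from the table; the library size of 100,000 compounds is a hypothetical input, and because the table gives throughput only as a lower bound ("> 150"), the computed time is an upper bound.

```python
def enumeration_estimate(n_compounds, compounds_per_min, multiplier):
    """Return (hours needed, expected database size after enumeration)."""
    hours = n_compounds / compounds_per_min / 60.0
    expanded_size = round(n_compounds * multiplier)
    return hours, expanded_size


# (compounds/min lower bound, multiplier) taken from the benchmark table.
benchmarks = {
    "Maybridge Screening": (150, 2.5),
    "Asinex Platinum": (250, 2.9),
    "VitasM (in-house Stock)": (560, 3.0),
    "Tripos Leadscreen": (1400, 2.2),
}

library_size = 100_000  # hypothetical number of input compounds

for name, (rate, multiplier) in benchmarks.items():
    hours, size = enumeration_estimate(library_size, rate, multiplier)
    print(f"{name}: at most {hours:.1f} h, about {size} structures")
```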
Tautomeric Fingerprints
[Figure: histograms of tautomer frequency versus number of tautomer forms for a) Asinex (Platinum Collection) and b) Specs (Screening Collection); the plotted values are not recoverable from the transcript.]
Virtual Screening Workflow @ Intervet
[Diagram] 2D / 3D Structure DB (MDL) -> PreProcessing -> Tautomer Generation -> Tautomer-sensitive Duplicate check -> Specific 3D Databases (Catalyst, Unity) -> Virtual Screening -> Data Analysis
Example I - MMP-8 (PDB Entry Code: 1JJ9)
Matrix Metalloproteinase Inhibitor: 8-Barbiturate

H. Brandstetter et al., MMP-8 inhibitors, J. Biol. Chem. 276, 2001, 17405-12.
http://home.t-online.de/home/kubinyi/dd-18.pdf
Example I - MMP-8 Pharmacophore from X-ray
Matrix Metalloproteinase Inhibitor: 8-Barbiturate

H-Bond Donor: green
H-Bond Acceptor: magenta
Hydrophobic aliphatic: blue
Ring Aromatic: brown
Example I - MMP-8 Test Database
993 molecules selected from NCI 2000 Database (Catalyst Version)

[Diagram] 2D / 3D Structure DB (NCI 2000) -> Diverse Compound Selection (Cerius2 4.9), followed by two database builds:
• 3D Database (Catalyst, no tautomers): 994 compounds including 8-Barbiturate (X-ray) *
• Tautomer Generation (CACTVS) -> 3D Database (Catalyst, with tautomers): 1536 compounds including 5 8-Barbiturate tautomers

Default conditions for Catalyst database building; exception: max conformers 150

* Knowledge of the stable tautomeric form is needed as a prerequisite.
Example I - MMP-8 Search on Test Database
Questions:
• Does our workflow cover the suggested tautomeric states of the 8-barbiturate?
• Does the conformer generation of Catalyst resemble the X-ray structure?
• Is the fit value significant for the X-ray structure and also for the conformers?
• Signal-to-noise relationship: what are the fit values of other hits?

Exotic luxury or useful effort?
• 3D Database (Catalyst, with tautomers): 1536 compounds including 5 8-Barbiturate tautomers
• 3D Database (Catalyst, no tautomers): 994 compounds including 8-Barbiturate (X-ray)
Example I - MMP-8 Results
Best Search - no tautomers:
• 29 compounds found
• 8-barbituric acid found in database, but scores less significantly (BestFit 3.2)
• 8-barbiturate tautomer form (X-ray) BestFit 7.2 (AV = 3.0, SD = 1.7)
• second-best scored hit shows BestFit 6.2
• only 3 hits score higher than BestFit 4
• best non-X-ray conformer scores BestFit 6.3
Example I - MMP-8 Results
Best Search - with tautomers:
• 30 compounds found (22 unique, 8 tautomeric duplicates)
• 8-barbiturate BestFit 7.2 (AV = 2.0, SD = 1.5)
• second-best scored hit shows BestFit 5.5
• only 3 hits score higher than BestFit 4
• non-tautomeric 8-barbiturate scores BestFit 3.2
• 60 % overlap between both search results (all top-scoring hits in common)
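How unique hits and overlap figures of this kind can be derived from two hit lists is sketched below. The slides do not say how the overlap was calculated; this sketch assumes it is the share of the no-tautomer hits that also appear among the unique hits of the tautomer search. All compound IDs are hypothetical.

```python
def unique_parents(hits):
    """Collapse (parent_id, tautomer_index) hits to the set of parent compounds."""
    return {parent_id for parent_id, _tautomer_index in hits}


def overlap_fraction(reference, other):
    """Fraction of the reference hits that also occur in the other hit list."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference hit list is empty")
    return len(reference & other) / len(reference)


# Hypothetical hit lists: parent IDs from the search without tautomers,
# (parent ID, tautomer index) pairs from the search with tautomers.
hits_no_tautomers = {"c1", "c2", "c3", "c4", "c5"}
hits_with_tautomers = [("c1", 0), ("c1", 2), ("c2", 1), ("c3", 0), ("c9", 0)]

unique_hits = unique_parents(hits_with_tautomers)
print(len(hits_with_tautomers), "hits,", len(unique_hits), "unique")
print("overlap:", overlap_fraction(hits_no_tautomers, unique_hits))
```

Counting parents rather than raw hits is what turns "30 compounds found" into "22 unique" on the slide: several tautomers of one compound can match the same pharmacophore.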
Example I - MMP-8 Summary
• Significantly better fit value and hit separation for a database search including tautomers
• X-ray structure closely resembled
• Number of unique hits reduced
• Significantly more structures have to be converted - a time-consuming aspect
Critical aspects:
• Hit rate (unique hits) is lower for the database including tautomers (?)
• X-ray structure or known physiological conditions in the protein appear to be important for sensitive pharmacophore searches
Example II - CDK of Eimeria tenella
Cyclin Dependent Kinase - Homology Model based on Sequence Analysis

C. Beyer et al., Oral & Poster Presentation at the 18th Darmstädter Molecular Modelling Workshop 2004
J.H. Kinnaird et al., International Journal for Parasitology 34, 2004, 683-692
Example II - CDK of Eimeria tenella - HipHop
Qualitative pharmacophore model derived from the best selective human CDK2 inhibitors - prefilter for docking libraries

Feature mapping of pharmacophore hypothesis with CDK2 selective molecules
H-Bond Donor: green
H-Bond Acceptor: magenta
Hydrophobic: blue
Ring Aromatic: brown
Example II - CDK of Eimeria tenella - database
993 molecules selected from NCI 2000 Database (Catalyst Version) plus 123 known human CDK1/2 inhibitors. 93 ligands show activity against CDK2.

• 3D Database (Catalyst, with tautomers): 2368 compounds including 837 CDK1/2 inhibitor tautomers *
• 3D Database (Catalyst, no tautomers): 1116 compounds including 123 known CDK1/2 inhibitors

* 733 CDK2 inhibitor tautomers
Example II - CDK of Eimeria tenella - Results
A) Search results without tautomers:
• best hypothesis finds 55.5 % of the known CDK2 inhibitors (AV 39.8 %, SD 9.5 %)
• Best selective (selectivity higher than 4 included)
• Number of hits: 81

B) Search results with tautomers:
• best hypothesis finds 72.2 % of the known CDK2 inhibitors (AV 66.4 %, SD 10.0 %)
• Best selective (selectivity higher than 4 included)
• Number of unique hits: 61
• Overlap of best hypothesis search in A) with results in B): 93 %
Example II - CDK of Eimeria tenella - Results
A) Search results without tautomers:
• GH Score 0.54 for best pharmacophore model

B) Search results with tautomers:
• GH Score 0.77 for best pharmacophore model of A)
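For readers unfamiliar with the GH score, the sketch below uses the commonly cited Güner-Henry "goodness of hit list" formula. The slides do not state which variant or which counts were used, so the formula choice is an assumption and the numbers in the example are hypothetical; it does not attempt to reproduce the 0.54 and 0.77 reported above.

```python
def gh_score(ha, ht, a, d):
    """Goodness-of-hit score (assumed Guner-Henry form).

    ha: actives in the hit list     ht: total hits retrieved
    a:  actives in the database     d:  total compounds in the database
    """
    if ht == 0 or a == 0 or d <= a:
        raise ValueError("need ht > 0, a > 0 and d > a")
    # Weighted mix of yield (ha/ht, weight 3/4) and recall (ha/a, weight 1/4).
    yield_recall = ha * (3 * a + ht) / (4 * ht * a)
    # Penalty for inactives retrieved, relative to all inactives in the database.
    false_positive_penalty = 1 - (ht - ha) / (d - a)
    return yield_recall * false_positive_penalty


# Hypothetical counts: 10 actives among 100 compounds, 10 hits, 5 of them active.
print(round(gh_score(ha=5, ht=10, a=10, d=100), 3))
```

A score of 1 corresponds to a hit list that contains all actives and nothing else; lower values reflect missed actives, retrieved inactives, or both.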
Example II - CDK of Eimeria tenella - Summary
• Number of true hits is better when tautomers are included
• High overlap among top-scoring ligands for both searches
Remark: SBF models under way
Critical aspects:
• Number of unique hits is reduced by using tautomer databases
• All kinds of tautomeric states are considered.
Future Tasks
• Modifications of tautomeric rules
• Automation of database building workflow
• Implementation of defined ionisations
• Further investigations with examples
Acknowledgements
Dr. J. Cramer, C. Beyer, Dr. J. Schröder, PD Dr. P. Selzer
Intervet Innovation GmbH
Dr. W.-D. Ihlenfeldt
Xemistry GmbH
Dr. O. Sacher
Molecular Networks GmbH
Dr. T. Hidaka
Takeda Pharmaceutical Ltd.
Who we are ...
BIOCHEMINFORMATICS
Paul Selzer
Richard Marhöfer
Andreas Rohwer
Jörg Schröder
Jörg Cramer
Anette Klinger
Carsten Beyer
Frank Oellien
Kristin Engels
Andreas Krasky
Hon Tran
