The document discusses using quasi-Monte Carlo integration to generate molecular shape fingerprints for efficient comparison of molecular shapes. Shape fingerprints represent the shape of a molecule as a bit string based on whether quasi-random points fall inside or outside the van der Waals surface. This allows for fast generation and comparison of shape fingerprints to enable large-scale virtual screening and clustering based on molecular shape. Examples are given analyzing shape clusters of ROCK2 inhibitors from PubChem. Future work may explore different shape representations and point distributions.
1 of 31
Downloaded 12 times
More Related Content
QMC-based Shape Fingerprints
1. Quasi-Monte Carlo integration for
the fast and effective generation of
molecular shape 鍖ngerprints
John D. MacCuish
Norah E. MacCuish,
Michael Hawrylycz, and Mitch Chapman
ACS San Francisco 2010
COMP Drug Discovery
john.maccuish@mesaac.com
3. Why Shape?
Shape is a component in binding...
3D QSAR
similarity searching
compound acquisition (shape diversity)
virtual screening...
Can be confounding -- multiple
binding sites, surface binding...
One more tool in the toolbox...
Boost to 2D
Pluses and minuses: Molecular Shape and Medicinal Chemistry:
A Perspective, JMedChem, Feb. 2010, Nicholls, et al
4. Why Shape Fingerprints?
Compute shape comparisons
ef鍖ciently on a large scale
Ef鍖cient storage
Simple.
5. Prior Work
Shape as a mixture of spherical
gaussians (atoms) (Grant and
Pickup...96)
Fingerprints composed of key
shapes, given the above method
(Haigh, et al...05).
Ultrafast Shape Recognition:
Ballester, Richards 2007
ShaEP, Vainio, et al 2009
6. Monte Carlo Integration
Approximate an integral (e.g., a 3D
volume) by random sampling
Approximate volume of odd shapes or
manifolds that are analytically dif鍖cult
or have no closed form solution.
Better error convergence than grid
sampling (Metropolis).
7. quasi-Monte Carlo
Integration (QMC)
In practice, quasi-randomly generated
points have best error convergence in
low dimensions.
Beats uniform random sampling (e.g.,
pseudo, dart throwing, etc.)
QMC became popular in early-90s in
computer graphics (Shirley, 91) mid-90s
in 鍖nance - options/futures pricing
(Morokoff, Ca鍖isch, 95)
8. Approximate Volume
quasi-random
grid pseudo-random (low discrepancy) P
1.00
1.00
1.00
0.90
0.90
0.90
0.80
0.80
0.80
0.70
0.70
0.70
0.60
0.60
0.60
0.50
0.50
0.50
0.40
0.40
0.40
0.30
0.30
0.30
0.20
0.20
0.20
0.10
0.10
0.10
0.00
0.00
0.00
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
Error related to the Error related to just Error related to just
number of points number of points number of points
and the dimension
9. Approximation Error
grid -- for error 竜, need 1/竜d grid points
-- error convergence exponential in
the dimension
pseudo -- 1n convergence
quasi -- 1/n convergence in practice
in low dimensions
quasi-random fewest points needed
for equivalent approximation error.
10. Approximating the
Volume of a molecule
E.g., CPK with van der Waals radii --
Set of intersecting spheres
Find a suitable bounding region for
molecules
Point sample the bounding region
and tally up the points either inside
or outside of the atom spheres.
11. Approximating the
Volume of a molecule
Bounding Volume times (Points
Inside)/(All Points) ~= molecular
volume.
Sphere or scalene ellipsoid as a
bounding region reduces total overall
points.
13. Fingerprint Generation
Preliminaries
Need a bounding region that covers the
conformational space of the database of
small molecule conformations.
Cube? Sphere? Scalene Ellipsoid.
Need orientation and alignment.
14. Fewer points
1.00
0.90
0.80
0.70
0.60
Dont need these points Dont need these points
0.50
0.40
0.30
0.20
0.10
0.00
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
15. Fingerprint Generation
1. Generate a sample quasi-point set bounding
region centered and 鍖xed at (0,0,0); Sphere of
~11 radius. This is 鍖xed, done once.
2. Mean center atom centers point set of a
conformation to be 鍖ngerprinted.
3. Find sample points in the atom spheres (or
function or your choice, e.g., solvent accessible
surface ...)
4. Find the principal axes for a shape using the
sampled points and the atom centers with SVD.
Adjust for SVD sign ambiguity (Bro, et al, 2008).
16. Fingerprint Generation
5. Rotate atom centers point set to Principal
Axes (PA)
6. Find points in PA con鍖guration and
create 鍖ngerprint with 4 orientations --
points in molecule 1, points out 0
7. Sub鍖ngerprints, each the length of the
number of points.
21. Shape Fingerprints
4 鍖ngerprints per shape. e.g., 10,240
X 4 bits = 5.12 Kbytes.
Number and density of points
determines resolution, speed of
comparison, and storage size.
Number of bits on is small, typically
3-10% with CPK model. Fingerprints
compress signi鍖cantly.
22. Fingerprint Similarity
Choose 1 subFP of Shape FP A and compare it
with all 4 subFPs of Shape FP B.
The largest similarity comparison is chosen.
Bit difference per subFP is small in a given FP
23. Fingerprint Similarity
Inherent slight asymmetry of alignment of
point sets of different sizes.
Aligning B to As principal axes, vs aligning A
to Bs principal axes.
Try this on your favorite method...
Use bit string similarity measure of your
choice... Tanimoto, Ochai (Cosine), ...,
Tversky,... Baroni-Urbani/Bush...
24. Performance
With ~10K points and including IO:
Fingerprint generation: ~500K
conformations per hour
Fingerprint Tanimoto comparisons:
130KX130K matrix in 24 hours
Large scale similarity searching trivial,
e.g., 10X1M < 10 minutes
Numbers estimated with 2009 iMac, with an Intel Core i7 2.8GHz processor
and 4GB RAM, running Mac OS X 10.6.2; small to large compounds;
25. Space!
5.12 Kbytes per 鍖ngerprint -- ~90%
compression
2 million conformations = 1+ GBs
Space and CPU improving all of the
time...
26. FP Experiments
Typical all-pairs
Tanimoto similarity
distribution
1357, 2D 768 MACCS
Keys 鍖ngerprints
Where the action is:
Similarity searching,
Clustering, etc.
above 0.7
28. Examples
Multi-conformation set of actives in a
secondary screen in ROCK2 from
PubChem
Xray ROCK2 ligand
Fingerprint shape cluster
Analyze shape cluster that contains the
xray structure with ChemTattoo 3D
Fragment database analysis using molecular shape fingerprints,
ACS San Francisco, Wednesday 8:40 , CINF 106
30. Future Work
Other shape functions
Other low discrepancy point sets in
different distributions -- density
where it is needed.
Speed (time) and compression
(space)
More practical applications