Optimization of Transition State
Structures using Evolutionary Algorithms
                    Dr. Lukasz Miroslaw
                    lukasz.miroslaw@uzh.ch

                Organic Chemistry Institute
             Grid Computing Competence Center
              University of Zurich, Switzerland


          Kim Baldridge's group meeting, 27.09.2012




Table of contents



Introduction:
  - Objectives and motivations

  - Evolutionary Algorithms: main concept and key operators.

Model:
 - Definition

 - Results




Open Questions

Alternative Approaches
Introduction


Objective

Detection of transition state (TS) structures with effective methods.

Why?

Transition state structures are energy maxima along the minimum energy path connecting the reactants and products of a chemical reaction.

They are difficult to detect experimentally, so simulations are a must, but simulations are computationally expensive.

A GA (by S. D. Bungay, R. A. Poirier and R. J. Charron) is one of the proposed methods to find TS.

Literature Review



Current approaches mentioned in the paper do not guarantee
convergence to TS structures:
   - BFGS method, TS-BFGS, OC (Optimally Condensed)
      are forced to keep the Hessian matrix positive
      definite.
   - The methods require a good initial guess (chemical
      intuition).

GA has been employed in energy minimization of molecular
clusters since 1995 (Mestres, Scuseria).

Evolutionary Algorithm


Individuals are the legal solutions to our problem. They form a population that 'evolves' in time and adapts to the environment.

The fitness function is the measure of adaptation.

Diversity is crucial. Extrema and saddle points are found more often than by gradient searches.

Operators that drive the evolution: Selection, Reproduction (Recombination), Mutation.

Hard vs. Soft Selection

Hard selection: the best individuals always win.

Pros: local minima are located easily.

Cons: crossing saddles is almost impossible.

Soft selection: the probability of selection depends on the fitness.

Pros: better saddle crossing.

Cons: parameter-dependent method.

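The difference between the two selection schemes can be sketched in a few lines. This is only an illustration under stated assumptions: lower energy is taken as better fitness, soft selection is implemented with Boltzmann-like weights exp(-E/T), and the function names, the `temperature` parameter and the toy double-well energy are all hypothetical, not taken from the slides.

```python
import math
import random

def hard_selection(population, energy, k):
    """Keep the k lowest-energy individuals: the best always win."""
    return sorted(population, key=energy)[:k]

def soft_selection(population, energy, k, temperature, rng):
    """Draw k individuals with probability proportional to exp(-E/T).

    `temperature` is an assumed tuning parameter; it is the kind of
    parameter dependence listed as a drawback of soft selection.
    """
    e_min = min(energy(x) for x in population)
    weights = [math.exp(-(energy(x) - e_min) / temperature) for x in population]
    return rng.choices(population, weights=weights, k=k)

def energy(x):
    """Toy 1D double well with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2

rng = random.Random(0)
population = [-1.2, -1.0, -0.5, 0.0, 0.4]
survivors_hard = hard_selection(population, energy, 2)
survivors_soft = soft_selection(population, energy, 2, temperature=0.5, rng=rng)
```

Hard selection always returns the two individuals closest to the left minimum, while soft selection occasionally keeps a high-energy individual near the barrier at x = 0, which is what makes saddle crossing possible.
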
Cross-over

Recombination:

Mating process: two parents create offspring.

The offspring consists of genetic material from both parents.

Weaker offspring tend to die out in time.

Goal: variation allows the offspring to search out different available niches and find better fitness values, ergo better solutions.

Mutation




Mutation occurs in nature. Although it occurs very infrequently, many believe it is a main driving force of evolution. Mutation often results in a weaker individual; occasionally it produces a stronger one.

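For binary chromosomes, recombination and mutation are often implemented as one-point crossover and bit-flip mutation. The sketch below assumes these two textbook operators; the slides do not say which variants were used, and the function names and the mutation `rate` are illustrative.

```python
import random

def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings at index `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in bits)

rng = random.Random(0)
children = one_point_crossover("00000000", "11111111", point=3)
mutant = bit_flip_mutation(children[0], rate=0.1, rng=rng)
```

Each child carries material from both parents, and a low mutation rate changes only an occasional bit.
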
Fitness Function


Points on the potential energy surface (PES) represent chemical structures; coordinates are bond angles, bond lengths, dihedral angles, etc.
Minima represent reactants, products and intermediates in the reaction.
First-order saddle points represent transition state structures.
Higher-order saddle points are of no chemical interest.

Goal: points with zero gradient and exactly one negative eigenvalue of the Hessian matrix.

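The goal criterion (zero gradient, exactly one negative Hessian eigenvalue) can be checked numerically. The sketch below is an assumption-laden illustration: it uses a hypothetical two-dimensional toy surface, central finite differences, and the closed-form eigenvalues of a symmetric 2x2 matrix. A real calculation would take the gradient and Hessian from a quantum chemistry package.

```python
import math

def pes(x, y):
    """Toy surface: minima at (-1, 0) and (1, 0), first-order saddle at (0, 0)."""
    return (x * x - 1.0) ** 2 + y * y

def gradient(f, x, y, h=1e-5):
    """Central-difference gradient."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy

def hessian_eigenvalues(f, x, y, h=1e-4):
    """Eigenvalues of the finite-difference 2x2 Hessian, in ascending order."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    mean = (fxx + fyy) / 2
    radius = math.hypot((fxx - fyy) / 2, fxy)
    return mean - radius, mean + radius

def classify(f, x, y, tol=1e-3):
    """Label a point by its gradient norm and number of negative eigenvalues."""
    if math.hypot(*gradient(f, x, y)) > tol:
        return "not stationary"
    negatives = sum(1 for ev in hessian_eigenvalues(f, x, y) if ev < -tol)
    labels = {0: "minimum", 1: "transition state"}
    return labels.get(negatives, "higher-order saddle or maximum")

label_saddle = classify(pes, 0.0, 0.0)
label_minimum = classify(pes, 1.0, 0.0)
label_slope = classify(pes, 0.5, 0.0)
```

On this toy surface the origin has Hessian eigenvalues of about -4 and 2, so it is the only point labelled as a transition state.
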
Example: water


Z-MATRIX coordinates

H
O 1 OH
H 2 OH 1 OHO

OH = 1.08
OHO = 107.5

encoded as
1.08 * 1 000 000 = 1 080 000 = (100000111101011000000)_2
107.5 * 1 000 = 107 500 = (11010001111101100)_2
concatenated as (10000011110101100000011010001111101100)_2
Mutation, recombination and selection are applied to generate new offspring until convergence.

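The encoding of the water example can be reproduced with a small fixed-point helper. This is a sketch, not the authors' code: the helper names are hypothetical, and the angle is scaled by 1 000 because that is the factor that reproduces the 17-bit string shown on the slide.

```python
def encode(value, scale):
    """Fixed-point encoding: scale, round to an integer, write in base 2."""
    return format(round(value * scale), "b")

def decode(bits, scale):
    """Inverse of encode, up to the precision fixed by `scale`."""
    return int(bits, 2) / scale

oh_bits = encode(1.08, 1_000_000)    # bond length OH
oho_bits = encode(107.5, 1_000)      # bond angle OHO
chromosome = oh_bits + oho_bits      # concatenated bit string

# Decoding needs the field widths, because the fields have different lengths.
oh_back = decode(chromosome[:len(oh_bits)], 1_000_000)
oho_back = decode(chromosome[len(oh_bits):], 1_000)
```

Since the two fields have different widths (21 and 17 bits), the widths have to be stored alongside the chromosome for decoding to work.
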
Questions to the Authors



Results are promising, TS are in the same range (data not shown) BUT:
Multiplicative or interval encoding of variables does not preserve the accuracy. Example:

x = 0.23420111234, x_new = x * 10^acc = 234201.11234
x_new = 234201 → (111001001011011001)_2
But (111001001011011001)_2 → 234201, not 234201.11234!

The authors do not specify:
- the meaning of 'small perturbations';
- when the Gray coding was performed: before or after concatenation.

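Both concerns can be made concrete with a short sketch. It assumes acc = 6 (the value consistent with the numbers in the example) and the standard binary-reflected Gray code; the helper names and the two short bit fields are hypothetical.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse of to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def gray_bits(bits):
    """Gray-code a bit string, keeping its width."""
    return format(to_gray(int(bits, 2)), "0{}b".format(len(bits)))

acc = 6
x = 0.23420111234
x_int = int(x * 10 ** acc)           # the fractional part .11234 is dropped
bits = format(x_int, "b")
x_back = int(bits, 2) / 10 ** acc    # decoding returns 0.234201, not x

# Gray coding each field and then concatenating is not the same as
# Gray coding the concatenated string.
field_a, field_b = "11", "10"
per_field = gray_bits(field_a) + gray_bits(field_b)
whole = gray_bits(field_a + field_b)
```

The round trip loses every digit beyond 10^-acc, and the two Gray-coding orders give different chromosomes ("1011" versus "1001" for these fields), so the order matters for reproducing the method.
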
Sphere-based Reaction Path Following

1. Optimize the geometry to get the ground state of the analyzed system.
2. Calculate vibrational modes VM in GAMESS.
                         dim(PES) = dim(VM)
3. Generate K spheres with different radii R_i, i = 1..K.
                         x^2 + y^2 + z^2 = R_i^2 (3D example)
4. Generate M sampling points uniformly distributed on each sphere.
5. For each sphere, calculate the energies at the sampling points and find the local minima.
6. Connect the local minima on each sphere R_i to obtain a reaction path.

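Steps 3 to 6 can be sketched on a toy problem. Everything specific here is an assumption: the three-dimensional energy function is hypothetical, uniform sampling is done by normalizing Gaussian vectors rather than by any triangulation, and "find local minima" is simplified to taking the single lowest-energy sample on each sphere.

```python
import math
import random

def toy_energy(p):
    """Hypothetical 3D surface: minimum at the origin, softest direction along +x."""
    x, y, z = p
    return 0.5 * x * x - 0.1 * x ** 3 + 5.0 * (y * y + z * z)

def sample_sphere(radius, m, rng):
    """Return m points drawn uniformly on the sphere x^2 + y^2 + z^2 = radius^2."""
    points = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        points.append(tuple(radius * c / norm for c in v))
    return points

def sphere_path(energy, radii, m, seed=0):
    """Lowest-energy sample on each sphere, ordered by increasing radius."""
    rng = random.Random(seed)
    return [min(sample_sphere(r, m, rng), key=energy) for r in sorted(radii)]

radii = [0.5, 1.0, 1.5, 2.0]
path = sphere_path(toy_energy, radii, m=500)
```

On this surface the selected points line up along the +x direction, the soft mode leaving the minimum. Each point costs one energy evaluation, so the total cost is K * M evaluations.
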
Sphere-based Reaction Path Following

Problems:

- How to set R and M?
- What is the best direction for generating new sampling points on sphere R_(i+1)?
- Uniform distribution is generated with Delaunay triangulation.
- High complexity, many parameters of unknown nature.

Idea: EA to locate local but meaningful optima on each sphere.

Sphere based approach cont.



Let us define N individuals X_i = (R, M, E) and evolve them using mutation and soft selection. E describes the uniformity, e.g. the distance between the points.

For all sampling points on each sphere, calculate the energies and generate their histogram (distribution). The fitness function promotes 'better' histograms.
Note: in the distributions, keep the information about the position of all the sampling points.

Pick the histogram bin with the lowest energies and evolve the system in the directions defined by the sampling points in that bin.

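The histogram step can be sketched as follows. The binning scheme, the function name and the toy energy on a unit circle are assumptions made for illustration; the point is only that each bin stores the sampling points together with their energies, so the lowest bin directly yields search directions.

```python
def energy_histogram(points, energy, n_bins):
    """Bin sampling points by energy, keeping (energy, point) pairs per bin."""
    energies = [energy(p) for p in points]
    lo, hi = min(energies), max(energies)
    width = (hi - lo) / n_bins or 1.0        # guard against a flat surface
    bins = [[] for _ in range(n_bins)]
    for p, e in zip(points, energies):
        index = min(int((e - lo) / width), n_bins - 1)
        bins[index].append((e, p))
    return bins

def toy_energy(p):
    """Hypothetical energy on the unit circle, lowest at (1, 0)."""
    return p[1] ** 2 + 0.5 * (p[0] - 1) ** 2

points = [(1, 0), (0, 1), (-1, 0), (0, -1)]
bins = energy_histogram(points, toy_energy, n_bins=4)
lowest_bin = next(b for b in bins if b)      # first non-empty bin
directions = [p for _, p in lowest_bin]
```

Here the lowest bin contains only the point (1, 0), which would be the direction used to place samples on the next sphere.
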
EA for reaction path following

[Figure: states A and B and the path P]

Objective: given a system A, find better and preferably stable ground states (B, C, ...).

During the evolution the population should move from A to B (and/or from A to C) and cross the saddle.

Hypothesis: the reaction path is very close to the saddle-crossing path (P).

EA for reaction path following

1. Optimize the geometry of the system A.
2. Initialize the population X^t_i = {x_1, x_2, x_3, x_4, ..., x_n} in the vicinity of the ground state.

x_i is a conformer defined by Z-matrix or Cartesian coordinates:
x_i = {a_1, a_2, a_3, ..., a_P}, x_i ∈ R^P

Fitness F(x_i) is the ENERGY, F: R^P → R (PES)

EA for reaction path following




Evolve until the higher peak (B) has been reached.

The population must cross the saddle. The path P obtained during saddle crossing should be close to the reaction path. Analyze only the vicinity of P in more detail using the GRADIENT and the HESSIAN.

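A minimal sketch of the proposed scheme, under loud assumptions: a hypothetical two-dimensional surface with a shallow minimum A near (-1, 0) and a deeper minimum B near (1, 0), real-valued chromosomes, Gaussian mutation, Boltzmann-weighted soft selection, and the population centroid as a stand-in for the path P. All parameter values are illustrative, and whether the population actually crosses the saddle in a given run depends on them and on the seed.

```python
import math
import random

def energy(p):
    """Hypothetical 2D PES: minimum A near (-1, 0), deeper minimum B near (1, 0)."""
    x, y = p
    return (x * x - 1.0) ** 2 - 0.3 * x + y * y

def mutate(p, sigma, rng):
    """Gaussian perturbation of every coordinate."""
    return tuple(c + rng.gauss(0.0, sigma) for c in p)

def soft_select(population, temperature, rng):
    """Resample the population with weights exp(-E/T) (assumed scheme)."""
    e_min = min(energy(p) for p in population)
    weights = [math.exp(-(energy(p) - e_min) / temperature) for p in population]
    return rng.choices(population, weights=weights, k=len(population))

def centroid(population):
    n = len(population)
    return tuple(sum(p[i] for p in population) / n for i in range(2))

def evolve(start, size=20, generations=200, sigma=0.15, temperature=0.2, seed=1):
    """Return the centroid path P and the best individual ever seen."""
    rng = random.Random(seed)
    population = [start] + [mutate(start, sigma, rng) for _ in range(size - 1)]
    path = [centroid(population)]
    best = min(population, key=energy)
    for _ in range(generations):
        parents = soft_select(population, temperature, rng)
        population = [mutate(p, sigma, rng) for p in parents]
        path.append(centroid(population))
        best = min(population + [best], key=energy)
    return path, best

start = (-1.0, 0.0)
path, best = evolve(start)
```

Only energies drive the evolution; gradient and Hessian calculations would be reserved for points close to the recorded path.
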
Does it Make Sense ?


Pros:
- Saddle crossing is in the nature of EA.
- Only energies are used to drive the evolution.
- Our multi-dimensional models show that small populations give very good results.
- Chromosomes are real values (not binary).

Cons:
- What if P is far from the reaction path?
- How to constrain mutations?
- How to validate P's?

Problems, Questions




- How to translate between Cartesian and internal coordinates?
- How to generate conformers in a meaningful way?
- Bond breaking, bond formation; bond lengths = covalent atomic radii?
- Is there a publication that shows the actual PES, even for small molecules? Is the PES continuous?
