2. Outline
Part 1 Problem of stochastic policy gradient
Part 2 Lower bound of performance
Part 3 Trust Region Policy Optimization
Part 4 Code review of TRPO
37. Truncated Natural Policy Gradient
Problems with Truncated Natural Policy Gradient:
1. It might not be robust to the trust region size: at some iterations the size may be too large, and performance can degrade.
2. Because the KL divergence is replaced by its quadratic approximation, the actual KL-divergence constraint may be violated.
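
The two problems above can be illustrated with a minimal sketch. It assumes a toy softmax policy over three actions in a single state, an arbitrary made-up gradient `g`, and a damping term added to the Fisher matrix; none of these come from the slides. The names `fisher_vector_product`, `tnpg_step`, and `backtrack` are hypothetical. The step is scaled so that the quadratic approximation of the KL divergence equals `delta`, so the true KL divergence can still differ from it. A backtracking line search of the kind TRPO uses is shown as one possible remedy.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def kl(p, q):
    # Exact KL(p || q) for two categorical distributions.
    return float(np.sum(p * (np.log(p) - np.log(q))))

def fisher_vector_product(p, v, damping=1e-3):
    # Fisher matrix of a softmax policy w.r.t. its logits is diag(p) - p p^T.
    # The damping term is an assumption that keeps the system positive definite.
    return p * v - p * (p @ v) + damping * v

def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    # "Truncated": only a few CG iterations approximate F^{-1} b.
    x = np.zeros_like(b)
    r = b.copy()
    d = r.copy()
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

def tnpg_step(theta, g, delta):
    # Natural gradient direction s ~ F^{-1} g, scaled so that
    # 0.5 * step^T F step == delta (the quadratic KL approximation).
    p = softmax(theta)
    s = conjugate_gradient(lambda v: fisher_vector_product(p, v), g)
    beta = np.sqrt(2.0 * delta / (s @ fisher_vector_product(p, s)))
    return beta * s

def backtrack(theta, step, delta, shrink=0.5, max_tries=20):
    # Shrink the step until the exact KL constraint holds.
    p_old = softmax(theta)
    for i in range(max_tries):
        cand = step * shrink ** i
        if kl(p_old, softmax(theta + cand)) <= delta:
            return cand
    return np.zeros_like(step)  # give up: keep the old policy

if __name__ == "__main__":
    theta = np.array([2.0, 0.0, -1.0])   # toy logits (assumed)
    g = np.array([-1.0, 0.5, 0.5])       # toy policy gradient (assumed)
    delta = 0.5                          # deliberately large trust region
    step = tnpg_step(theta, g, delta)
    p_old = softmax(theta)
    print("quadratic KL:", 0.5 * step @ fisher_vector_product(p_old, step))
    print("exact KL    :", kl(p_old, softmax(theta + step)))
    safe = backtrack(theta, step, delta)
    print("exact KL after backtracking:", kl(p_old, softmax(theta + safe)))
```

Comparing the two printed KL values for different `delta` shows how far the quadratic approximation can drift from the exact divergence as the trust region grows. The backtracking check enforces the constraint on the exact KL rather than on its approximation.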