55. Discrete Action vs. Continuous Action
Actions in the real world
DQN solved high-dimensional state spaces, but not continuous action spaces
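
A minimal sketch of why the action space matters for DQN-style selection. It assumes the network outputs one Q-value per discrete action, so the greedy action is an argmax over a finite list. The names greedy_discrete_action and grid_size and the example numbers are hypothetical, chosen only for illustration.

```python
def greedy_discrete_action(q_values):
    """Return the index of the largest Q-value (one entry per discrete action)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def grid_size(bins_per_dim, action_dims):
    """Number of discrete actions if each continuous dimension is cut into bins."""
    return bins_per_dim ** action_dims


# Hypothetical Q-values for a task with 4 discrete actions.
q_values = [0.1, 0.7, 0.3, 0.2]
best = greedy_discrete_action(q_values)

# Discretizing a continuous action space: the number of actions to
# enumerate grows exponentially with the number of action dimensions.
n_actions = grid_size(bins_per_dim=10, action_dims=6)
```

With a continuous action there is no finite list to take the argmax over, and discretizing each dimension quickly makes the enumeration impractical.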
56. Two methods of choosing actions
1. Action-value :
- Learn the action values
- Select actions based on the estimated action values
- Policies would not even exist without the action-value estimates
2. Parameterized policy :
- Selects actions without consulting a value function
- A value function may still be used to learn the policy parameter
- The value function is not required for action selection
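
The two methods above can be contrasted with a short sketch. It is not taken from the slides: epsilon-greedy is assumed as the action-value selection rule, and a softmax over per-action preferences theta is assumed as the parameterized policy. All function names are hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Action-value method: the policy is derived from the Q estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore uniformly
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def softmax_policy(theta):
    """Parameterized policy: action probabilities computed from theta alone."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in theta]
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_policy(theta, rng):
    """Sample an action from the softmax policy; no value function is consulted."""
    probs = softmax_policy(theta)
    return rng.choices(range(len(theta)), weights=probs, k=1)[0]


rng = random.Random(0)
a_value_based = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
a_policy_based = sample_from_policy([1.0, 2.0, 0.5], rng)
```

In the first function the policy disappears if the Q estimates are removed. In the second, a value function could still help when updating theta, but sampling an action never touches it.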
J(θ) : Performance measure
q_π(s, a) = E[G_t | S_t = s, A_t = a]
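
Since the action value is an expectation of the return given a state and an action, it can be approximated by averaging sampled returns. The sketch below assumes G_t is the discounted sum of rewards with a discount factor gamma, which the slides do not define, and that each reward list was collected after taking the same action in the same state. The function names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma * R_{t+2} + ... computed backwards over one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_action_value(reward_sequences, gamma):
    """Monte Carlo estimate: average return over episodes starting from (s, a)."""
    returns = [discounted_return(rs, gamma) for rs in reward_sequences]
    return sum(returns) / len(returns)


# Two hypothetical reward sequences observed after taking action a in state s.
estimate = mc_action_value([[1, 1, 1], [0, 0, 2]], gamma=0.5)
```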