Guided policy search (GPS) is a branch of reinforcement learning developed for real-world robotics, and its utility has been demonstrated in many studies. This slide show covers the concepts behind GPS and details how to implement it, so it should be helpful for anyone who wants to study this field.
[paper review] Eye in the sky & 3D human pose estimation in video with ... (Gyubin Son)
1. Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network
https://arxiv.org/abs/1806.00746
2. 3D human pose estimation in video with temporal convolutions and semi-supervised training
https://arxiv.org/abs/1811.11742
4. World Model process (simple tasks ver.)
<Car Racing> Exploration with a random policy
Fake environment
Exploration & Flexibility
When training (M)
When testing
5. When training (C)
<Car Racing>
Training
Simulate
1. Reset the environment and get the first obs
2. The agent takes a random action based on obs
3. obs, reward, done, info = model.env.step(action)
4. [encoded_obs, action] is fed into the rnn as input
5. z value and h value
6. Total reward += reward
7. Return to step 2 and repeat
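The Simulate steps above can be sketched as a single rollout loop. This is a minimal sketch, not the deck's actual code: `ToyEnv`, `encode`, and `rnn_step` are hypothetical stand-ins for the Car Racing environment, the encoder that produces z, and the rnn that produces h. Only the `obs, reward, done, info = env.step(action)` call shape comes from the slide.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the Car Racing env (gym-style reset/step)."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0]  # initial obs

    def step(self, action):
        self.t += 1
        obs = [float(self.t), action]
        reward = 1.0 - abs(action)  # toy reward, always in [0, 1]
        done = self.t >= self.horizon
        return obs, reward, done, {}


def encode(obs):
    """Hypothetical encoder: maps obs to a latent vector z."""
    return [0.5 * x for x in obs]


def rnn_step(h, z, action):
    """Hypothetical rnn update: folds [z, action] into the hidden state h."""
    return [0.9 * hi + 0.1 * (sum(z) + action) for hi in h]


def simulate(env, rng, hidden_size=4):
    obs = env.reset()                               # step 1: reset, first obs
    h = [0.0] * hidden_size
    z = encode(obs)
    total_reward = 0.0
    done = False
    while not done:                                 # step 7: repeat from step 2
        action = rng.uniform(-1.0, 1.0)             # step 2: random action
        obs, reward, done, info = env.step(action)  # step 3
        z = encode(obs)                             # step 4: encoded_obs
        h = rnn_step(h, z, action)                  # steps 4-5: new z and h
        total_reward += reward                      # step 6
    return total_reward, z, h
```

Calling `simulate(ToyEnv(horizon=5), random.Random(0))` runs one episode and returns the total reward together with the last z and h.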
Optimizing
CMA-ES: an evolutionary algorithm, used through an optimizer called cma
Finds the W, b that maximize the cumulative reward
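To illustrate what "finding W, b that maximize the cumulative reward" means, here is a simplified evolution strategy in pure Python. It is not CMA-ES itself: it keeps a fixed isotropic step size with decay and does no covariance adaptation, and it does not use the cma package. The tanh controller, the toy reward in `cumulative_reward`, and all parameter values are assumptions made for the sketch.

```python
import math
import random


def controller(params, features):
    """Linear controller: action = tanh(W . features + b); params = [*W, b]."""
    *w, b = params
    return math.tanh(sum(wi * xi for wi, xi in zip(w, features)) + b)


def cumulative_reward(params):
    """Hypothetical rollout score: higher when the action tracks a toy target."""
    total = 0.0
    for t in range(20):
        features = [math.sin(0.3 * t), math.cos(0.3 * t)]
        target = 0.5 * features[0]
        total += 1.0 - abs(controller(params, features) - target)
    return total


def simple_es(fitness, dim, rng, popsize=16, n_elite=4, sigma=0.5, generations=40):
    """Simplified ES (not CMA-ES): sample around the mean, keep the elites."""
    mean = [0.0] * dim
    best_params, best_score = list(mean), fitness(mean)
    for _ in range(generations):
        # Sample candidate [W, b] vectors around the current mean.
        population = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(popsize)
        ]
        # Rank candidates by cumulative reward, best first.
        elites = sorted(population, key=fitness, reverse=True)[:n_elite]
        # Move the mean to the average of the elite candidates.
        mean = [sum(col) / n_elite for col in zip(*elites)]
        top_score = fitness(elites[0])
        if top_score > best_score:
            best_params, best_score = elites[0], top_score
        sigma *= 0.95  # shrink the search radius over time
    return best_params, best_score
```

Calling `simple_es(cumulative_reward, dim=3, rng=random.Random(0))` returns the best `[W1, W2, b]` found and its score. In the real setup, the fitness function would be the total reward from a rollout like the Simulate loop.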
World Model process (simple tasks ver.)