Presentation slides for 'Learning Montezuma's Revenge from a Single Demonstration' by T. Salimans and R. Chen.
You can find more presentation slides on my website:
https://www.endtoend.ai
[1807] Learning Montezuma's Revenge from a Single Demonstration
2. Exploration and Learning
Exploration: Find action sequence with positive reward
Learning: Remember and generalize action sequence
Need both for a successful agent
3. Montezuma's Revenge
One of the hardest games on the Atari 2600
Sparse rewards, so exploration is difficult
https://www.retrogames.cz/play_124-Atari2600.php?language=EN
4. Simplifying Exploration with Demonstrations
Solution: Shorten the episode
Start the agent near the end of demonstration
Train agent until it ties or beats the demonstrator's score
Gradually move starting point back in time
[Figure, repeated across slides 4-8: the demonstration as a sequence of steps: Go down Ladder 1, Go down Rope, Go down Ladder 2, Jump over Skull, Go up Ladder 3]
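The procedure on slide 4 (start near the end of the demonstration, train until the agent ties or beats the demonstrator's score, then move the starting point back) can be sketched on a toy problem. This is only an illustrative sketch, not the authors' implementation: the corridor environment, the tabular Q-learning agent (the slides reference PPO), and every name and constant below (N, step, run_episode, train_backwards, the step budget, eps, alpha, gamma) are assumptions made for the example.

```python
import random

# Hypothetical toy task (not Montezuma's Revenge): a corridor of cells 0..N,
# with a reward of 1 only when the agent reaches cell N.
N = 12
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    """Deterministic corridor dynamics with a sparse reward at the far end."""
    nxt = max(0, min(N, state + action))
    done = nxt == N
    return nxt, (1.0 if done else 0.0), done

def run_episode(q, start, eps, rng, max_steps, learn=True, alpha=0.5, gamma=0.99):
    """Run one epsilon-greedy episode from `start`; optionally apply Q-learning updates."""
    state, score = start, 0.0
    for _ in range(max_steps):
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        nxt, reward, done = step(state, ACTIONS[a])
        if learn:
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
        state, score = nxt, score + reward
        if done:
            break
    return score

def train_backwards(demo_states, demo_returns, seed=0, max_episodes_per_stage=10_000):
    """Train from demo states, moving the start back once the demo score is matched."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N + 1)]
    log = []
    start_idx = len(demo_states) - 2  # begin one step before the end of the demo
    while start_idx >= 0:
        start = demo_states[start_idx]
        target = demo_returns[start_idx]  # score the demonstrator earned from here on
        for episode in range(1, max_episodes_per_stage + 1):
            run_episode(q, start, eps=0.3, rng=rng, max_steps=2 * N)
            # Greedy check: does the agent tie or beat the demo from this start?
            score = run_episode(q, start, eps=0.0, rng=rng, max_steps=2 * N, learn=False)
            if score >= target:
                break
        else:
            raise RuntimeError(f"stage {start_idx} did not reach the demo score")
        log.append((start_idx, episode))
        start_idx -= 1  # move the starting point back in time
    return q, log

# The "demonstration": states visited by walking straight to the goal, and the
# score the demonstrator collects from each of those states onward.
demo_states = list(range(N + 1))
demo_returns = [1.0] * N
```

Calling train_backwards(demo_states, demo_returns) returns the learned table and a log of (start index, episodes needed) per stage. Resetting the environment to an arbitrary demo state is the step that requires full game states in the demonstration.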
9. Result
74500 points on Montezuma's Revenge (State of the Art)
Surpasses demo score of 71500
Exploits emulator flaw
10. Comparison with DeepMind's approach
DeepMind's approach
Less control over environment needed
Agents imitate the demo
This approach
Need full game states in demo
Directly optimizes game score, so less overfitting to a sub-optimal demo
Better in multiplayer games where performance should be optimized against various opponents
11. Remaining Challenges
Agent cannot reach exact state in demo
Agent needs to generalize between similar states
Problematic in Gravitar or Pitfall
Careful hyperparameter tuning needed
High variance in each run
Neural network does not generalize as well as a human
https://blog.openai.com/openai-baselines-ppo/
12. Thank you!
Original content by OpenAI
Learning Montezuma's Revenge from a Single Demonstration
You can find more content at
github.com/seungjaeryanlee
www.endtoend.ai