My notes on reinforcement learning.
Update: I am implementing some new algorithms in private repos, so the list here is incomplete. I will come back to update this from time to time.
- C51, distributional Q-learning
- Solve Montezuma with re-weighted sampling
- Move PPO into this repo
- DQN
- prioritized replay
- double Q-learning (or half Q-learning)
- dueling networks
- $\epsilon$-greedy with linear scheduling
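A minimal sketch of linearly scheduled $\epsilon$-greedy (the schedule endpoints and function names here are illustrative, not from any particular implementation):

```python
import random

def linear_epsilon(step, start=1.0, end=0.1, anneal_steps=10_000):
    """Linearly anneal epsilon from `start` to `end` over `anneal_steps`, then hold."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)

def epsilon_greedy(q_values, step):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < linear_epsilon(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```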
- Gradients and the REINFORCE algorithm
- policy gradients
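As a sanity check on the score-function gradient, here is REINFORCE on a toy two-armed bandit with a softmax policy and a running-average baseline (everything here is an illustrative sketch, not tied to any repo code):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def pull(arm):
    # Toy bandit: arm 1 always pays +1, arm 0 pays 0.
    return float(arm == 1)

theta = np.zeros(2)   # one logit per arm
baseline = 0.0        # running-average reward, used to reduce variance
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = pull(a)
    # For a softmax policy, grad_theta log pi(a) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += 0.1 * (r - baseline) * grad_log_pi
    baseline += 0.05 * (r - baseline)
```

After training, the policy should strongly prefer the paying arm.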
- Setups
- Get MuJoCo
- set up OpenAI Gym on AWS (yay! :confetti_ball:)
- install MuJoCo
- install mujoco-py (needs upgrading to 1.50, which now supports Python 3.6)
- make a list of concepts to keep track of
- TRPO
- A3C
- Behavior Cloning
- DAgger
I found the textbook to be the most reliable source, but it's easy to get lost in the chapters. So the best way to ask for guidance seems to be:
"I'm reading Chapter xx and topic xx atm, what are the key things I should pay attention to?"
- David Silver's RL course index
- Berkeley RL course http://rll.berkeley.edu/deeprlcourse/
- http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/
- https://arxiv.org/pdf/1506.05254.pdf gives a longer explanation of different viewpoints on taking derivatives.
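The log-derivative trick itself is one line: since $\nabla_\theta p_\theta(x) = p_\theta(x)\,\nabla_\theta \log p_\theta(x)$,

$$\nabla_\theta \mathbb{E}_{x \sim p_\theta}[f(x)] = \int f(x)\,\nabla_\theta p_\theta(x)\,dx = \mathbb{E}_{x \sim p_\theta}\!\left[f(x)\,\nabla_\theta \log p_\theta(x)\right],$$

which is exactly the expectation that REINFORCE estimates by sampling.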
- Contextual bandits:
- Curiosity as reward
- Finding answers as reward
- inferring intention
- Learning to predict (lots of prior art. self-supervision)
- Auxiliary supervision and auxiliary modalities
- inverse reinforcement learning != imitation learning