Reinforcement learning algorithm implementations
TODO: add NoisyNet learning curve
Currently implemented:
- Vanilla DQN [1], [2]
- Async DQN (multi-process DQN, following the architecture described in [7])
- Double DQN [3]
- Dueling DQN [4]
- Multi-step Q-learning DQN [5]
- NoisyNet [6]
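To make the differences among these variants concrete, here is a rough NumPy sketch of the vanilla, Double, and multi-step Q-learning targets. This is illustrative only; the function and variable names are not this repo's API:

```python
import numpy as np

def dqn_target(r, q_next_target, gamma, done):
    # Vanilla DQN [1][2]: bootstrap from the target network's max Q-value.
    return r + gamma * (1.0 - done) * q_next_target.max(axis=1)

def double_dqn_target(r, q_next_online, q_next_target, gamma, done):
    # Double DQN [3]: the online network selects the action,
    # the target network evaluates it, reducing overestimation.
    a = q_next_online.argmax(axis=1)
    return r + gamma * (1.0 - done) * q_next_target[np.arange(len(a)), a]

def multistep_target(rewards, q_boot, gamma, done):
    # Multi-step (n-step) target [5]: sum n discounted rewards,
    # then bootstrap from the value estimate n steps ahead.
    # rewards: shape (batch, n); q_boot: bootstrap value after n steps.
    n = rewards.shape[1]
    g = (rewards * gamma ** np.arange(n)).sum(axis=1)
    return g + (gamma ** n) * (1.0 - done) * q_boot
```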
- This script was tested on Ubuntu 16 (with an NVIDIA Tesla K80) on Google Cloud Platform.
`pyenv` ([pyenv/pyenv-installer](https://github.com/pyenv/pyenv-installer) installs `pyenv` and friends) is recommended for building a Python environment. `atari-py` requires `cmake`, `zlib`, etc. Install them first (e.g. `apt-get install make cmake zlib1g-dev g++`).
See: Installation Guide — CuPy 4.3.0 documentation
- Install CUDA on your host.
- If you use a [cupy-recommended environment](https://docs-cupy.chainer.org/en/stable/install.html#recommended-environments), the cuDNN and NCCL libraries are included in the `cupy` wheels.

```
$ pip install cupy-cuda92
```
```
$ python train.py myrl/configs/vanilla_dqn.toml PongNoFrameskip-v4
```
- For more detail, see `python train.py --help`.
- [1] Mnih, V., K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. 2013. Playing Atari with Deep Reinforcement Learning. NIPS.
- [2] Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. 2015. Human-level control through deep reinforcement learning. Nature.
- [3] van Hasselt, H., A. Guez, and D. Silver. 2016. Deep Reinforcement Learning with Double Q-learning. AAAI.
- [4] Wang, Z., T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas. 2016. Dueling Network Architectures for Deep Reinforcement Learning. ICML.
- [5] Sutton, R. S. 1988. Learning to Predict by the Method of Temporal Differences. Machine Learning.
- [6] Fortunato, M., M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg. 2018. Noisy Networks for Exploration. ICLR.
- [7] Horgan, D., J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, and D. Silver. 2018. Distributed Prioritized Experience Replay. ICLR.