
Deep-RL-Torch

This library serves two purposes:

  1. It allows me to understand reinforcement learning algorithms by building them.
  2. It allows the easy combination of various reinforcement learning extensions, such as Prioritized Experience Replay, Eligibility Traces, and Random Ensemble Mixture.

Current features:

All of the following features can currently be used and combined:

  • All basic non-MuJoCo discrete environments, including Atari. Additionally, MineRL environments can be used. No default.
  • Training for either a fixed number of steps, episodes or hours. No default.
  • Uniform Experience Replay and Prioritized Experience Replay. Defaults to uniform experience replay.
  • Combined Experience Replay (CER). Can be combined with either uniform or prioritized experience replay.
  • Frame Stacking as in DQN. The stacking dimension can be chosen, although dimension 0 performs best according to some rudimentary experiments. Defaults to 4 frames and dimension 0.
  • Frame Skipping as in DQN. Defaults to 4.
  • Optimizations per step - how many batches are sampled and optimized on per step in the environment. The current default is 0.25 (one optimization every 4 steps), as in the DQN paper.
  • Use of a target net that is updated every N steps, or of a Polyak-averaged target network as seen in DDPG. Defaults to Polyak averaging (see the sketch after this list).
  • Pretraining on Expert Data - currently only for MineRL data.
  • Bellman split - adds an additional head to the Q-network that predicts only the immediate reward, while the other head is optimized to predict the value of the next state without the immediate reward (see the sketch after this list). I could not yet show that this improves performance.
  • QV and QVMax learning
  • Efficient Eligibility traces - as described in v1 of the arXiv paper.
  • Observation standardization. Turned on by default.
  • Use of the RAdam optimizer.
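
As a concrete illustration of the Polyak-averaged target network mentioned above, here is a minimal PyTorch sketch; the function name and the tau value are illustrative and not taken from this repository:

import torch

def polyak_update(online_net: torch.nn.Module, target_net: torch.nn.Module, tau: float = 0.005) -> None:
    # Blend the target parameters towards the online parameters:
    # theta_target <- tau * theta_online + (1 - tau) * theta_target
    with torch.no_grad():
        for online_param, target_param in zip(online_net.parameters(), target_net.parameters()):
            target_param.mul_(1.0 - tau).add_(tau * online_param)

The Bellman split can be pictured in a similar way: one output head is trained towards the immediate reward, the other towards the discounted value of the next state, and their sum is used as the Q-value. The class name and layer sizes below are hypothetical, not the repository's actual network:

import torch
import torch.nn as nn

class BellmanSplitQNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.reward_head = nn.Linear(hidden, n_actions)      # trained to predict the immediate reward r
        self.next_value_head = nn.Linear(hidden, n_actions)  # trained to predict the discounted value of the next state

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        features = self.body(obs)
        # The full Q-value is the sum of the two heads.
        return self.reward_head(features) + self.next_value_head(features)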

Upcoming features:

  1. Two epsilon annealing schedules: DQN-style linear annealing until time T followed by a constant value, and exponential decay (see the sketch after this list).
  2. Improved replay buffer making use of the PyTorch dataloader.
  3. Compatibility with Apex
  4. Noisy Nets
  5. Dueling Networks for the Q-function. Additionally, when combined with QV learning, the estimated state value in the Q-network should be the output of the V-network.
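
As a rough illustration of the two planned epsilon schedules, here is a minimal sketch; the function names, parameter names, and default values are hypothetical:

import math

def linear_then_constant_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05, anneal_steps: int = 100000) -> float:
    # DQN-style: anneal epsilon linearly from eps_start to eps_end over anneal_steps, then keep it constant.
    fraction = min(step / anneal_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

def exponential_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05, decay_rate: float = 1e-4) -> float:
    # Decay epsilon exponentially towards eps_end.
    return eps_end + (eps_start - eps_end) * math.exp(-decay_rate * step)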

Installation:

The following packages are necessary to run the code:

First, install swig and the gcc/g++ compilers, which the Box2D environments require:

sudo apt-get update
sudo apt-get install swig gcc g++

Then, install all Python dependencies via the requirements.txt file:

python3 -m pip install -r requirements.txt

Usage:

python train.py --env [ENV_SHORTHAND] --n_[steps|episodes|hours] N

ENV_SHORTHANDs are defined in the train.py script. Please define your own shorthand for additional environments.
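
For example, assuming "pong" is one of the shorthands defined in train.py, training for two million environment steps could look like this:

python train.py --env pong --n_steps 2000000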

All additional options can be seen in parser.py.

Run in headless mode:

This is only necessary for MineRL environments:

xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" -- python train.py --env [ENV_NAME] --n_[steps|episodes|hours] N
