Deep Reinforcement Learning

This code is part of my master's thesis at the VUB, Brussels.

Status

The following algorithms have been implemented so far:

Cross-Entropy Method
Sarsa with function approximation and eligibility traces
REINFORCE (convolutional neural network part has not been tested yet)
Karpathy's policy gradient algorithm (version using convolutional neural networks has not been tested yet)
Advantage Actor Critic
Asynchronous Advantage Actor Critic (A3C)
(Sequential) knowledge transfer
Asynchronous knowledge transfer

Sarsa + function approximation

The following parts are combined to learn to act in the Mountain Car environment:

Sarsa
Eligibility traces
epsilon-greedy action selection policy
Function approximation using tile coding
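
As a rough illustration of how these four parts fit together, here is a minimal sketch using NumPy. It is not the code from this repository: the function names, the tile layout, and the default hyperparameters (`alpha`, `gamma`, `lam`, `n_tilings`, `n_bins`) are assumptions chosen for the example.

```python
import numpy as np


def active_tiles(state, low, high, n_tilings=8, n_bins=8):
    """Return one active tile index per tiling for a continuous state.

    Hypothetical helper: each tiling is a uniform grid shifted by a
    fraction of one tile width.
    """
    state, low, high = (np.asarray(x, dtype=float) for x in (state, low, high))
    scaled = (state - low) / (high - low) * n_bins  # each dimension in [0, n_bins]
    dims = len(scaled)
    tiles_per_tiling = (n_bins + 1) ** dims
    indices = []
    for t in range(n_tilings):
        offset = t / n_tilings
        coords = np.clip(np.floor(scaled + offset).astype(int), 0, n_bins)
        flat = int(np.ravel_multi_index(tuple(coords), (n_bins + 1,) * dims))
        indices.append(t * tiles_per_tiling + flat)
    return np.array(indices)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def sarsa_lambda_update(weights, traces, active, action, reward,
                        next_active, next_action, done,
                        alpha=0.1, gamma=1.0, lam=0.9):
    """One Sarsa(lambda) step for a linear Q-function over binary tile features.

    weights and traces have shape (n_actions, n_features) and are updated in place.
    """
    q = weights[action, active].sum()
    q_next = 0.0 if done else weights[next_action, next_active].sum()
    td_error = reward + gamma * q_next - q
    traces *= gamma * lam         # decay the traces left over from earlier steps
    traces[action, active] = 1.0  # replacing traces for the tiles just visited
    weights += alpha * td_error * traces
    return td_error
```

With Mountain Car's two-dimensional state (position and velocity), `active_tiles` would be called with the environment's observation bounds as `low` and `high`, and the Q-value of an action is the sum of the weights of its active tiles.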

Example of a run using a fully greedy action selection policy, after training for 729 episodes of 200 steps each:

Total reward per episode:

Note that, even after a few thousand episodes, the algorithm still cannot consistently reach the goal in fewer than 200 steps.

REINFORCE

An adaptation of this code to work with TensorFlow. Total reward per episode when applying this algorithm on the CartPole-v0 environment:

Karpathy Policy Gradient

An adaptation of the code from this article by Andrej Karpathy. Total reward per episode when applying this algorithm on the CartPole-v0 environment:

However, how quickly the optimal reward is reached and maintained varies heavily from run to run because of randomness. Results of an earlier execution are also posted on the OpenAI Gym.
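
Policy gradient methods of this kind weight the log-probability of each action by the discounted return that followed it, often after normalizing the returns to reduce variance. The sketch below shows that computation only; the function name and the default `gamma` are assumptions for illustration, not values taken from this repository.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99, normalize=True):
    """Compute the discounted return for every step of one episode.

    With normalize=True, the returns are shifted to zero mean and scaled
    to unit variance, a common variance-reduction step.
    """
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    if normalize:
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return returns
```

The resulting array has one weight per time step and would multiply the per-step log-probability terms in the policy loss.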

Advantage Actor Critic

Total reward per episode when applying this algorithm on the CartPole-v0 environment:

OpenAI Gym page
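
In an advantage actor-critic setup, the actor is updated with advantages: the difference between the discounted returns observed and the critic's value estimates. The following is a minimal sketch of that calculation under assumed names and an assumed default `gamma`; it is not taken from this repository.

```python
import numpy as np


def returns_and_advantages(rewards, values, bootstrap_value=0.0, gamma=0.99):
    """Compute discounted returns and advantages for one trajectory segment.

    rewards[t] is the reward received after step t and values[t] is the
    critic's estimate V(s_t). bootstrap_value is the critic's estimate for
    the state after the last step (0.0 if the episode ended there).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    returns = np.zeros_like(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Positive advantage: the action did better than the critic expected.
    return returns, returns - values
```

The returns serve as regression targets for the critic, while the advantages weight the actor's log-probability terms.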

Asynchronous Advantage Actor Critic

Total reward per episode when applying this algorithm on the CartPole-v0 environment:

This only shows the results of one of the A3C threads. Results of another execution are also posted on the OpenAI Gym. Results of an execution using the Acrobot-v1 environment can also be found on OpenAI Gym.

How to run

First, install the requirements using pip:

pip install -r requirements.txt

Algorithms/experiments

You can run algorithms by passing an experiment specification (in JSON format) to main.py:

python main.py <experiment_description>

Example of an experiment specification

Statistics

Statistics can be plotted using:

python misc/plot_statistics.py <path_to_stats>

<path_to_stats> can be one of two things:

A JSON file generated using gym.wrappers.Monitor, in which case the episode lengths and total reward per episode are plotted.
A directory containing TensorFlow scalar summaries for different tasks, in which case all of the scalars found are plotted.

Help on the other arguments (e.g. for smoothing) is available by running python misc/plot_statistics.py -h.
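
Episode rewards are usually noisy, which is why smoothing helps when reading the plots. How plot_statistics.py smooths its curves is not described here, so the moving average below is only one common approach, with a hypothetical function name and window size.

```python
import numpy as np


def moving_average(values, window=10):
    """Smooth a 1-D series with a simple moving average.

    Returns len(values) - window + 1 points; the window shrinks to the
    series length if the series is shorter than the window.
    """
    values = np.asarray(values, dtype=float)
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) == 0:
        return values
    window = min(window, len(values))
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")
```

A larger window gives a smoother curve but hides short-lived drops in performance.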
