leix28/ddpg (forked from rmst/ddpg)

Implementation of Deep Deterministic Policy Gradients using TensorFlow, compatible with the OpenAI Gym


Deep Deterministic Policy Gradient

Paper: "Continuous control with deep reinforcement learning" - T. P. Lillicrap, J. J. Hunt et al., 2015

Installation

Install Gym and TensorFlow. Then:

pip install pyglet # required for gym rendering
pip install jupyter # required only for visualization (see below)

git clone https://github.com/SimonRamstedt/ddpg.git # get ddpg

Usage

Example:

python run.py --outdir ../ddpg-results/experiment1 --env InvertedDoublePendulum-v1

Run python run.py -h for a complete overview of the available options.

If you want to run in the cloud or on a university cluster, this might contain additional information.

Visualization

Example:

python dashboard.py --exdir ../ddpg-results

Run python dashboard.py -h for a complete overview of the available options.

Known issues

  • No batch normalization yet
  • No convolutional networks yet (i.e. learning only from low-dimensional states)
  • No proper seeding for reproducibility
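On the seeding point, a minimal sketch of what proper seeding could look like. The seed_everything helper below is hypothetical and not part of this repository; it only seeds Python's and NumPy's global RNGs. Full determinism would additionally require seeding TensorFlow (tf.set_random_seed in the TF 1.x era of this code) and the gym environment (env.seed), omitted here to keep the sketch dependency-free.

```python
import random

import numpy as np


def seed_everything(seed):
    """Hypothetical helper (not in this repo): seed the global RNGs.

    For full reproducibility you would also call tf.set_random_seed(seed)
    and env.seed(seed) on the gym environment.
    """
    random.seed(seed)
    np.random.seed(seed)


# Reseeding with the same value reproduces the same random draws.
seed_everything(42)
first = np.random.rand(3)
seed_everything(42)
second = np.random.rand(3)
```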

Please write me or open a GitHub issue if you encounter problems! Contributions are welcome!

Improvements beyond the original paper

  • Output normalization – the main cause of divergence is variation in return scales; output normalization would probably solve this.
  • Prioritized experience replay – faster learning and better performance, especially with sparse rewards. Please write if you have or know of an implementation!
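For the second point, a minimal sketch of proportional prioritized experience replay in the style of Schaul et al. (2015). The PrioritizedReplay class below is a hypothetical illustration, not part of this repository, and would still need to be wired into the DDPG update loop (sampling a batch, applying the importance weights to the critic loss, and feeding TD errors back via update_priorities).

```python
import numpy as np


class PrioritizedReplay:
    """Sketch of a proportional prioritized replay buffer (hypothetical).

    Transitions are sampled with probability proportional to priority**alpha,
    and importance-sampling weights correct the resulting bias.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0  # next write position (ring buffer)

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # Importance-sampling weights, normalized to at most 1.
        weights = (len(self.data) * p[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # eps keeps every priority strictly positive.
        self.priorities[idx] = np.abs(td_errors) + eps
```

A design note: the linear scan over priorities in sample() is O(N); a production implementation would use a sum-tree so that sampling and updates are O(log N).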
