spinning-up-basic

Basic versions of agents from Spinning Up in Deep RL written in PyTorch. Designed to run quickly on CPU on Pendulum-v0 from OpenAI Gym.
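For reference, a minimal random-policy rollout on Pendulum-v0 might look like the following sketch (this assumes the pre-0.26 Gym API, where `step` returns a 4-tuple; it is not code from this repo):

```python
import gym

# Hypothetical smoke test: Pendulum-v0 episodes are capped at 200 steps,
# so the TimeLimit wrapper eventually sets done=True.
env = gym.make('Pendulum-v0')
state, done, total_reward = env.reset(), False, 0
while not done:
    action = env.action_space.sample()  # random torque in [-2, 2]
    state, reward, done, _ = env.step(action)
    total_reward += reward
print('Episode return:', total_reward)
```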

To see the differences between algorithms, try running `diff -y <file1> <file2>`, e.g., `diff -y ddpg.py td3.py`.

For MPI versions of on-policy algorithms, see the mpi branch.

Algorithms

- Vanilla Policy Gradient (VPG)/Advantage Actor-Critic (A2C)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
- Twin Delayed DDPG (TD3)
- Soft Actor-Critic (SAC)
- Deep Q-Network (DQN)

Implementation Details

Note that implementation details can have a significant effect on performance, as discussed in What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study. This codebase attempts to be as simple as possible, but note the following choices:

- The on-policy algorithms use separate actor and critic networks, a state-independent policy standard deviation, per-minibatch advantage normalisation, and several critic updates per minibatch.
- The deterministic off-policy algorithms use layer normalisation.
- Soft actor-critic uses a transformed Normal distribution by default; the same transformation can also help the on-policy algorithms (see the sketch below).
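As a concrete illustration of two of these choices, here is a minimal PyTorch sketch (assumed names and sizes, not this repo's actual code) of a Gaussian actor with a state-independent standard deviation, an optional tanh-transformed Normal as in soft actor-critic, and per-minibatch advantage normalisation:

```python
import torch
from torch import nn
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform


class GaussianActor(nn.Module):
    # Sketch: the mean is state-dependent, while log_std is a single free
    # parameter shared across all states (state-independent std).
    def __init__(self, state_size, action_size, hidden_size=64):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(state_size, hidden_size), nn.Tanh(),
                                  nn.Linear(hidden_size, action_size))
        self.log_std = nn.Parameter(torch.zeros(action_size))

    def forward(self, state, squash=False):
        policy = Normal(self.mean(state), self.log_std.exp())
        if squash:  # tanh-transformed Normal, bounding actions to (-1, 1)
            policy = TransformedDistribution(policy, TanhTransform(cache_size=1))
        return policy


def normalise(advantages, eps=1e-8):
    # Per-minibatch advantage normalisation: zero mean, unit variance
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```

Note that Pendulum-v0 actions lie in [-2, 2], so samples from a tanh-squashed policy would still need rescaling before being passed to the environment.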

Results

Vanilla Policy Gradient/Advantage Actor-Critic

[VPG results plot]

Trust Region Policy Optimization

[TRPO results plot]

Proximal Policy Optimization

[PPO results plot]

Deep Deterministic Policy Gradient

[DDPG results plot]

Twin Delayed DDPG

[TD3 results plot]

Soft Actor-Critic

[SAC results plot]

Deep Q-Network

[DQN results plot]

Code Links
