Playground for reinforcement learning algorithms implemented in TensorFlow

tilarids/reinforcement_learning_playground

reinforcement_learning_playground

Playground for reinforcement learning algorithms implemented in TensorFlow.

OpenAI gym problems solved

Vanilla Policy Gradients

Vanilla policy gradients with a ValueFunction that estimates the value of a given state (I use the current observation, the previous observation, and the previous action as the state). The same algorithm works fine without the ValueFunction if you don't stop the learning process at step 200 and keep learning past it. OpenAI Gym's monitor stops the game at step 200, so you can't use the monitor while training on episodes longer than 200 steps. (The inspiration for using a ValueFunction comes from https://github.com/wojzaremba/trpo.)
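As a rough illustration of the pieces mentioned above, here is a minimal sketch in plain Python. It is not the code from pg_agent.py: the function names, the discount factor `gamma`, and the way the state is concatenated are assumptions made for this example.

```python
def build_state(obs, prev_obs, prev_action):
    """Concatenate current observation, previous observation and previous action.

    Hypothetical helper: the exact layout used by the agent is an assumption.
    """
    return list(obs) + list(prev_obs) + [float(prev_action)]


def discounted_returns(rewards, gamma=0.99):
    """Discounted sum of future rewards for every time step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, baseline_values):
    """Advantage = observed return minus the value function's prediction."""
    return [r - v for r, v in zip(returns, baseline_values)]
```

The policy gradient then weights the log-probability of each taken action by its advantage instead of the raw return, which is one common way to reduce the variance of the update.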

Gym evaluation - CartPole-v0

Gym evaluation - CartPole-v1

Reproducing:

  • Consider changing the API key :)
  • python pg_agent.py
  • python pg_agent.py CartPole-v1

Policy Gradients with TRPO

The same as above, but using the conjugate gradient + line search method described in the TRPO paper. Inspiration for the implementation again comes from https://github.com/wojzaremba/trpo, but I tried to make it more readable and closer to the paper.
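The two numerical building blocks named above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the code from trpo_agent.py: `matvec` stands in for the Fisher-vector product, `b` for the policy gradient, and the line search here only checks that the loss improves (the paper's version also checks the KL constraint). All names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Approximately solve A x = b given only the product v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x (x starts at zero)
    p = list(b)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        new_rr = dot(r, r)
        p = [ri + (new_rr / rr) * pi for ri, pi in zip(r, p)]
        rr = new_rr
    return x


def line_search(loss, theta, full_step, max_backtracks=10):
    """Halve the step until the loss improves; keep theta if nothing helps."""
    base = loss(theta)
    for k in range(max_backtracks):
        frac = 0.5 ** k
        candidate = [t + frac * s for t, s in zip(theta, full_step)]
        if loss(candidate) < base:
            return candidate
    return theta
```

Conjugate gradient is used because it needs only matrix-vector products, so the full matrix never has to be built or inverted.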

Please also note that this agent doesn't use dropout, because TRPO doesn't work well with it. With a high dropout rate, the KL divergence may be very high even between two policies with exactly the same parameters, because each forward pass draws a different random dropout mask.
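The effect can be reproduced with a toy example. The sketch below is an illustration only: the tiny linear policy, the hidden activations, and the explicit dropout masks are made up for this example and are not taken from the agent. It evaluates the same weights twice with two different masks and measures the KL divergence between the resulting action distributions.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(hidden, weights, mask, keep_prob):
    """Action probabilities from a linear layer over dropout-masked hidden units."""
    # Inverted dropout: kept units are scaled by 1 / keep_prob.
    masked = [h * m / keep_prob for h, m in zip(hidden, mask)]
    logits = [sum(w * h for w, h in zip(row, masked)) for row in weights]
    return softmax(logits)


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


hidden = [1.0, 2.0, -1.0, 0.5]
weights = [[1.0, 0.0, 0.5, -1.0], [0.0, 1.0, -0.5, 1.0]]

# Same parameters, two different dropout masks drawn at keep_prob = 0.5.
p_a = policy(hidden, weights, [1, 0, 1, 0], keep_prob=0.5)
p_b = policy(hidden, weights, [0, 1, 0, 1], keep_prob=0.5)

kl_same_mask = kl(p_a, p_a)  # zero: identical distributions
kl_diff_mask = kl(p_a, p_b)  # clearly positive despite identical weights
```

Because TRPO bounds each update by the KL divergence between the old and new policy, noise of this kind can look like a constraint violation even when the parameters have not moved.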

Gym evaluation - CartPole-v0

Gym evaluation - CartPole-v1

Gym evaluation - Copy-v0

Reproducing:

  • Consider changing the API key :)
  • python trpo_agent.py
  • python trpo_agent.py CartPole-v1
  • python trpo_agent.py Copy-v0

New environments solved

Caesar cipher

I am introducing a new environment that is a fork of the "Copy-v0" environment, but instead of copying the input tape to the output tape, the agent needs to decode Caesar-ciphered text onto the output tape. The same algorithm that works with CartPole-v0 and Copy-v0 also works here. The only difference is the number of hidden units.
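To make the task concrete, here is a minimal sketch of what the agent has to learn, assuming the tape holds integer symbols from a small alphabet. The alphabet size `base`, the shift value, and the function names are assumptions for illustration; the environment's actual settings may differ.

```python
def caesar_shift(symbols, shift, base=5):
    """Shift every symbol by `shift` positions, wrapping around the alphabet."""
    return [(s + shift) % base for s in symbols]


def target_output(input_tape, shift, base=5):
    """What a perfect agent would write: the input tape with the shift undone."""
    return caesar_shift(input_tape, -shift, base)


plain = [0, 1, 2, 3, 4]
cipher = caesar_shift(plain, 3)      # what the agent reads from the input tape
decoded = target_output(cipher, 3)   # what it should write to the output tape
```

In Copy-v0 the target is the input symbol itself, so the only extra thing to learn here is the fixed mapping from each ciphered symbol to its plain counterpart.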

asciicast

Reproducing: python trpo_caesar.py
