reinforcement_learning_playground

Playground for reinforcement learning algorithms implemented in TensorFlow.

OpenAI gym problems solved

Vanilla Policy Gradients

Vanilla policy gradients with a ValueFunction that estimates the value of a given state (I use the current observation, the previous observation and the previous action as the state). The same algorithm works fine without the ValueFunction if you don't stop the learning process at step 200 and keep learning after that. OpenAI Gym's monitor stops the game at step 200, so you can't use the monitor while training on episodes longer than 200 steps. (The inspiration for using ValueFunction comes from https://github.com/wojzaremba/trpo.)
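To make the role of the value estimate concrete, here is a minimal sketch of how discounted returns and baseline-subtracted advantages could be computed for one episode. It is an illustration only, not the code in pg_agent.py: the function names, the `gamma` default and the flat-list state layout are all assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, baseline_values, gamma=0.99):
    """Subtract a per-step value estimate (the baseline) from the returns."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, baseline_values)]


def make_state(observation, prev_observation, prev_action):
    """Hypothetical state layout: current obs + previous obs + previous action."""
    return list(observation) + list(prev_observation) + [float(prev_action)]
```

The policy gradient then weights the log-probability of each action by its advantage instead of the raw return, which usually lowers the variance of the gradient estimate.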

Gym evaluation - CartPole-v0

Gym evaluation - CartPole-v1

Reproducing:

Consider changing the API key :)
python pg_agent.py
python pg_agent.py CartPole-v1

Policy Gradients with TRPO

The same as above, but using the conjugate gradient + line search method described in the TRPO paper. Inspiration for the implementation again comes from https://github.com/wojzaremba/trpo, but I tried to make it more readable and closer to the paper.
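The two building blocks named above can be sketched in plain Python as follows. This is a generic illustration under stated assumptions, not the code in trpo_agent.py: `matvec` stands in for a Fisher-vector product, and `objective` and `constraint_ok` are hypothetical callables for the surrogate objective and the KL constraint check.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, iterations=10, tol=1e-10):
    """Approximately solve A x = b using only products A v (matvec)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0 initially
    p = list(b)  # current search direction
    rr = dot(r, r)
    for _ in range(iterations):
        if rr < tol:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        new_rr = dot(r, r)
        p = [ri + (new_rr / rr) * pi for ri, pi in zip(r, p)]
        rr = new_rr
    return x


def backtracking_line_search(objective, constraint_ok, theta, full_step,
                             max_backtracks=10):
    """Halve the step until the objective improves and the constraint holds."""
    base = objective(theta)
    for k in range(max_backtracks):
        frac = 0.5 ** k
        candidate = [t + frac * s for t, s in zip(theta, full_step)]
        if constraint_ok(candidate) and objective(candidate) > base:
            return candidate
    return theta  # no acceptable step found: keep the old parameters


# Usage on a small symmetric positive-definite system (values are made up).
A = [[4.0, 1.0], [1.0, 3.0]]
solution = conjugate_gradient(lambda v: [dot(row, v) for row in A], [1.0, 2.0])
```

Conjugate gradient avoids forming or inverting the full matrix, which matters when the matrix is the Fisher information of a neural network policy; the line search then shrinks the proposed step until it is actually acceptable.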

Please also note that this agent doesn't use dropout, because TRPO doesn't work well with it. With a high dropout rate the KL divergence may be very high even between two exactly equal sets of parameters, because dropout is randomized.
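The dropout issue can be reproduced with a toy example. The sketch below evaluates one small softmax policy twice with identical weights but different dropout masks; the weights, hidden activations and masks are made-up numbers, and the masks are passed explicitly so the result is deterministic.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(hidden, weights, mask, keep_prob):
    """Action probabilities of a linear-softmax policy with inverted dropout."""
    dropped = [h * m / keep_prob for h, m in zip(hidden, mask)]
    logits = [sum(w * d for w, d in zip(row, dropped)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Same parameters and same input for both passes (illustrative values).
hidden = [1.0, 2.0, 3.0, 4.0]
weights = [[0.5, -0.2, 0.1, 0.3],
           [-0.4, 0.6, 0.2, -0.1]]

p_a = policy(hidden, weights, mask=[1, 0, 1, 1], keep_prob=0.5)
p_b = policy(hidden, weights, mask=[1, 1, 0, 1], keep_prob=0.5)

kl_same_mask = kl_divergence(p_a, p_a)   # identical masks
kl_diff_mask = kl_divergence(p_a, p_b)   # only the dropout mask differs
```

Because the KL constraint in TRPO is meant to measure how far the parameters moved, a KL value that is non-zero for unchanged parameters makes the constraint check unreliable.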

Gym evaluation - CartPole-v0

Gym evaluation - CartPole-v1

Gym evaluation - Copy-v0

Reproducing:

Consider changing the API key :)
python trpo_agent.py
python trpo_agent.py CartPole-v1
python trpo_agent.py Copy-v0

New environments solved

Caesar cipher

I am introducing a new environment that is a fork of the "Copy-v0" environment, but instead of copying the input tape to the output tape, the agent needs to decode Caesar-ciphered text onto the output tape. The same algorithm that works with CartPole-v0 and Copy-v0 also works here. The only difference is the number of hidden units.
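As a rough illustration of the mapping the agent has to learn, the sketch below encodes and decodes a tape of integer symbols with a Caesar shift. It is not taken from caesar.py: the function names, the alphabet size `base` and the `shift` value are assumptions chosen for the example.

```python
def caesar_encode(symbols, shift, base):
    """Shift every symbol forward, wrapping around an alphabet of `base` symbols."""
    return [(s + shift) % base for s in symbols]


def caesar_decode(symbols, shift, base):
    """Undo the shift: this is the output the agent is expected to write."""
    return [(s - shift) % base for s in symbols]


# Hypothetical tape over a 5-symbol alphabet with a shift of 3.
plain_tape = [0, 1, 4]
cipher_tape = caesar_encode(plain_tape, shift=3, base=5)
decoded_tape = caesar_decode(cipher_tape, shift=3, base=5)
```

Compared with plain copying, the agent has to learn a fixed permutation of the alphabet in addition to the tape movement, which may be why more hidden units are needed.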

Reproducing: python trpo_caesar.py
