TRPO in TensorFlow

A TensorFlow implementation of the "Trust Region Policy Optimization" (TRPO) method.

Currently works with discrete actions; support for continuous (Gaussian) action variables should be straightforward to add.
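To illustrate why the Gaussian case is a small step from the discrete one: TRPO bounds the KL divergence between the old and new policy, and that divergence is the main piece that depends on the action distribution. The sketch below is plain Python rather than TensorFlow, and the function names `categorical_kl` and `diag_gaussian_kl` are hypothetical, not taken from this repository.

```python
import math

def categorical_kl(p_old, p_new, eps=1e-8):
    """KL(p_old || p_new) for two discrete action distributions (lists of probabilities)."""
    # eps guards against log(0); it is an assumption of this sketch.
    return sum(po * math.log((po + eps) / (pn + eps))
               for po, pn in zip(p_old, p_new))

def diag_gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(N_old || N_new) for diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mo, so, mn, sn in zip(mu_old, std_old, mu_new, std_new):
        total += math.log(sn / so) + (so ** 2 + (mo - mn) ** 2) / (2.0 * sn ** 2) - 0.5
    return total
```

In a TensorFlow graph the same formulas would be written with tensor ops and averaged over the batch of visited states.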

Features

Built purely on TensorFlow graphs and encapsulated as a separate optimizer.

You only need to pass the policy function and the cost function to the optimizer and create the cache variables.
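For orientation, here is a minimal sketch of the step a TRPO optimizer typically computes once it has the gradient of the cost and a way to evaluate Fisher-vector products of the policy's KL divergence. It is plain Python, not this repository's API: `conjugate_gradient`, `trust_region_step`, `fvp`, and `max_kl` are hypothetical names, `g` is assumed to already point in the desired update direction (the negative gradient when minimizing a cost), and the backtracking line search that usually follows is omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only Fisher-vector products fvp(v)."""
    x = [0.0] * len(g)
    r = list(g)   # residual g - F x; x starts at zero
    p = list(g)   # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        Fp = fvp(p)
        alpha = rr / dot(p, Fp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, Fp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def trust_region_step(fvp, g, max_kl=0.01):
    """Scale the CG direction so the quadratic KL estimate 0.5 * s^T F s equals max_kl."""
    direction = conjugate_gradient(fvp, g)
    shs = dot(direction, fvp(direction))
    scale = math.sqrt(2.0 * max_kl / shs)
    return [scale * d for d in direction]

# Toy usage: F = diag(2, 4), standing in for the Fisher matrix.
toy_fvp = lambda v: [2.0 * v[0], 4.0 * v[1]]
step = trust_region_step(toy_fvp, [2.0, 4.0])
```

Working only through `fvp` avoids ever forming the Fisher matrix, which is why this kind of update can be expressed entirely with graph operations.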
