Constrained Policy Optimization for rllab

Constrained Policy Optimization (CPO) is an algorithm for learning policies that should satisfy behavioral constraints throughout training. [1]
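
As an illustration of what a behavioral constraint can look like, the sketch below checks whether the average discounted cost over a batch of trajectories stays within a limit, which is the general form of constraint considered in [1]. It is a minimal standalone example and not this module's API: the function names, the cost values, the discount, and the limit are all hypothetical.

```python
def discounted_return(values, discount):
    """Return the sum of discount**t * values[t] over one trajectory."""
    total = 0.0
    # Accumulate from the last step backwards so each step is discounted once more.
    for value in reversed(values):
        total = value + discount * total
    return total


def satisfies_constraint(cost_trajectories, discount, limit):
    """Return True if the average discounted cost is at or below the limit."""
    returns = [discounted_return(costs, discount) for costs in cost_trajectories]
    average = sum(returns) / len(returns)
    return average <= limit


# Two hypothetical trajectories of per-step safety costs (1.0 marks an unsafe step).
trajectories = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(satisfies_constraint(trajectories, discount=0.5, limit=0.5))
```

CPO aims to keep a condition of this kind satisfied at every policy update, rather than only at the end of training.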

This module was designed for rllab [2] and includes implementations of the algorithms described in our paper [1].

To configure, run the following command in the root folder of rllab:

git submodule add -f https://github.com/jachiam/cpo sandbox/cpo

Run CPO in the Point-Gather environment with

python sandbox/cpo/experiments/CPO_point_gather.py

[1] Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. "Constrained Policy Optimization". Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
[2] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel. "Benchmarking Deep Reinforcement Learning for Continuous Control". Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
