Constrained Policy Optimization (CPO) is an algorithm for learning policies while satisfying behavioral constraints throughout training [1].
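At a high level, each CPO update solves a constrained trust-region problem (a sketch following [1], where A^{\pi_k} is the advantage under the current policy, J_C the expected constraint cost, d its limit, and \delta the trust-region size):

    \pi_{k+1} = \arg\max_{\pi}\; \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\left[ A^{\pi_k}(s,a) \right]
    \quad \text{s.t.}\quad J_C(\pi) \le d, \qquad \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta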
After setting up the required rllab and MuJoCo modules, clone this repo into rllab's sandbox directory, at sandbox/cpo.
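For example, assuming rllab is checked out at /path/to/rllab (the repo URL below is a placeholder):

python -c "pass"  # sanity-check your Python environment first, if desired
git clone <this-repo-url> /path/to/rllab/sandbox/cpo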
Then run CPO in the Point-Gather environment from the rllab root directory with
python sandbox/cpo/CPO_point_gather.py
To visualize the learned policy during training, modify the following lines in the CPO_point_gather.py file so that plot=True is passed to both calls:
algo = CPO(
    env=env,
    policy=policy,
    baseline=baseline,
    safety_constraint=safety_constraint,
    safety_gae_lambda=1,
    batch_size=50000,
    max_path_length=15,
    n_itr=100,
    gae_lambda=0.95,
    discount=0.995,
    step_size=trpo_stepsize,
    optimizer_args={'subsample_factor': trpo_subsample_factor},
    plot=True,  # enable live rendering of rollouts during training
)
run_experiment_lite(
    run_task,
    n_parallel=4,
    snapshot_mode="last",
    exp_prefix='CPO-PointGather',
    seed=1,
    mode="local",
    plot=True,  # must match the plot flag passed to CPO above
)
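With plot=True, rllab renders rollouts of the current policy as training proceeds. Since snapshot_mode="last" saves the final parameters, a trained policy can also be replayed afterwards with rllab's bundled simulation script, assuming the default log directory (typically data/local/CPO-PointGather/<experiment-name>):

python scripts/sim_policy.py data/local/CPO-PointGather/<experiment-name>/params.pkl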
[1] Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. "Constrained Policy Optimization." Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
This repository was developed for the TiML course project.