Constrained Policy Optimization

Constrained Policy Optimization (CPO) is an algorithm for learning policies that should satisfy behavioral constraints throughout training. [1]
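Concretely, each CPO update solves a small constrained policy-search problem: maximize expected return while keeping the expected constraint cost below a limit, within a trust region around the current policy. A sketch of that formulation (single constraint for simplicity; notation follows [1], with J the return, J_C the constraint cost, d the limit, and delta the trust-region size):

\[
    \pi_{k+1} = \arg\max_{\pi \in \Pi_\theta} \; J(\pi)
    \quad \text{s.t.} \quad J_C(\pi) \le d,
    \quad \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta
\]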

After setting up the required rllab and MuJoCo dependencies, clone this repo into your rllab directory as sandbox/cpo.

Then run CPO in the Point-Gather environment with

python sandbox/cpo/CPO_point_gather.py 
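For orientation, CPO_point_gather.py follows the usual rllab launcher pattern: a run_task function builds the environment, policy, baseline, and safety constraint, then constructs and trains CPO, and run_experiment_lite launches that task (as shown further below). The sketch below only illustrates this structure; the import paths for CPO, PointGatherEnv, and the safety constraint, and the GatherSafetyConstraint name itself, are assumptions and may differ from this repo's actual modules.

# Structural sketch only -- the sandbox.cpo import paths and the
# GatherSafetyConstraint name are assumptions; check the repo for the
# real module layout.
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.mujoco.gather.point_gather_env import PointGatherEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

from sandbox.cpo.algos.safe.cpo import CPO                        # assumed path
from sandbox.cpo.safety_constraints.gather import GatherSafetyConstraint  # assumed path

def run_task(*_):
    # Point-Gather: the agent collects apples (reward) while avoiding bombs;
    # the safety constraint bounds the expected rate of bomb hits.
    env = normalize(PointGatherEnv())
    policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(64, 32))
    baseline = LinearFeatureBaseline(env_spec=env.spec)
    safety_constraint = GatherSafetyConstraint(max_value=0.1)  # hypothetical constructor
    algo = CPO(env=env, policy=policy, baseline=baseline,
               safety_constraint=safety_constraint, n_itr=100)
    algo.train()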

To generate visualizations of the learned policy, enable plotting in CPO_point_gather.py by setting plot=True in both the CPO constructor and the run_experiment_lite call, as in the following lines.

algo = CPO(
    env=env,
    policy=policy,
    baseline=baseline,
    safety_constraint=safety_constraint,
    safety_gae_lambda=1,
    batch_size=50000,
    max_path_length=15,
    n_itr=100,
    gae_lambda=0.95,
    discount=0.995,
    step_size=trpo_stepsize,
    optimizer_args={'subsample_factor': trpo_subsample_factor},
    plot=True,
)


run_experiment_lite(
    run_task,
    n_parallel=4,
    snapshot_mode="last",
    exp_prefix='CPO-PointGather',
    seed=1,
    mode="local",
    plot=True,
)
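With snapshot_mode="last", rllab keeps only the latest snapshot (params.pkl, containing the trained policy) in a log directory under data/local/ derived from exp_prefix. If you would rather replay a trained policy after training than plot live, rllab ships a simulation script; the command below is a sketch assuming a standard rllab checkout run from the rllab root, with <experiment_name> a placeholder for the generated experiment directory:

python scripts/sim_policy.py data/local/CPO-PointGather/<experiment_name>/params.pkl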

[1] Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. "Constrained Policy Optimization". Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.

This repository was developed for the TiML course project.
