LAMPO: LAtent Movement Policy Optimization

S. Tosatto, G. Chalvatzaki, J. Peters.

In submission to RAL-ICRA 2021

Install

This Python package currently depends on other packages which need to be installed. We recommend creating a Python 3.6 virtual environment, for example using conda.

  1. Install ROMI (robotic movement interface), which we use to learn ProMPs and as an interface to the real robot
git clone https://github.com/SamuelePolimi/romi
cd romi
pip install -e . 
  2. Install MPPCA (mixture of probabilistic principal component analyzers)
cd ..
git clone https://github.com/SamuelePolimi/MPPCA
cd MPPCA/mppca/cmixture
python setup.py build_ext --inplace
cd ../..
pip install -e .
  3. Install HeRL (Helper Reinforcement Learning), from which we use some dataset interfaces and saving modules (the commands below assume the repository lives at SamuelePolimi/herl)
cd ..
git clone https://github.com/SamuelePolimi/herl
cd herl
pip install -e .

You will also need scikit-learn, PyTorch, SciPy, psutil and RLBench. To install RLBench, carefully follow the instructions at https://github.com/stepjam/RLBench.
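A quick sanity check after installation (a minimal sketch; the top-level module names romi, mppca and herl are assumed from the repository names and may differ):

# sanity_check.py -- verify that the main dependencies can be imported
import sklearn, scipy, psutil
import torch
import rlbench
import romi    # assumed import name of the ROMI package
import mppca   # assumed import name of the MPPCA package
import herl    # assumed import name of the HeRL package
print("All dependencies imported successfully.")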

Run an experiment

First of all, be aware that there is a configuration file that selects some default hyperparameters, which are also the ones used in the paper. The configuration file is located at core/config.py and it specifies how many clusters to use and the latent-space dimension.
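As a rough illustration only (the structure and names below are hypothetical and may differ from the actual contents of core/config.py), such a configuration typically maps each task to its mixture and latent-space hyperparameters:

# Hypothetical sketch of a per-task configuration, not the actual core/config.py
config = {
    "reach_target": {
        "n_clusters": 5,   # number of mixture components used by MPPCA
        "latent_dim": 3,   # dimension of the latent movement space
    },
    "close_drawer": {
        "n_clusters": 8,
        "latent_dim": 5,
    },
}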

All the other configurations are specified in the parameter list of experiment_organizer.py (for experiments with LAMPO) or ct_experiment_organizer.py (for the Colomé and Torras algorithm referenced in the paper, or for GMM+REPS when the --no_dr option is passed).

An example of running an experiment:

experiment_organizer.py folder_name --task_name reach_target --kl 0.2 -c -1 --n_runs 1 --imitation_learning 200 -z --forward  --max_iter 10 --n_evaluations 500 --batch_size 100 --dense_reward 

We describe only a few of the options here; the full description can be obtained by running experiment_organizer.py ?.

  • folder_name the name of the folder where you want to save the data; it will be created in experiments
  • --kl the KL bound
  • --forward means that the KL will be forward (and not reverse), as mentioned in the paper
  • --n_runs how many runs you want to launch
  • --n_evaluations how many evaluations you want at each iteration
  • --batch_size how many samples we take from the n_evaluations to learn from
  • --imitation_learning how many demonstrations we want to use
  • -z to do importance sampling with normalization (otherwise the default is without); see the sketch after this list
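For reference, here is a minimal sketch of the difference between plain and self-normalized importance weighting; it illustrates the general technique behind -z, not the exact code used in core/lampo.py:

import numpy as np

def importance_weights(log_p_new, log_p_old, normalize=True):
    """Importance weights w_i = p_new(x_i) / p_old(x_i) for a batch of samples.

    With normalize=True the weights are self-normalized to sum to one, which
    trades a small bias for a lower-variance estimator.
    """
    log_w = log_p_new - log_p_old
    if normalize:
        w = np.exp(log_w - np.max(log_w))  # subtract the max for numerical stability
        return w / np.sum(w)               # self-normalized weights
    return np.exp(log_w)                   # plain (unnormalized) likelihood ratios

A weighted return estimate is then np.sum(w * rewards) with normalized weights, or np.mean(w * rewards) with the plain ratios.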

If you want, you can also launch single experiments in a very similar way, provided that the folder mentioned above has already been created:

experiment.py folder_name --task_name reach_target --kl 0.2 -c -1 -l 200 --save -z --forward  --max_iter 10 --n_evaluations 500 --batch_size 100 --dense_reward 

and you will also have the following options:

  • -p to have a real-time plot
  • -v to visualize the environment in Coppelia.

Similar commands can be launched for ct_experiment.py and ct_experiment_organizer.py, as well as for im_experiment.py, im_organizer.py and im_organize_ct.py for imitation learning only.

Structure of the code:

  • core contains the important code

  • core/augmented_task contains tasks from Coppelia, slightly refined (with dense reward and a nicer interface)

  • core/collector.py to collect experiments

  • core/colome_torras.py the algorithm referred to as CT in the paper

  • core/config.py configuration file

  • core/imitation_learning.py performs imitation learning using MPPCA

  • core/iterative_rwr_gmm.py implementation of the reward-weighted responsibility Gaussian mixture model (used by Colomé and Torras)

  • core/lab_connection.py code to connect to the client of the real robot

  • core/lampo.py the main optimization process run by our algorithm

  • core/model.py the torch model of our probabilistic graph

  • core/reps.py relative entropy policy search (REPS), used by Colomé and Torras; see the sketch after this list

  • rl_bench_box.py a wrapper class for RLBench. Here you also find the 2D reacher

  • task_interface.py a generic interface for tasks

  • tcp_protocol.py a protocol for communicating with the client controlling the real robot

  • dataset_generator.py generates a dataset of demonstrations from RLBench and saves it on disk

  • learn_movement.py learns the movements from a dataset of demonstrations and saves them on disk

  • visualize_behavior.py when you run an experiment, some model files are saved on disk. You can replay those models by specifying the name of the folder, the name of the file (without .npz) and which task to run
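For context, here is a minimal sketch of the REPS weighting step (a generic illustration of relative entropy policy search under a KL bound, not the exact implementation in core/reps.py):

import numpy as np
from scipy.optimize import minimize

def reps_weights(rewards, epsilon=0.2):
    """REPS sample weights under the KL bound epsilon.

    Minimizes the dual g(eta) = eta * epsilon + eta * log(mean(exp(R / eta)))
    and returns the exponential weights exp(R / eta), normalized to sum to one.
    """
    r = np.asarray(rewards, dtype=float)
    r = r - np.max(r)                        # shift rewards for numerical stability

    def dual(log_eta):
        eta = float(np.exp(log_eta[0]))      # optimize log(eta) so that eta stays positive
        return eta * epsilon + eta * np.log(np.mean(np.exp(r / eta)))

    res = minimize(dual, x0=np.zeros(1), method="Nelder-Mead")
    eta = float(np.exp(res.x[0]))
    w = np.exp(r / eta)
    return w / np.sum(w)

These weights can then be used for a weighted maximum-likelihood update of the policy, e.g. a reward-weighted GMM fit in the Colomé and Torras setting.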

Other files, like the .sh scripts, are used to launch experiments.

For further inquiries

The code has been tested with Python 3.6 and, with all the dependencies installed, it should run. For any specific question, please send an e-mail to samuele.tosatto@gmail.com or samuele.tosatto@tu-darmstadt.de.

TODO:

  • Imitation learning by conditioning a Mixture of Probabilistic Principal Component Analyzers
  • Policy Improvement via Policy Gradient with KL constraint
  • Extraction of dense reward
  • More structured and clean version of the code
  • Parallel experiments
  • Inheritance to define dense reward
  • General interface working also with TCP/IP connection
  • Setting up experiments with the cluster (ask Joao)
  • Colome and Torras Baseline
  • Better analysis, tests and comments
