Soft Q-Learning

Soft Q-learning (SQL) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper "Reinforcement Learning with Deep Energy-Based Policies", presented at the International Conference on Machine Learning (ICML), 2017.
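For orientation, the maximum entropy objective behind the framework can be sketched as follows. The notation is an assumption chosen for this README, not taken from the code: r is the reward, ρ_π the state-action distribution induced by policy π, H the entropy, and α a temperature that weights entropy against reward. The second line gives the energy-based form of the policy, with the soft Q-function acting as a negative energy.

```latex
\pi^{*}_{\mathrm{MaxEnt}} = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

\pi(a_t \mid s_t) \propto \exp\left( \tfrac{1}{\alpha} \, Q_{\mathrm{soft}}(s_t, a_t) \right)
```

Setting α to zero removes the entropy term and leaves the standard expected-return objective.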

Getting Started

Soft Q-learning can be run either locally or through Docker.

Prerequisites

You will need Docker and Docker Compose installed, unless you plan to install and run the environment locally.

Most of the models require a MuJoCo license.

Docker Installation

Rendering of simulations is currently not supported in Docker because the container has no display set up. As a workaround, use a local installation. To run the MuJoCo environments without rendering, the Docker environment needs to know where to find your MuJoCo license key (mjkey.txt). You can either copy your key into <PATH_TO_THIS_REPOSITORY>/.mujoco/mjkey.txt, or specify the path to the key in an environment variable:

export MUJOCO_LICENSE_PATH=<path_to_mujoco>/mjkey.txt
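Before starting the container, it can help to confirm that the key is actually in one of the two places described above. The following is a minimal sketch, not part of the repository: find_mujoco_key is a hypothetical helper, and checking the environment variable before the .mujoco folder is an assumption about precedence rather than documented behaviour of the Docker setup.

```python
import os
from pathlib import Path


def find_mujoco_key(repo_root, environ=None):
    """Return the path to mjkey.txt, or None if it cannot be found.

    Hypothetical helper that mirrors the two options described above.
    """
    environ = os.environ if environ is None else environ
    candidates = []
    # Option 1 (assumed to take precedence): explicit path from the variable.
    if environ.get("MUJOCO_LICENSE_PATH"):
        candidates.append(Path(environ["MUJOCO_LICENSE_PATH"]))
    # Option 2: key copied into the repository's .mujoco folder.
    candidates.append(Path(repo_root) / ".mujoco" / "mjkey.txt")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling find_mujoco_key(".") from the repository root returns the first key file that exists, or None when neither location holds one.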

Once that's done, start the Docker container with:

docker-compose up

Docker Compose creates a Docker container named soft-q-learning and automatically sets the required environment variables and volumes.

You can access the container with the usual docker exec command:

docker exec -it soft-q-learning bash

See the Examples section for how to train and simulate the agents.

To clean up the setup:

docker-compose down

Local Installation

To get the environment installed correctly, you will first need to clone rllab, and have its path added to your PYTHONPATH environment variable.

Clone rllab

cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}

Download and copy the MuJoCo files to the rllab path. The commands below are for Linux. On OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the .dylib files rather than the .so files.

mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir -p <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <installation_path_of_your_choice>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp

Copy your MuJoCo license key (mjkey.txt) to the rllab path:

cp <mujoco_key_folder>/mjkey.txt <installation_path_of_your_choice>/rllab/vendor/mujoco
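After these copy steps, a quick check that rllab/vendor/mujoco contains everything can save a confusing import error later. The sketch below is illustrative only: missing_mujoco_files is a hypothetical helper, and the file names are the ones used in the Linux commands above (on OSX the libraries are .dylib files, so the tuple would need adjusting).

```python
from pathlib import Path

# File names taken from the Linux commands above; adjust for OSX.
EXPECTED_FILES = ("libmujoco131.so", "libglfw.so.3", "mjkey.txt")


def missing_mujoco_files(rllab_path, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from rllab/vendor/mujoco."""
    vendor_dir = Path(rllab_path) / "vendor" / "mujoco"
    return [name for name in expected if not (vendor_dir / name).is_file()]
```

An empty list means all three files are in place. Otherwise the list names the files that still need to be copied.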

Clone softqlearning

cd <installation_path_of_your_choice>
git clone https://github.com/haarnoja/softqlearning.git

Create and activate conda environment

cd softqlearning
conda env create -f environment.yml
source activate sql

The environment should now be ready to run. See the Examples section for how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:

source deactivate
conda remove --name sql --all

Examples

Training and simulating an agent

To train the agent:

python ./examples/mujoco_all_sql.py --env=swimmer --log_dir="/root/sql/data/swimmer-experiment"

To simulate the agent (NOTE: this step currently fails with the Docker installation because no display is available):

python ./scripts/sim_policy.py /root/sql/data/swimmer-experiment/itr_<iteration>.pkl
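The <iteration> placeholder has to be filled in by hand. A small helper can pick the most recent snapshot instead. This is a sketch under the assumption that snapshots are stored directly in the log directory with integer iteration numbers, as the path above suggests. latest_snapshot is a hypothetical name, not a function from the repository.

```python
import re
from pathlib import Path

# Matches snapshot files such as itr_0.pkl or itr_250.pkl.
SNAPSHOT_PATTERN = re.compile(r"^itr_(\d+)\.pkl$")


def latest_snapshot(log_dir):
    """Return the itr_<iteration>.pkl file with the highest iteration, or None."""
    best_iteration, best_path = -1, None
    for path in Path(log_dir).iterdir():
        match = SNAPSHOT_PATTERN.match(path.name)
        # Compare iterations as integers so that itr_10 ranks above itr_9.
        if match and int(match.group(1)) > best_iteration:
            best_iteration, best_path = int(match.group(1)), path
    return best_path
```

The returned path can be passed to sim_policy.py as its argument. Comparing the numbers as integers matters because a plain alphabetical sort would place itr_9.pkl after itr_10.pkl.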

mujoco_all_sql.py supports several different environments, and more example scripts are available in the examples folder. For more information about the agents and configurations, run the scripts with the --help flag. For example:

python ./examples/mujoco_all_sql.py --help
usage: mujoco_all_sql.py [-h]
                         [--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
                         [--exp_name EXP_NAME] [--mode MODE]
                         [--log_dir LOG_DIR]
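To train on every environment listed in the help output, the training command can be generated in a loop. The following is a minimal sketch: training_command and train_all are hypothetical helpers, the data_root default simply reuses the log directory from the example above, and only the --env and --log_dir flags shown in the help output are used.

```python
import subprocess
import sys

# Environment names copied from the --env choices in the help output above.
ENVIRONMENTS = ("ant", "walker", "swimmer", "half-cheetah", "humanoid", "hopper")


def training_command(env, data_root="/root/sql/data"):
    """Build the argument list for one training run."""
    return [
        sys.executable,
        "./examples/mujoco_all_sql.py",
        "--env={}".format(env),
        "--log_dir={}/{}-experiment".format(data_root, env),
    ]


def train_all(environments=ENVIRONMENTS, dry_run=True):
    """Run (or, with dry_run=True, only print) one training job per environment."""
    for env in environments:
        command = training_command(env)
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop as soon as one run fails.
            subprocess.run(command, check=True)
```

Calling train_all() only prints the commands. Pass dry_run=False from the repository root to launch the runs one after another.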

Credits

The soft Q-learning algorithm was developed by Haoran Tang and Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen, who helped with testing, documenting, and polishing the code and with streamlining the installation process. The work was supported by Berkeley Deep Drive.

Reference

@inproceedings{haarnoja2017reinforcement,
  title={Reinforcement Learning with Deep Energy-Based Policies},
  author={Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey},
  booktitle={International Conference on Machine Learning},
  year={2017}
}
