Soft Q-learning (SQL) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper "Reinforcement Learning with Deep Energy-Based Policies", presented at the International Conference on Machine Learning (ICML), 2017.
Soft Q-learning can be run either locally or through Docker.
You will need to have Docker and Docker Compose installed unless you want to run the environment locally.
Most of the models require a MuJoCo license.
Currently, rendering of simulations is not supported on Docker because the container has no display set up. As a workaround, you can use a local installation. If you want to run the MuJoCo environments without rendering, the Docker environment needs to know where to find your MuJoCo license key (mjkey.txt). You can either copy your key into <PATH_TO_THIS_REPOSITORY>/.mujoco/mjkey.txt, or you can specify the path to the key in your environment variables:
export MUJOCO_LICENSE_PATH=<path_to_mujoco>/mjkey.txt
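Before starting the container, it can help to confirm that the key is actually where one of the two options above expects it. The following is a minimal sketch and not part of this repository: find_mujoco_key is a hypothetical helper, and checking the environment variable before the .mujoco folder is an assumption, not documented behaviour of the compose setup.

```shell
# Hypothetical helper (not shipped with this repository).
# Prints the path of the first license key it finds, or fails with status 1.
# Assumed lookup order: MUJOCO_LICENSE_PATH first, then <repo_root>/.mujoco/mjkey.txt.
find_mujoco_key() {
    repo_root="${1:-.}"
    if [ -n "${MUJOCO_LICENSE_PATH:-}" ] && [ -f "${MUJOCO_LICENSE_PATH}" ]; then
        printf '%s\n' "${MUJOCO_LICENSE_PATH}"
    elif [ -f "${repo_root}/.mujoco/mjkey.txt" ]; then
        printf '%s\n' "${repo_root}/.mujoco/mjkey.txt"
    else
        echo "mjkey.txt not found" >&2
        return 1
    fi
}
```

Running find_mujoco_key from the repository root prints the key that was found, or an error message if neither location holds one.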
Once that's done, you can run the Docker container with
docker-compose up
Docker Compose creates a Docker container named soft-q-learning and automatically sets the needed environment variables and volumes.
You can access the container with the typical Docker exec-command, i.e.
docker exec -it soft-q-learning bash
See the examples section for how to train and simulate the agents.
To clean up the setup:
docker-compose down
To get the environment installed correctly, you will first need to clone rllab, and have its path added to your PYTHONPATH environment variable.
- Clone rllab
cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
- Download and copy MuJoCo files to rllab path:
If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the .dylib files instead of the .so files.
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir -p <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <installation_path_of_your_choice>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
- Copy your MuJoCo license key (mjkey.txt) to rllab path:
cp <mujoco_key_folder>/mjkey.txt <installation_path_of_your_choice>/rllab/vendor/mujoco
- Clone softqlearning
cd <installation_path_of_your_choice>
git clone https://github.com/haarnoja/softqlearning.git
- Create and activate conda environment
cd softqlearning
conda env create -f environment.yml
source activate sql
The environment should now be ready to run. See the examples section for how to train and simulate the agents.
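A common reason for a failed first run is that the rllab checkout is not on PYTHONPATH in the current shell, since the export only lasts for the session it was run in. As a quick sanity check, here is a small sketch; in_pythonpath is a hypothetical helper and not part of rllab or this repository.

```shell
# Hypothetical helper: succeeds if the given directory is an entry of PYTHONPATH.
# Wrapping both sides in colons makes the match work for the first, middle,
# and last entries alike, and avoids matching on a mere path prefix.
in_pythonpath() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, in_pythonpath <installation_path_of_your_choice>/rllab || echo "rllab is not on PYTHONPATH" reports the problem before any Python code is started.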
Finally, to deactivate and remove the conda environment:
source deactivate
conda remove --name sql --all
- To train the agent
python ./examples/mujoco_all_sql.py --env=swimmer --log_dir="/root/sql/data/swimmer-experiment"
- To simulate the agent (NOTE: This step currently fails with the Docker installation, due to missing display.)
python ./scripts/sim_policy.py /root/sql/data/swimmer-experiment/itr_<iteration>.pkl
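The simulate command needs a concrete iteration number. If you simply want the most recent snapshot, a sketch like the following can pick it for you. latest_snapshot is a hypothetical helper, and it assumes the snapshots in the log directory are named itr_<iteration>.pkl with a plain integer iteration, as in the command above.

```shell
# Hypothetical helper: print the itr_<N>.pkl file with the largest N in a
# log directory. Fails with status 1 if no matching snapshot exists.
latest_snapshot() {
    log_dir="$1"
    best=""
    best_n=-1
    for f in "${log_dir}"/itr_*.pkl; do
        [ -e "$f" ] || continue          # glob matched nothing
        n="${f##*/itr_}"                 # strip directory and "itr_" prefix
        n="${n%.pkl}"                    # strip ".pkl" suffix
        case "$n" in
            ''|*[!0-9]*) continue ;;     # skip names that are not plain integers
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n="$n"
            best="$f"
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

It can then be combined with the script above, for example python ./scripts/sim_policy.py "$(latest_snapshot /root/sql/data/swimmer-experiment)". The comparison is numeric, so itr_10.pkl is correctly preferred over itr_9.pkl.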
mujoco_all_sql.py contains several different environments, and more example scripts are available in the /examples folder. For more information about the agents and configurations, run the scripts with the --help flag. For example:
python ./examples/mujoco_all_sql.py --help
usage: mujoco_all_sql.py [-h]
[--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
[--exp_name EXP_NAME] [--mode MODE]
[--log_dir LOG_DIR]
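To train on several environments in turn, the --env choices listed above can be looped over. The sketch below only prints the commands, so you can inspect them before running anything. print_train_commands is a hypothetical helper, and the <env>-experiment log directory layout simply mirrors the swimmer example; it is an assumption, not something the scripts require.

```shell
# Environment names copied from the --env choices in the usage text.
SQL_ENVS="ant walker swimmer half-cheetah humanoid hopper"

# Hypothetical helper: print one training command per environment.
# The first argument is the data root; it defaults to the path used in the
# swimmer example.
print_train_commands() {
    data_root="${1:-/root/sql/data}"
    for env in $SQL_ENVS; do
        printf 'python ./examples/mujoco_all_sql.py --env=%s --log_dir="%s/%s-experiment"\n' \
            "$env" "$data_root" "$env"
    done
}
```

Piping the output to sh, as in print_train_commands | sh, would run the trainings one after another from the repository root.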
The soft Q-learning algorithm was developed by Haoran Tang and Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen, who helped with testing, documenting, and polishing the code and with streamlining the installation process. The work was supported by Berkeley Deep Drive.
@inproceedings{haarnoja2017reinforcement,
title={Reinforcement Learning with Deep Energy-Based Policies},
author={Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey},
booktitle={International Conference on Machine Learning},
year={2017}
}