SLM Lab

Modular Deep Reinforcement Learning framework in PyTorch. RL environments are already beautifully encapsulated via OpenAI Gym's unified Env interface, which makes them broadly reusable for any learning algorithm; almost no one re-implements an environment for their project. Unfortunately, almost every project does re-implement the RL algorithm itself, as well as the training and evaluation logic, logging, checkpointing, etc. The goal of this framework is to provide a unified Agent interface that encapsulates RL algorithms to make them more reusable across projects. Also, given an Agent and a Gym environment, this framework makes it easy to configure, launch, monitor, and analyze experiments at any scale.
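
To make that split concrete, below is a minimal sketch of the standard agent-environment loop, assuming the classic Gym API (env.step returns a 4-tuple). The RandomAgent class and its act/update methods are illustrative stand-ins, not SLM Lab's actual Agent interface.

    # Minimal sketch of the agent-environment split described above, assuming the classic
    # Gym API. RandomAgent and its act/update methods are illustrative stand-ins, not
    # SLM Lab's actual Agent interface.
    import gym

    class RandomAgent:
        def __init__(self, action_space):
            self.action_space = action_space

        def act(self, observation):
            return self.action_space.sample()

        def update(self, observation, action, reward, next_observation, done):
            pass  # a learning agent would update its policy/value estimates here

    env = gym.make("CartPole-v1")
    agent = RandomAgent(env.action_space)
    obs, done = env.reset(), False
    while not done:
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        agent.update(obs, action, reward, next_obs, done)
        obs = next_obs
    env.close()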

  • Provides a unified Agent interface with managed rollout, training, and evaluation loops, optionally distributed across an auto-scaled cluster via RayRL.
  • Deployment-first design provides a seamless path from research & prototyping to production deployment.
  • Plugin architecture for easily developing and sharing 3rd-party or private agents or environments.
  • Integrated logging via TensorBoard for visualization and analysis. Out-of-the-box basic experiment logging for RL, plus hooks for adding custom logging specific to your use case.
  • Manual Mode provides a simple web UI for interacting with any environment. Intended for debugging action/observation transformations, debugging an environment, or as a very inefficient form of entertainment.
  • Annotated Mode aids in studying existing algorithms.
  • Web UI for experiment configuration, launching, and monitoring (with a link to TensorBoard for analysis).

Goals:

  • No code changes necessary to deploy a trained agent.
  • Small changes to algorithms should feel like small changes to Agents or experiment configuration.
  • Things like epsilon-greedy, frameskip, and exploratory noise should feel like pluggable components (see the sketch below).
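
As an illustration of what "pluggable" could mean here, the sketch below factors epsilon-greedy exploration into a standalone component; the EpsilonGreedy class and its call signature are assumptions for illustration, not SLM Lab's actual API.

    import random

    # Illustrative sketch: epsilon-greedy exploration as a standalone, swappable component.
    # The class name and call signature are assumptions, not SLM Lab's actual API.
    class EpsilonGreedy:
        def __init__(self, epsilon=0.1):
            self.epsilon = epsilon

        def __call__(self, greedy_action, action_space):
            if random.random() < self.epsilon:
                return action_space.sample()  # explore: random action from the action space
            return greedy_action              # exploit: the policy's greedy choice

Swapping in a different exploration strategy (e.g. additive noise) would then be a change to the experiment configuration rather than to the algorithm code.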

Demo GIFs of trained DDQN agents on Atari: BeamRider, Breakout, Pong, Qbert, Seaquest, and SpaceInvaders.

References

  • Installation: How to install SLM Lab
  • Documentation: Usage documentation
  • Benchmark: Benchmark results
  • Gitter: SLM Lab user chatroom

Features

Algorithms

SLM Lab implements a number of canonical RL algorithms with reusable modular components and class inheritance, with a commitment to code quality and performance.

The benchmark results also include complete spec files to enable full reproducibility using SLM Lab.

The latest benchmark status for the implemented algorithms is summarized below; see the benchmark results for the full per-environment (Atari, Roboschool) tables.

  • SARSA
  • DQN, distributed-DQN
  • Double-DQN, Dueling-DQN, PER-DQN
  • REINFORCE
  • A2C, A3C (N-step & GAE)
  • PPO, distributed-PPO
  • SIL (A2C, PPO)

Environments

SLM Lab integrates with multiple environment offerings, such as OpenAI Gym.

Contributions are welcome to integrate more environments!

Metrics and Experimentation

To facilitate better RL development, SLM Lab also comes with a prebuilt metrics and experimentation framework:

  • every run generates metrics, graphs, and data for analysis, as well as a spec for reproducibility
  • scalable hyperparameter search using Ray Tune (see the sketch below)
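
Independent of SLM Lab's own wrappers, the snippet below sketches the kind of search Ray Tune performs, using the classic tune.run interface with a toy trainable standing in for an actual training session.

    from ray import tune

    # Toy trainable standing in for a real training session; it just reports a score
    # derived from the sampled hyperparameter.
    def trainable(config):
        tune.report(score=-(config["lr"] - 3e-3) ** 2)

    analysis = tune.run(
        trainable,
        config={"lr": tune.grid_search([1e-4, 1e-3, 1e-2])},
    )
    print(analysis.get_best_config(metric="score", mode="max"))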

Installation

  1. Clone the SLM Lab repo:

    git clone https://github.com/kengz/SLM-Lab.git
  2. Install dependencies (the setup script uses Conda):

    cd SLM-Lab/
    sudo bin/setup

Alternatively, instead of running sudo bin/setup, copy and paste the commands from bin/setup_macOS or bin/setup_ubuntu into your terminal, adding sudo where needed.

Useful reference: Debugging

Quick Start

DQN CartPole

Everything in the lab is run using a spec file, which contains all the information needed to make the run reproducible. Spec files are located in slm_lab/spec/.
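
For orientation, the sketch below (written as a Python dict) shows the general shape of the sections a spec groups together; the exact field names and values in slm_lab/spec/demo.json may differ, so treat this as an illustration rather than the actual schema.

    # Rough illustration of a spec's sections, shown as a Python dict; field names are
    # assumptions for orientation, not the actual schema of slm_lab/spec/demo.json.
    spec_sketch = {
        "dqn_cartpole": {
            "agent": [{"name": "DQN", "algorithm": {"gamma": 0.99}, "net": {"hid_layers": [64]}}],
            "env": [{"name": "CartPole-v0", "max_frame": 10000}],
            "meta": {"max_trial": 1, "max_session": 4},  # how many Trials and Sessions to run
            "search": {},  # optional hyperparameter search space (see Experimentation below)
        }
    }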

Run a quick demo of DQN and CartPole:

conda activate lab
python run_lab.py slm_lab/spec/demo.json dqn_cartpole dev

This will launch a Trial in development mode, which enables verbose logging and environment rendering. An example screenshot is shown below.

Next, run it in training mode. The total_reward should converge to 200 within a few minutes.

python run_lab.py slm_lab/spec/demo.json dqn_cartpole train

Tip: All lab commands should be run from within a Conda environment. Run conda activate lab once at the beginning of a new terminal session.

This will run a new Trial in training mode. At the end of it, all the metrics and graphs will be output to the data/ folder.

A2C Atari

Run A2C to solve Atari Pong:

conda activate lab
python run_lab.py slm_lab/spec/benchmark/a2c/a2c_gae_pong.json a2c_gae_pong train

Atari Pong run in dev mode to render the environment

This will run a Trial with multiple Sessions in training mode. In the beginning, the total_reward should be around -21. After about 1 million frames, it should begin to converge to around +21 (perfect score). At the end of it, all the metrics and graphs will be output to the data/ folder.

Below is a trial graph with multiple sessions:

Benchmark

To run a full benchmark, simply pick a file and run it in train mode. For example, for A2C Atari benchmark, the spec file is slm_lab/spec/benchmark/a2c/a2c_atari.json. This file is parametrized to run on a set of environments. Run the benchmark:

python run_lab.py slm_lab/spec/benchmark/a2c/a2c_atari.json a2c_atari train

This will spawn multiple processes to run each environment in its separate Trial, and the data is saved to data/ as usual.

Experimentation / Hyperparameter search

An Experiment is a hyperparameter search, which samples multiple specs from a search space. Experiment spawns a Trial for each spec, and each Trial runs multiple duplicated Sessions for averaging its results.
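
The toy sketch below mirrors that hierarchy in plain Python; the random sampling and the fake session score are stand-ins, purely to illustrate how each Trial averages over its duplicated Sessions.

    import random
    import statistics

    # Toy illustration of the Experiment -> Trial -> Session hierarchy described above;
    # run_session is a fake stand-in for an actual training run.
    def run_session(spec):
        return random.gauss(1000 * spec["lr"], 10.0)

    def run_experiment(lr_range=(1e-4, 1e-2), num_trials=4, sessions_per_trial=3):
        results = []
        for _ in range(num_trials):                  # each sampled spec becomes a Trial
            spec = {"lr": random.uniform(*lr_range)}
            scores = [run_session(spec) for _ in range(sessions_per_trial)]  # duplicated Sessions
            results.append((spec, statistics.mean(scores)))  # Trial result = average over Sessions
        return sorted(results, key=lambda r: r[1], reverse=True)

    print(run_experiment())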

Given a spec file in slm_lab/spec/, if it has a search field defining a search space, then it can be run as an Experiment. For example:

python run_lab.py slm_lab/spec/demo.json dqn_cartpole search
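
As a rough illustration only, a search field describes a space of hyperparameter values to sample from; the names and range syntax below are assumptions, not the lab's actual search grammar (see the documentation for the real format).

    # Illustrative only: the kind of search space a spec's "search" field describes.
    # Hyperparameter names and range syntax are assumptions, not SLM Lab's actual grammar.
    search_sketch = {
        "gamma": [0.95, 0.999],          # sample the discount factor from a range
        "lr": [1e-4, 1e-2],              # sample the learning rate from a range
        "hid_layers": [[32], [64, 64]],  # choose between candidate network sizes
    }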

Deep Reinforcement Learning is highly empirical. The lab enables rapid experimentation at scale, so it needs a way to quickly analyze data from many trials. The experiment and analytics framework is the scientific method of the lab.

Experiment graph summarizing the trials in hyperparameter search.

Trial graph showing average envelope of repeated sessions.

Session graph showing total rewards.

This is the end of the quick start tutorial. Continue reading the full documentation to start using SLM Lab.

Read on: GitHub | Documentation

Design Principles

SLM Lab is created for deep reinforcement learning research and applications. The design is guided by four principles:

  • modularity
  • simplicity
  • analytical clarity
  • reproducibility

Modularity

  • makes research easier and more accessible: reuse well-tested components and only focus on the relevant work
  • makes learning deep RL easier: the algorithms are complex; SLM Lab breaks them down into more manageable, digestible components
  • components get reused maximally, which means less code, more tests, and fewer bugs

Simplicity

  • the components are designed to closely correspond to the way papers or books discuss RL
  • modular libraries are not necessarily simple. Simplicity balances modularity to prevent overly complex abstractions that are difficult to understand and use

Analytical clarity

  • hyperparameter search results are automatically analyzed and presented hierarchically in increasingly granular detail
  • it should take less than 1 minute to understand if an experiment yielded a successful result using the experiment graph
  • it should take less than 5 minutes to find and review the top 3 parameter settings using the trial and session graphs

Reproducibility

  • only the spec file and a git SHA are needed to fully reproduce an experiment
  • all the results are recorded in BENCHMARK.md
  • experiment reproduction instructions are submitted to the Lab via result Pull Requests
  • the full experiment data contributed are publicly available on Dropbox

Citing

If you use SLM Lab in your research, please cite it as follows:

@misc{kenggraesser2017slmlab,
    author = {Wah Loon Keng and Laura Graesser},
    title = {SLM Lab},
    year = {2017},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/kengz/SLM-Lab}},
}

Contributing

SLM Lab is an MIT-licensed open source project. Contributions are very much welcome, whether it's a quick bug fix or a new feature. Please see CONTRIBUTING.md for more info.

If you have an idea for a new algorithm, environment support, analytics, benchmarking, or new experiment design, let us know.

If you're interested in using the lab for research, teaching or applications, please contact the authors.
