SLM Lab

Modular Deep Reinforcement Learning framework in PyTorch. RL environments are already beautifully encapsulated via OpenAI Gym's unified Env interface, which makes them broadly reusable for any learning algorithm; almost no one re-implements an environment for their project. Unfortunately, almost every project does re-implement the RL algorithm itself, as well as the training and evaluation logic, logging, checkpointing, etc. The goal of this framework is to provide a unified Agent interface that encapsulates RL algorithms to make them more reusable across projects. Also, given an Agent and a Gym environment, this framework makes it easy to configure, launch, monitor, and analyze experiments at any scale.
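
To make that split concrete, below is a minimal sketch of the standard agent-environment loop, assuming the classic Gym API (env.step returns a 4-tuple). The RandomAgent class and its act/update methods are illustrative stand-ins, not SLM Lab's actual Agent interface.

    # Minimal sketch of the agent-environment split described above, assuming the classic
    # Gym API. RandomAgent and its act/update methods are illustrative stand-ins, not
    # SLM Lab's actual Agent interface.
    import gym

    class RandomAgent:
        def __init__(self, action_space):
            self.action_space = action_space

        def act(self, observation):
            return self.action_space.sample()

        def update(self, observation, action, reward, next_observation, done):
            pass  # a learning agent would update its policy/value estimates here

    env = gym.make("CartPole-v1")
    agent = RandomAgent(env.action_space)
    obs, done = env.reset(), False
    while not done:
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        agent.update(obs, action, reward, next_obs, done)
        obs = next_obs
    env.close()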

  • Provides a unified Agent interface with managed rollout, training, and evaluation loops, optionally distributed across an auto-scaled cluster via RayRL.
  • Deployment-first design provides a seamless path from research & prototyping to production deployment.
  • Plugin architecture for easily developing and sharing 3rd-party or private agents or environments.
  • Integrated logging via TensorBoard for visualization and analysis. Out-of-the-box basic experiment logging for RL, plus hooks for adding custom logging specific to your use case.
  • Manual Mode provides a simple web UI for interacting with any environment. Intended for debugging action/observation transformations, debugging an environment, or as a very inefficient form of entertainment.
  • Annotated Mode aids in studying existing algorithms.
  • Web UI for experiment configuration, launching, and monitoring (with a link to TensorBoard for analysis).

Goals:

  • No code changes necessary to deploy a trained agent.
  • Small changes to algorithms should feel like small changes to Agents or experiment configuration.
  • Things like epsilon-greedy, frameskip, and exploratory noise should feel like pluggable components (see the sketch below).
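
As an illustration of what "pluggable" could mean here, the sketch below factors epsilon-greedy exploration into a standalone component; the EpsilonGreedy class and its call signature are assumptions for illustration, not SLM Lab's actual API.

    import random

    # Illustrative sketch: epsilon-greedy exploration as a standalone, swappable component.
    # The class name and call signature are assumptions, not SLM Lab's actual API.
    class EpsilonGreedy:
        def __init__(self, epsilon=0.1):
            self.epsilon = epsilon

        def __call__(self, greedy_action, action_space):
            if random.random() < self.epsilon:
                return action_space.sample()  # explore: random action from the action space
            return greedy_action              # exploit: the policy's greedy choice

Swapping in a different exploration strategy (e.g. additive noise) would then be a change to the experiment configuration rather than to the algorithm code.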

Demo GIFs of trained DDQN agents on Atari: BeamRider, Breakout, Pong, Qbert, Seaquest, and SpaceInvaders.

References

  • Installation: How to install SLM Lab
  • Documentation: Usage documentation
  • Benchmark: Benchmark results
  • Gitter: SLM Lab user chatroom

Features

Algorithms

SLM Lab implements a number of canonical RL algorithms with reusable modular components and class inheritance, with a commitment to code quality and performance.

The benchmark results also include complete spec files to enable full reproducibility using SLM Lab.

The latest benchmark status for the implemented algorithms is summarized below; see the benchmark results for the full per-environment (Atari, Roboschool) tables.

  • SARSA
  • DQN, distributed-DQN
  • Double-DQN, Dueling-DQN, PER-DQN
  • REINFORCE
  • A2C, A3C (N-step & GAE)
  • PPO, distributed-PPO
  • SIL (A2C, PPO)

Environments

SLM Lab integrates with multiple environment offerings, such as OpenAI Gym.

Contributions are welcome to integrate more environments!

Metrics and Experimentation

To facilitate better RL development, SLM Lab also comes with a prebuilt metrics and experimentation framework:

  • every run generates metrics, graphs, and data for analysis, as well as a spec for reproducibility
  • scalable hyperparameter search using Ray Tune (see the sketch below)
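
Independent of SLM Lab's own wrappers, the snippet below sketches the kind of search Ray Tune performs, using the classic tune.run interface with a toy trainable standing in for an actual training session.

    from ray import tune

    # Toy trainable standing in for a real training session; it just reports a score
    # derived from the sampled hyperparameter.
    def trainable(config):
        tune.report(score=-(config["lr"] - 3e-3) ** 2)

    analysis = tune.run(
        trainable,
        config={"lr": tune.grid_search([1e-4, 1e-3, 1e-2])},
    )
    print(analysis.get_best_config(metric="score", mode="max"))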

Installation

  1. Clone the SLM Lab repo:

    git clone https://github.com/kengz/SLM-Lab.git
  2. Install dependencies (the setup script uses Conda):

    cd SLM-Lab/
    sudo bin/setup

Alternatively, instead of running sudo bin/setup, copy and paste the commands from bin/setup_macOS or bin/setup_ubuntu into your terminal, adding sudo where needed.

Useful reference: Debugging

Quick Start

DQN CartPole

Everything in the lab is run using a spec file, which contains all the information needed to make the run reproducible. Spec files are located in slm_lab/spec/.
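
For orientation, the sketch below (written as a Python dict) shows the general shape of the sections a spec groups together; the exact field names and values in slm_lab/spec/demo.json may differ, so treat this as an illustration rather than the actual schema.

    # Rough illustration of a spec's sections, shown as a Python dict; field names are
    # assumptions for orientation, not the actual schema of slm_lab/spec/demo.json.
    spec_sketch = {
        "dqn_cartpole": {
            "agent": [{"name": "DQN", "algorithm": {"gamma": 0.99}, "net": {"hid_layers": [64]}}],
            "env": [{"name": "CartPole-v0", "max_frame": 10000}],
            "meta": {"max_trial": 1, "max_session": 4},  # how many Trials and Sessions to run
            "search": {},  # optional hyperparameter search space (see Experimentation below)
        }
    }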

Run a quick demo of DQN and CartPole:

conda activate lab
python run_lab.py slm_lab/spec/demo.json dqn_cartpole dev

This will launch a Trial in development mode, which enables verbose logging and environment rendering. An example screenshot is shown below.

Next, run it in training mode. The total_reward should converge to 200 within a few minutes.

python run_lab.py slm_lab/spec/demo.json dqn_cartpole train

Tip: All lab commands should be run from within a Conda environment. Run conda activate lab once at the beginning of a new terminal session.

This will run a new Trial in training mode. At the end of it, all the metrics and graphs will be output to the data/ folder.

A2C Atari

Run A2C to solve Atari Pong:

conda activate lab
python run_lab.py slm_lab/spec/benchmark/a2c/a2c_gae_pong.json a2c_gae_pong train

Atari Pong run in dev mode to render the environment

This will run a Trial with multiple Sessions in training mode. In the beginning, the total_reward should be around -21. After about 1 million frames, it should begin to converge to around +21 (perfect score). At the end of it, all the metrics and graphs will be output to the data/ folder.

Below is a trial graph with multiple sessions:

Benchmark

To run a full benchmark, simply pick a file and run it in train mode. For example, for A2C Atari benchmark, the spec file is slm_lab/spec/benchmark/a2c/a2c_atari.json. This file is parametrized to run on a set of environments. Run the benchmark:

python run_lab.py slm_lab/spec/benchmark/a2c/a2c_atari.json a2c_atari train

This will spawn multiple processes to run each environment in its separate Trial, and the data is saved to data/ as usual.

Experimentation / Hyperparameter search

An Experiment is a hyperparameter search, which samples multiple specs from a search space. Experiment spawns a Trial for each spec, and each Trial runs multiple duplicated Sessions for averaging its results.
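
The toy sketch below mirrors that hierarchy in plain Python; the random sampling and the fake session score are stand-ins, purely to illustrate how each Trial averages over its duplicated Sessions.

    import random
    import statistics

    # Toy illustration of the Experiment -> Trial -> Session hierarchy described above;
    # run_session is a fake stand-in for an actual training run.
    def run_session(spec):
        return random.gauss(1000 * spec["lr"], 10.0)

    def run_experiment(lr_range=(1e-4, 1e-2), num_trials=4, sessions_per_trial=3):
        results = []
        for _ in range(num_trials):                  # each sampled spec becomes a Trial
            spec = {"lr": random.uniform(*lr_range)}
            scores = [run_session(spec) for _ in range(sessions_per_trial)]  # duplicated Sessions
            results.append((spec, statistics.mean(scores)))  # Trial result = average over Sessions
        return sorted(results, key=lambda r: r[1], reverse=True)

    print(run_experiment())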

Given a spec file in slm_lab/spec/, if it has a search field defining a search space, then it can be run as an Experiment. For example:

python run_lab.py slm_lab/spec/demo.json dqn_cartpole search
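
As a rough illustration only, a search field describes a space of hyperparameter values to sample from; the names and range syntax below are assumptions, not the lab's actual search grammar (see the documentation for the real format).

    # Illustrative only: the kind of search space a spec's "search" field describes.
    # Hyperparameter names and range syntax are assumptions, not SLM Lab's actual grammar.
    search_sketch = {
        "gamma": [0.95, 0.999],          # sample the discount factor from a range
        "lr": [1e-4, 1e-2],              # sample the learning rate from a range
        "hid_layers": [[32], [64, 64]],  # choose between candidate network sizes
    }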

Deep Reinforcement Learning is highly empirical. The lab enables rapid experimentation at scale, so it needs a way to quickly analyze data from many trials. The experiment and analytics framework is the scientific method of the lab.

Experiment graph summarizing the trials in hyperparameter search.

Trial graph showing average envelope of repeated sessions.

Session graph showing total rewards.

This is the end of the quick start tutorial. Continue reading the full documentation to start using SLM Lab.

Read on: GitHub | Documentation

Design Principles

SLM Lab is created for deep reinforcement learning research and applications. The design is guided by four principles:

  • modularity
  • simplicity
  • analytical clarity
  • reproducibility

Modularity

  • makes research easier and more accessible: reuse well-tested components and only focus on the relevant work
  • makes learning deep RL easier: the algorithms are complex; SLM Lab breaks them down into more manageable, digestible components
  • components get reused maximally, which means less code, more tests, and fewer bugs

Simplicity

  • the components are designed to closely correspond to the way papers or books discuss RL
  • modular libraries are not necessarily simple. Simplicity balances modularity to prevent overly complex abstractions that are difficult to understand and use

Analytical clarity

  • hyperparameter search results are automatically analyzed and presented hierarchically in increasingly granular detail
  • it should take less than 1 minute to understand if an experiment yielded a successful result using the experiment graph
  • it should take less than 5 minutes to find and review the top 3 parameter settings using the trial and session graphs

Reproducibility

  • only the spec file and a git SHA are needed to fully reproduce an experiment
  • all the results are recorded in BENCHMARK.md
  • experiment reproduction instructions are submitted to the Lab via result Pull Requests
  • the full experiment data contributed are publicly available on Dropbox

Citing

If you use SLM Lab in your research, please cite it as follows:

@misc{kenggraesser2017slmlab,
    author = {Wah Loon Keng and Laura Graesser},
    title = {SLM Lab},
    year = {2017},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/kengz/SLM-Lab}},
}

Contributing

SLM Lab is an MIT-licensed open source project. Contributions are very much welcome, whether it's a quick bug fix or a new feature. Please see CONTRIBUTING.md for more info.

If you have an idea for a new algorithm, environment support, analytics, benchmarking, or new experiment design, let us know.

If you're interested in using the lab for research, teaching or applications, please contact the authors.
