Temporal Difference Learning Algorithms for Policy Evaluation

What is it?

This package contains implementations of the most relevant TD methods for policy evaluation (i.e. estimating the value function) and a benchmark framework to systematically assess their quality in a variety of scenarios. Only methods for linear function approximation are considered.
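
Throughout, the value function is represented as a linear combination of state features. As a generic illustration (not code from this package), the estimate is simply a dot product between a parameter vector and a feature vector:

import numpy as np

# Generic sketch of a linear value-function estimate V(s) = theta^T phi(s).
# phi_s is the feature vector of a state, theta the learned parameter vector.
def value_estimate(theta, phi_s):
    return np.dot(theta, phi_s)

theta = np.zeros(4)                      # parameters for 4 features
phi_s = np.array([1.0, 0.0, 0.5, 0.0])   # example feature vector of one state
print(value_estimate(theta, phi_s))      # -> 0.0 for zero-initialized theta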

Implemented methods

The following algorithms are implemented in the td module:

  • TD Learning with eligibility traces (e-traces)
  • TDC with e-traces
  • GTD
  • GTD2
  • LSTD with e-traces
  • Bellman Residual Minimization with or without double sampling (+ e-traces)
  • Residual Gradient with or without double sampling
  • GPTD with e-traces
  • Kalman TD
  • LSPE with e-traces
  • FPKF with e-traces
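
As a reminder of the kind of update these methods perform, the following minimal sketch shows the classical TD(lambda) step with accumulating eligibility traces and linear features. It illustrates the algorithm only and does not reflect the API of the td module.

import numpy as np

def td_lambda_step(theta, z, phi_s, phi_next, reward, alpha, gamma, lam):
    # Temporal-difference error for the transition (s, r, s').
    delta = reward + gamma * np.dot(theta, phi_next) - np.dot(theta, phi_s)
    # Accumulating eligibility trace.
    z = gamma * lam * z + phi_s
    # Semi-gradient parameter update.
    theta = theta + alpha * delta * z
    return theta, z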

In addition, the package contains rudimentary implementations (in the regtd module) of different regularization schemes for LSTD such as

  • LSTD with l2 regularization
  • Dantzig-LSTD
  • LarsTD
  • LSTD-l1
  • LSTD with l2l1 regularization
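
These regularizers act on the usual LSTD statistics A = Phi^T (Phi - gamma * Phi') and b = Phi^T r. As an illustration of the simplest case (not the regtd implementation), l2-regularized LSTD solves a ridge-like system:

import numpy as np

def lstd_l2(phis, phis_next, rewards, gamma, ridge):
    # phis, phis_next: (n_samples, n_features) feature matrices for s and s'.
    A = np.dot(phis.T, phis - gamma * phis_next)
    b = np.dot(phis.T, rewards)
    # Solve the l2-regularized system (A + ridge * I) theta = b.
    return np.linalg.solve(A + ridge * np.eye(A.shape[0]), b)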

Benchmarks

The package contains implementations of several MDPs suitable for benchmarking the algorithms (see examples.py). While these implementations are of a general nature, ready-to-run scripts are provided for the following benchmark scenarios:

  1. 14-State Boyan Chain [boyan]
  2. Baird Star Example [baird]
  3. 400-State Random MDP On-policy [disc_random_on]
  4. 400-State Random MDP Off-policy [disc_random_off]
  5. Lin. Cart-Pole Balancing On-pol. Imp. Feat. [lqr_imp_onpolicy]
  6. Lin. Cart-Pole Balancing Off-pol. Imp. Feat. [lqr_imp_offpolicy]
  7. 4-dim. State Pole Balancing Onpolicy Perfect Features [lqr_full_onpolicy]
  8. Lin. Cart-Pole Balancing Off-pol. Perf. Features [lqr_full_offpolicy]
  9. Cart-Pole Swing-up On-policy [swingup_gauss_onpolicy]
  10. Cart-Pole Swing-up Off-policy [swingup_gauss_offpolicy]
  11. 20-link Lin. Pole Balancing On-policy [link20_imp_onpolicy]
  12. 20-link Lin. Pole Balancing Off-policy [link20_imp_offpolicy]

The scripts are located in the experiments folder and should be executed from the base directory. The results of the experiments are stored in the data folder. The plots directory contains scripts which automatically create the figures of the paper Dann, Neumann, Peters -- Policy Evaluation with Temporal Differences: A Survey and Comparison from the stored results. Alternatively, the data can be viewed interactively by executing

from experiments import *
name = "lqr_full_offpolicy" # the name of the experiment (in brackets above)
measure = "RMSPBE" # Root Mean squared projected bellman error, alternatives: RMSE, RMSBE 
plot_experiment(name, measure)

Be aware that the scripts make heavy use of hard-disk caching to avoid re-computation of runtime-intensive results. This significantly speeds up re-executions of experiments. The cache is located in the cache folder and may grow to several GB.
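
To run a single benchmark directly, execute its script from the base directory. Assuming each bracketed identifier above corresponds to a script of the same name in the experiments folder, a run of the on-policy discrete random MDP benchmark would look like

python experiments/disc_random_on.py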

Grid-search for hyper-parameter tuning

Exhaustive grid-search is implemented for tuning hyper-parameters of the algorithms. To find optimal parameters for a given benchmark use the script experiments/gridsearch.py. The script takes the following parameters:

  • --experiment: name of the benchmark. It must be a module in the experiments folder. The grid-search script automatically imports that module and uses the settings defined there.
  • --njobs: number of cores to use in parallel
  • --batchsize: number of parameter settings to evaluate per job. Increasing the value may speed up the search for small benchmarks, since it amortizes the overhead incurred per job.

For example, finding parameters for the discrete random MDP on-policy benchmark (3) can be started with

python experiments/gridsearch.py --experiment disc_random_on
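
The parallelization and batching flags described above can be combined with the experiment name; the values below are arbitrary examples.

python experiments/gridsearch.py --experiment disc_random_on --njobs 4 --batchsize 10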

The results of the grid-search are stored in a directory with the name of the benchmark located in the data folder. You may want to have a look at 2d slices of the hyper-parameter space. The plot_2d_error_grid_experiment function in the experiments package will help you with that. For example, the dependency of FPKF's performance on its alpha and mins parameters for fixed lambda=0 and beta=100 on the discrete random MDP on-policy benchmark can be visualized by

from experiments import * 
plot_2d_error_grid_experiment("disc_random_on", "FPKF", criterion="RMSE", pn1="alpha", pn2="mins", settings={"beta": 100, "lam": 0.})

For further information on how to display data, have a look at the scripts in the plots directory.

Setup

This code is known to run well with

  • Python 2.7
  • Numpy 1.6.1
  • matplotlib 1.2.0 (an up-to-date version is required for error bars and smooth curves in plots)
  • Cython 0.17
  • mlabwrap 1.1 (http://mlabwrap.sourceforge.net/ , for executing the PILCO policy for the cart-pole swing-up task)
  • custom joblib version available from https://github.com/chrodan/joblib (to have custom hashing functions for more complex objects)

We provide short installation instructions for Unix systems in the following.

Compiling Swing-up dynamics

The dynamics of the cart-pole swing-up benchmark are implemented in Cython for speed. The swingup_ode module therefore needs to be compiled.

cython swingup_ode.pyx
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.7 -o swingup_ode.so swingup_ode.c

You may need to adapt the Python include path to your system. Alternatively, the module can be compiled with distutils by executing in the base directory:

python setup.py build_ext --inplace
mv tdlearn/swingup_ode.so .
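
If you need to adjust the build, a minimal distutils setup for a Cython extension looks roughly as follows. This is a generic sketch, not necessarily the setup.py shipped with the package.

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# Generic sketch: build the swingup_ode Cython extension in place.
setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("swingup_ode", ["swingup_ode.pyx"])],
)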

Installing custom joblib version locally

The custom version of joblib can be installed locally in the base directory so that it is used automatically by this framework without interfering with code outside. This can be done by executing:

git clone https://github.com/chrodan/joblib joblib_repo
ln -s joblib_repo/joblib joblib
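
To verify that the locally symlinked joblib is the one being imported (rather than a system-wide installation), you can print its location from the base directory; this check is merely a suggestion.

python -c "import joblib; print(joblib.__file__)"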
