GitHub - pabloesm/temporal-difference_algorithms

Temporal-difference algorithms

A simple Python implementation of classical TD algorithms: TD(0), TD(λ) with accumulating traces, and TD(λ) with replacing traces, together with the more recently proposed true online TD(λ). The algorithms follow van Seijen, H., & Sutton, R. S. (2014). True online TD(λ). In Proceedings of the 31st International Conference on Machine Learning.
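As a rough illustration of how these updates differ, the sketch below writes one step of each method for a linear value function v(s) = θᵀφ(s). It is a minimal sketch, not the code in this repository: the function names and signatures are hypothetical, the replacing-trace rule assumes binary features, and the true online update follows the form given in the 2014 paper, where the step-size is folded into the trace.

```python
import numpy as np


def td_lambda_step(theta, e, phi, phi_next, reward, alpha, gamma, lam,
                   traces="accumulating"):
    """One step of conventional TD(lambda) with linear function approximation.

    Hypothetical helper for illustration; pass phi_next = zeros at termination.
    """
    decayed = gamma * lam * e
    if traces == "accumulating":
        # Accumulating traces: decay, then add the current feature vector.
        e = decayed + phi
    elif traces == "replacing":
        # Replacing traces (assumes binary features): active features reset to 1.
        e = np.where(phi == 1, 1.0, decayed)
    else:
        raise ValueError(f"unknown trace type: {traces}")
    delta = reward + gamma * (theta @ phi_next) - theta @ phi  # TD error
    theta = theta + alpha * delta * e
    return theta, e


def true_online_td_lambda_step(theta, e, v_s, phi, phi_next, reward,
                               alpha, gamma, lam):
    """One step of true online TD(lambda).

    v_s is the value of the current state computed with the weights from
    before the previous update; the returned v_next plays that role next step.
    """
    v_next = theta @ phi_next
    delta = reward + gamma * v_next - v_s
    # The trace includes the step-size and a correction term in e^T phi.
    e = gamma * lam * e + alpha * (1.0 - gamma * lam * (e @ phi)) * phi
    # Right-hand side uses the old theta throughout.
    theta = theta + delta * e + alpha * (v_s - theta @ phi) * phi
    return theta, e, v_next
```

At the start of an episode the trace `e` is reset to zeros and, for the true online variant, `v_s` is initialised to `theta @ phi` for the first state. With λ = 0 both functions reduce to the ordinary TD(0) update.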

Experiments

The experiments reproduce the results of the original paper, which were obtained on a random walk environment with 11 states and two kinds of function approximators, referred to as Task 1 and Task 2. The following figures compare the performance (in terms of RMS error) of several TD methods across step-sizes and λ values:

Task 1

TD(λ) accumulating traces

TD(λ) replacing traces

true online TD(λ)

Task 2

TD(λ) accumulating traces

TD(λ) replacing traces

true online TD(λ)
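The RMS error reported in the figures above can be sketched as follows, together with a toy random walk that generates transitions. This is only an illustration under stated assumptions: the start state, transition probability, and reward scheme used here are placeholders and may differ from those of the paper and of environment.py, and both function names are hypothetical.

```python
import numpy as np


def rms_error(theta, features, true_values):
    """Root-mean-square error between linear estimates and reference values.

    features has one row per state; the estimate for state s is features[s] @ theta.
    """
    estimates = features @ theta
    return float(np.sqrt(np.mean((estimates - true_values) ** 2)))


def random_walk_episode(n_states=11, p_right=0.5, rng=None):
    """Yield (state, reward, next_state) transitions for one episode.

    Assumed dynamics (not taken from the paper): start in the middle state,
    move right with probability p_right, terminate past either end, and
    receive reward 1 only when leaving on the right. next_state is None
    on the terminating transition.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = n_states // 2
    while True:
        nxt = state + (1 if rng.random() < p_right else -1)
        if nxt < 0:
            yield state, 0.0, None
            return
        if nxt >= n_states:
            yield state, 1.0, None
            return
        yield state, 0.0, nxt
        state = nxt
```

A learning curve would then be obtained by updating the weights along each episode and evaluating `rms_error` against reference state values at the end of each one.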

The experiments can be reproduced by running the corresponding file experiment_[algorithm_name].py (for example, experiment_TDlambda.py). Each run should take a couple of minutes.

Repository files:

figures
.gitignore
README.md
agent.py
basic_exp_TD0.py
environment.py
experiment_TD0.py
experiment_TDlambda.py
experiment_TDlambda_replacing.py
experiment_trueOnline_TDlambda.py
exputils.py
functionApproximator.py
