jialrs/async_rl

Forked from wulfebw/async_rl.

Python implementation of tabular asynchronous actor critic

Summary

This repo contains a process-based implementation of tabular, 1-step, asynchronous advantage actor critic. It differs substantially from the A3C algorithm in the Asynchronous Methods for Deep Reinforcement Learning paper (http://arxiv.org/abs/1602.01783) in that it does not use a function approximator and does not incorporate (the forward view of) eligibility traces.

It is similar in that it implements advantage actor critic with multiple agents updating weights in parallel.
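To make the update concrete, here is a minimal, self-contained sketch of a tabular 1-step advantage actor-critic step: the critic's TD error serves as the advantage estimate and nudges both the state-value table and the softmax action preferences. This illustrates the general technique rather than the repo's code; the learning rates, the softmax parameterization, and the variable names are assumptions.

```python
import numpy as np

def softmax(prefs):
    # Numerically stable softmax over the action preferences of one state.
    z = prefs - prefs.max()
    e = np.exp(z)
    return e / e.sum()

def one_step_actor_critic_update(V, H, s, a, r, s_next, done,
                                 gamma=0.95, alpha_v=0.1, alpha_pi=0.1):
    """One tabular, 1-step advantage actor-critic update (illustrative).

    V: 1-D array of state values (the critic).
    H: 2-D array of per-state action preferences (the actor);
       the policy is softmax(H[s]).
    """
    # 1-step advantage estimate: the TD error of the critic.
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]

    # Critic update: move V(s) toward the bootstrapped target.
    V[s] += alpha_v * delta

    # Actor update: raise the taken action's preference in proportion to
    # the advantage, using the softmax policy-gradient form.
    pi = softmax(H[s])
    grad = -pi
    grad[a] += 1.0  # d log pi(a|s) / d H[s, :] for a softmax policy
    H[s] += alpha_pi * delta * grad
    return delta
```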

There's also a simple test maze Markov decision process (MDP). Whether running with multiple processes is beneficial seems to depend on the specific maze MDP you use.
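The "asynchronous" part is process-based: several workers step through their own copy of the environment while writing updates into shared tables without coordinating. Below is a rough sketch of that pattern using multiprocessing shared arrays and a toy 1-D corridor standing in for the repo's maze MDP. It reuses the functions from the sketch above; the environment, episode counts, and lock-free writes are assumptions rather than the repo's actual Experiment machinery.

```python
import multiprocessing as mp
import numpy as np

# Assumes softmax() and one_step_actor_critic_update() from the sketch
# above are defined in this same module.

N_STATES, N_ACTIONS = 30, 4

def step_corridor(s, a):
    # Toy 1-D corridor in place of the maze MDP: action 0 moves left,
    # any other action moves right; reward 1 for reaching the goal state.
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

def worker(shared_V, shared_H, n_episodes, seed):
    # Wrap the shared memory in numpy views (no copies), so every process
    # reads and writes the same value and preference tables.
    V = np.frombuffer(shared_V.get_obj())
    H = np.frombuffer(shared_H.get_obj()).reshape(N_STATES, N_ACTIONS)
    rng = np.random.default_rng(seed)
    for _ in range(n_episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(N_ACTIONS, p=softmax(H[s]))
            s_next, r, done = step_corridor(s, a)
            one_step_actor_critic_update(V, H, s, a, r, s_next, done)
            s = s_next

if __name__ == "__main__":
    # 'd' = double precision. Element-wise writes from the numpy views are
    # racy across processes, which this style of algorithm tolerates.
    shared_V = mp.Array("d", N_STATES)
    shared_H = mp.Array("d", N_STATES * N_ACTIONS)
    workers = [mp.Process(target=worker, args=(shared_V, shared_H, 500, i))
               for i in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```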

Results

single process:

  • 2x5 maze: 5.4 seconds
  • 2x10 maze: 21.9 seconds
  • 1x30 maze: 49.2 seconds
  • 5x3 maze: 11.9 seconds

two processes:

  • 2x5 maze: 3.5 seconds
  • 2x10 maze: 22.1 seconds
  • 1x30 maze: 79.2 seconds
  • 5x3 maze: 18.5 seconds

File Descriptions

  • run_experiment.py: Script for running an experiment.
  • async_actor_critic.py: Contains the implementation of asynchronous actor critic.
  • experiment.py: Contains implementations of the Experiment and MultiProcessExperiment classes (a rough structural sketch follows this list).
  • maze_mdp.py: Defines a simple MDP on which to test the actor critic implementation.
  • utils.py: Utilities used by the algorithm and for plotting.
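As a reading guide for how Experiment and MultiProcessExperiment might relate, the outline below shows one plausible structure: a single-process experiment runs episodes in a loop, and the multi-process variant launches several of those loops in separate processes against shared tables. This is a guess at the shape of the code, not the repo's actual classes; constructor arguments and method names are hypothetical.

```python
import multiprocessing as mp
import time

def _run_in_child(experiment_factory):
    # Module-level so it can be handed to a child process.
    experiment_factory().run()

class Experiment:
    """Single-process training loop (illustrative guess, not the repo's class)."""
    def __init__(self, mdp, agent, n_episodes=1000):
        self.mdp, self.agent, self.n_episodes = mdp, agent, n_episodes

    def run(self):
        for _ in range(self.n_episodes):
            self.agent.run_episode(self.mdp)  # hypothetical agent interface

class MultiProcessExperiment:
    """Runs several single-process experiments in parallel, assuming the
    agent's tables live in shared memory (illustrative guess)."""
    def __init__(self, experiment_factory, n_processes=2):
        self.experiment_factory = experiment_factory
        self.n_processes = n_processes

    def run(self):
        start = time.time()
        children = [mp.Process(target=_run_in_child, args=(self.experiment_factory,))
                    for _ in range(self.n_processes)]
        for p in children:
            p.start()
        for p in children:
            p.join()
        return time.time() - start  # wall-clock time, as in the results above
```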
