Asynchronous deep reinforcement learning
An attempt to reproduce Google DeepMind's paper "Asynchronous Methods for Deep Reinforcement Learning."
http://arxiv.org/abs/1602.01783
The Asynchronous Advantage Actor-Critic (A3C) method is implemented with TensorFlow, using Atari Pong as a test.
(However, the learning results are still not good. I am investigating the problem, and any advice or suggestions are very welcome.)
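For readers new to A3C, the "advantage" in the name is the difference between the discounted return a worker actually observed and the critic's value estimate for that state. The following is a minimal NumPy sketch of that calculation, not the code in a3c.py; the function names `n_step_returns` and `advantages` and the default `gamma=0.99` are assumptions made for illustration.

```python
import numpy as np


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R_t = r_t + gamma * R_{t+1}.

    bootstrap_value is the critic's estimate for the state that follows
    the last reward (use 0.0 if the episode ended there).
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = bootstrap_value
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage A_t = R_t - V(s_t), with values given by the critic."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    return returns - np.asarray(values, dtype=np.float64)
```

In the paper, the advantage scales the policy gradient while the squared difference between return and value trains the critic.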
First, build the multi-thread-ready version of the Arcade Learning Environment. I modified it so that it can run in a multi-threaded environment.
$ git clone https://github.com/miyosuda/Arcade-Learning-Environment.git
$ cd Arcade-Learning-Environment
$ cmake -DUSE_SDL=ON -DUSE_RLGLUE=OFF -DBUILD_EXAMPLES=ON .
$ make -j 4
$ pip install .
I recommend installing it in a virtualenv environment.
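The thread-safe build matters because A3C runs several workers in parallel, each stepping its own emulator instance while contributing to shared state. The sketch below illustrates only that threading pattern. `DummyEnv` is a hypothetical stand-in, not the ALE API, and `run_workers` is an assumed name. A lock guards the shared totals here to keep the example deterministic, whereas the paper applies its updates without locking.

```python
import threading

import numpy as np


class DummyEnv:
    """Hypothetical stand-in for an emulator; this is NOT the ALE API."""

    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def step(self):
        # Fake reward of 0.0 or 1.0 in place of a real game step.
        return float(self.rng.integers(0, 2))


def run_workers(num_workers=4, steps_per_worker=100):
    """Run one thread per environment and accumulate shared totals."""
    shared = {"steps": 0, "reward_sum": 0.0}
    lock = threading.Lock()

    def worker(index):
        env = DummyEnv(seed=index)  # each thread owns its own environment
        for _ in range(steps_per_worker):
            reward = env.step()
            with lock:  # lock used here for simplicity (assumption)
                shared["steps"] += 1
                shared["reward_sum"] += reward

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared
```

Calling `run_workers(3, 50)` returns a dictionary whose `"steps"` entry counts every step taken across all threads.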
To train,
$ python a3c.py
To display the result with game play,
$ python a3c_disp.py