
async_deep_reinforce

Asynchronous deep reinforcement learning + Pseudo-count based reward + On-highscore-learning

About

This code is a fork of miyosuda's code, to which I added many functions for my deep learning experiments. Among them, the pseudo-count based reward from the following DeepMind paper and my original on-highscore-learning enable an average score of over 1500 points in Montezuma's Revenge, which is higher than the A3C score reported in the paper.

https://arxiv.org/abs/1606.01868 (Unifying Count-Based Exploration and Intrinsic Motivation, DeepMind)
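The core of the pseudo-count based reward can be sketched as follows. This is only a minimal illustration of the bonus formula in the paper, not the exact code in this repository; it assumes a density model whose probability for the observed state is available before and after the model is updated on that state, and beta is the bonus-scale hyper-parameter.

import math

def pseudo_count_bonus(prob_before, prob_after, beta=0.01):
    # prob_before / prob_after: the density model's probability of the
    # observed state before and after updating the model on that state.
    rho, rho_prime = prob_before, prob_after
    # Pseudo-count from the paper: N_hat = rho * (1 - rho') / (rho' - rho)
    n_hat = rho * (1.0 - rho_prime) / max(rho_prime - rho, 1e-12)
    # Intrinsic reward added to the game reward: r+ = beta / sqrt(N_hat + 0.01)
    return beta / math.sqrt(max(n_hat, 0.0) + 0.01)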

"on-highscore-learning" is my original idea, which learn from state-action-rewards-history when getting highscore. But in evaluation of Montezuma's Revenge, I set option to reset highscore in every episode, so learning occured in every score. (I'm changing this now. In new version, only highscore episode will be selected automatically based on history of scores)

Slide

See the following slide (in English) for an explanation of this project.

http://www.slideshare.net/ItsukaraIitsuka/drl-challenge-on-montezumas-revenge

See the following slide for an explanation in Japanese.

http://www.slideshare.net/ItsukaraIitsuka/deepmind20166-unifying-countbased-exploration-and-intrinsic-motivation-pseudocount-montezumas-revenge

Learning curve of Montezuma's Revenge

The following graph shows the average score in Montezuma's Revenge.

learning result after 39M steps

0 - 30M steps: Pseudo-count based reward is ON.

30 - 40M steps: Above + on-highscore-learning is ON.

Best learning curve of Montezuma's Revenge

The following graph is the best learning curve of Montezuma's Revenge (2016/10/7). The best score is 2500 and the peak average score is more than 1500 points.

best learning result

Explored Rooms

  • My result: The following picture shows the rooms explored across all of my trainings.

explored rooms in my trainings

This is better than DeepMind's result (see the next picture). It was achieved only in the OpenAI Gym environment; in the ALE environment, although the average score is higher than in OpenAI Gym, the number of explored rooms is smaller.

  • DeepMind's result: The rooms explored in the DeepMind paper (across all of their trainings).

explored rooms in DeepMind's trainings

Play movie

The following is a play movie of Montezuma's Revenge after 50M training steps. Its score is 2600.

How to prepare environment

This code needs Anaconda, TensorFlow, OpenCV 3, and the Arcade Learning Environment (ALE). After downloading gcp-install-a3c-env.tgz, you can use the scripts in the "gcp-install" directory. Run the following:

$ sudo apt-get install git
$ git clone https://github.com/Itsukara/async_deep_reinforce.git
$ mkdir Download
$ cp async_deep_reinforce/gcp-install/gcp-install-a3c-env.tgz Download/
$ cd Download/
$ tar zxvf gcp-install-a3c-env.tgz
$ bash -x install-Anaconda.sh
$ . ~/.bashrc
$ bash -x install-tensorflow.sh
$ bash -x install-opencv3.sh
$ bash -x install-ALE.sh
$ bash -x install-additionals.sh
$ cd ../async_deep_reinforce
$ ./run-option montezuma-c-avg-greedy-rar025

When the program requests input, just hit Enter, or type "y" or "yes" and hit Enter. For Anaconda, however, you have to type "q" when the license text and "--More--" are displayed.

I built the environment with these scripts on Ubuntu 14.04 LTS 64-bit on Google Cloud Platform, Amazon EC2, and Microsoft Azure.

How to train

To train,

$ ./run-option montezuma-c-max-greedy-rar025

To display the game screen played by the program,

$ python a3c_display.py --rom=montezuma_revenge.bin --display=True

To create a play movie without displaying the game screen,

$ python a3c_display.py --rom=montezuma_revenge.bin --record-screen-dir=screen
$ run-avconv-all screen # you need avconv

Run options

For the available options, see options.py.

How to reproduce OpenAI Gym Result

I uploaded the evaluation result to OpenAI Gym; see the "OpenAI Gym evaluation page". I'd appreciate it if you could review my evaluation.

To reproduce the OpenAI Gym result,

$ ./run-option-gym montezuma-j-tes30-b0020-ff-fs2

Play screens are recorded in the following directories:

  • screen.new-room : screens recorded when a new room is entered
  • screen.new-record : screens recorded when a new high score is achieved

Status of code

The source code is still under development and may change frequently. Currently, I'm searching for the best parameters to speed up learning and achieve a higher score. As part of this search, I'm adding new functions that change the behavior of the program, so it may be degraded at times. Sorry for that in advance.

Sharing experiment result

I'd appreciate it if you could write your experiment results in the "Experiment Results" thread in Issues.

Blog

I'm writing a blog about this program. See the following (in Japanese):

http://itsukara.hateblo.jp/ (Itsukara's Blog)

How to refer

I'd appreciate it if you would refer to my code in your blog or paper as follows:

https://github.com/Itsukara/async_deep_reinforce (On-Highscore-Learning code based on A3C+ and Pseudo-count developed by Itsukara)

Acknowledgements

  • @miyosuda for providing a very fast A3C program.
