DQN-Deepmind-NIPS-2013

This project is an implementation of the deep Q-network described by DeepMind in their 2013 article, using Python 3.5 and Theano.

 

Dependencies

Main dependencies

The following dependencies are crucial for this project and are used by the agent itself. The given versions should be used; however, higher versions would probably work too.

  • Arcade Learning Environment (ALE): the versions 0.5.0 and 0.5.1 downloadable from the ALE website don't support Python 3. However, the current development version (December 2016) available on their GitHub repository does.
  • Theano: the release version 0.8 and the current 0.9 development version (December 2016) seem to differ slightly, as some packages/modules appear to have been renamed or moved. I used the 0.9 development version available from the Theano GitHub repository.
  • cv2: the cv2 module (OpenCV) is used to resize the images from ALE before feeding them to the network used by the agent. The version used is 3.1.0 (a preprocessing sketch follows this list).
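For reference, here is a minimal sketch of the kind of preprocessing cv2 is used for, assuming the standard DQN pipeline of converting an ALE frame to grayscale and resizing it to 84x84. The exact dimensions and conversion used by this project may differ; the function below is only an illustration.

```python
import cv2
import numpy as np

def preprocess_frame(rgb_frame, width=84, height=84):
    """Convert an RGB frame from ALE into a small grayscale image.

    rgb_frame: numpy array of shape (210, 160, 3), as returned by
    ALE's getScreenRGB(); width/height follow the 2013 DQN paper.
    """
    gray = cv2.cvtColor(rgb_frame, cv2.COLOR_RGB2GRAY)
    small = cv2.resize(gray, (width, height), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
```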

Other dependencies

These dependencies are used to save the agent, display the game or plot some statistics.

  • PyQt: PyQt version 4.8.7 is used to display the game and to build the window where the plots are displayed.
  • pyqtgraph: the plots and percentile plots are drawn with pyqtgraph version 0.10.0 (a minimal plotting sketch follows this list).
  • sqlite3: The agent's data and the statistics collected during the training are saved in an sqlite database.
  • matplotlib: matplotlib is not really used and is there for debugging purposes only; this dependency should be removed in the future.
  • Usual python modules: collections, datetime, enum, gc, json, math, numbers, numpy, pickle, random, sys, time, threading
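For illustration only, a minimal pyqtgraph plot of a per-epoch statistic could look like the following. The data and variable names are hypothetical; the project's actual plotting window is more elaborate.

```python
import pyqtgraph as pg

# Hypothetical data: one average score per epoch.
avg_scores = [1.2, 3.4, 5.0, 7.8, 9.1]

app = pg.mkQApp()                              # create the Qt application
win = pg.plot(avg_scores,                      # simple line plot with markers
              title="Average score per epoch",
              symbol="o")
win.setLabel("bottom", "epoch")
win.setLabel("left", "average score")
app.exec_()                                    # start the Qt event loop
```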

 

Usage

To start training the agent, simply type python Run.py <rom file>. This will create a new default agent, initialize it, and start training it. A window will open displaying the game in the form it is fed to the agent, and another window will show the evolution of the agent across the epochs. To stop the process, type stop and all processes will terminate.

The Atari 2600 ROMs are available on the AtariAge website. After downloading a ROM, make sure it is named as ALE expects; otherwise it can lead to a segmentation fault. The expected name for each supported game can be found in the ALE sources, by inspecting the file related to your game in /src/games/supported/.

 

Results

The agent is trained as explained in the DeepMind paper.

Before training, a test set is built by picking random samples from randomly played games. By default, one epoch lasts 50000 iterations, and every epoch the agent plays for 10000 iterations using an epsilon-greedy strategy with epsilon equal to 0.05 (a sketch of this evaluation policy follows the list below). Every epoch, the following results are plotted and stored:

  • The average value of the output of the network over the test set
  • The average value of the maximum outputs of the network over the test set
  • The average score per game played
  • The average reward per game played (as in DeepMind's paper, the rewards are clipped between -1 and 1)
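As a rough illustration (not the project's actual code), the epsilon-greedy evaluation policy, the reward clipping, and the test-set statistic described above could be computed as below. The names q_values, q_function and test_states are assumptions introduced for this sketch.

```python
import random
import numpy as np

EPSILON = 0.05  # exploration rate used during the evaluation epochs

def epsilon_greedy_action(q_values, epsilon=EPSILON):
    """Pick a random action with probability epsilon, otherwise the greedy one.

    q_values: 1-D array with the network's output for each legal action.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def clip_reward(reward):
    """Clip raw game rewards to [-1, 1], as in the DeepMind paper."""
    return float(np.clip(reward, -1.0, 1.0))

def average_max_q(q_function, test_states):
    """Average of the maximum network output over a fixed set of test states."""
    return float(np.mean([np.max(q_function(s)) for s in test_states]))
```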

These results are stored in an sqlite database. DB Browser for SQLite provides an easy way to display and plot those results.
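As an illustration of how such per-epoch statistics can be stored with sqlite3, a minimal sketch follows; the database file name, table name and columns below are hypothetical, not necessarily those used by this project.

```python
import sqlite3

conn = sqlite3.connect("agent_stats.db")  # hypothetical database file name
conn.execute("""CREATE TABLE IF NOT EXISTS epoch_stats (
                    epoch      INTEGER PRIMARY KEY,
                    avg_q      REAL,
                    avg_max_q  REAL,
                    avg_score  REAL,
                    avg_reward REAL)""")

def save_epoch(epoch, avg_q, avg_max_q, avg_score, avg_reward):
    """Insert (or overwrite) one row of statistics for the given epoch."""
    conn.execute("INSERT OR REPLACE INTO epoch_stats VALUES (?, ?, ?, ?, ?)",
                 (epoch, avg_q, avg_max_q, avg_score, avg_reward))
    conn.commit()
```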

While I didn't observe the same evolution of the Q-function's output as DeepMind did, I obtained similar results for the average score.

Have a look at the result folder for an analysis of the project.

 

References

[1] Playing Atari with Deep Reinforcement Learning

[2] Arcade Learning Environment Technical Manual

[3] CS231n: Convolutional Neural Networks for Visual Recognition - course notes
