Skip to content

afcarl/Playing_Atari_the_right_way

 
 

Repository files navigation

Playing Atari the right way!

Cover


The simplest implementation for playing Atari games using game screen as input. Also contains code to implement visual foresight using adversarial action conditioned video prediction model (Still working on this).

Paper: Playing Atari the right way!, ECE6504 Project, 2017

Dependencies

Install virtualenv and creating a new virtual environment:

pip install virtualenv
virtualenv -p /usr/bin/python3 atari

Install dependencies

pip3 install -r requirements.txt

Notes:

  • Training the agent to play breakout at a reasonable level took about 80 hours on two p100s. Don't even think about running this on a CPU. I would highly appreciate it if you can submit a pull request that makes training faster (I know some of my methods suck).

  • The trained models can easily be used to test the performance of an agent on a CPU.

Architecture graph from Tensorboard

Cartpole agent

Training a DQN agent

Playing Cartpole using the game states as input (Just a sanity check)

python3 play_cartpole.py

To change the hyperparameters modify mission_control_cartpole.py.

Cartpole agent

Note:

  • This isn't as computationally demanding as Breakout using frames.

Playing Breakout using the game frames as input

python3 play_breakout.py

To change the hyperparameters modify mission_control_breakout.py.

Breakout agent

Note:

  • I have included the trained model for Breakout after 14 million episodes. Just explore the Results director for Breakout.
  • Change train_model to False and show_ui to True to load the saved model and see the agent in action.

Results from training Breakout agent

Plot of the rewards obtained per episode during training

Breakout Reward

Q-value histogram after each episode

Breakout histo

Max Q-values after each episode

Breakout max Q

Use the trained model to generate dataset

python3 generate_dataset.py

Note:

  • You might get some directory nt found errors (Will fix it soon) or just figure it out.

Training an action conditioned video prediction model

python3 generate_model_skip.py

Note:

  • This uses the adversarial action conditioned video prediction model.
  • Run generate_model.py to use the architecture from [2].

Results from action conditioned video prediction model

Breakout agent


Playing Breakout using RAM content as input

python3 play_breakout_ram.py

To change the hyperparameters modify mission_control_breakout_ram.py.

Plot of the rewards obtained per episode during training

Breakout agent

Note:

  • Each run generates a required tensorboard files under ./Results/<model>/<time_stamp_and_parameters>/Tensorboard directory.
  • Use tensorboard --logdir <tensorboard_dir> to look at loss variations, rewards and a whole lot more.
  • Windows gives an error when : is used during folder naming (this is produced during the folder creation for each run). I would suggest you to remove the time stamp from folder_name variable in the form_results() function. Or, just dual boot linux!

References

[1] Human Level Control Through Deep Reinforcement Learning

[2] Action-Conditional Video Prediction using Deep Networks in Atari Games

About

The simplest implementation for playing Atari games using game screen as input.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%