Playing Atari the right way!

A simple implementation for playing Atari games using the game screen as input. It also contains code to implement visual foresight using an adversarial action-conditioned video prediction model (still a work in progress).

Paper: Playing Atari the right way!, ECE6504 Project, 2017

Dependencies

Install virtualenv and create a new virtual environment:

pip install virtualenv
virtualenv -p /usr/bin/python3 atari
source atari/bin/activate

Install the dependencies:

pip3 install -r requirements.txt

Notes:

Training the agent to play Breakout at a reasonable level took about 80 hours on two P100 GPUs. Don't even think about running this on a CPU. I would highly appreciate a pull request that makes training faster (I know some of my methods suck).
The trained models, however, can easily be used to test the performance of an agent on a CPU.

Architecture graph from Tensorboard

Training a DQN agent

Playing Cartpole using the game states as input (Just a sanity check)

python3 play_cartpole.py

To change the hyperparameters, modify mission_control_cart.py.

Note:

This isn't as computationally demanding as training on Breakout frames.
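For readers new to DQN [1], the two pieces that every script in this section relies on are epsilon-greedy action selection and the Bellman target used to train the Q-network. The following is a minimal NumPy sketch of both, not the code in play_cartpole.py; the function names and the gamma value are illustrative assumptions, and next_q_values stands in for whatever the target network would output.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets r + gamma * max_a' Q(s', a'); no bootstrapping on terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float32)
    dones = np.asarray(dones, dtype=np.float32)
    max_next_q = np.max(np.asarray(next_q_values, dtype=np.float32), axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q


rng = np.random.default_rng(0)

# With epsilon = 0 the choice is always greedy.
action = epsilon_greedy(np.array([0.1, 0.7]), epsilon=0.0, rng=rng)

# Two transitions: the second one ends the episode, so its target is just the reward.
targets = dqn_targets(
    rewards=[1.0, 1.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
```

The network is then trained to move Q(s, a) for the action actually taken towards the corresponding entry of targets.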

Playing Breakout using the game frames as input

python3 play_breakout.py

To change the hyperparameters, modify mission_control_breakout.py.

Note:

I have included the trained model for Breakout after 14 million episodes. Just explore the Results directory for Breakout.
Change train_model to False and show_ui to True to load the saved model and see the agent in action.
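To give a feel for what "game frames as input" involves, here is a rough sketch of frame preprocessing in the spirit of [1], which feeds the network a stack of the most recent grayscale frames. It is an assumption-laden simplification, not the code in play_breakout.py: the paper rescales frames to 84x84, whereas this sketch simply keeps every second pixel, and the names preprocess_frame and FrameStack are hypothetical.

```python
from collections import deque

import numpy as np


def preprocess_frame(frame):
    """Turn a (210, 160, 3) uint8 RGB frame into a (105, 80) float32 image in [0, 1]."""
    gray = frame.astype(np.float32).mean(axis=2)  # average the colour channels
    small = gray[::2, ::2]                        # keep every second row and column
    return small / 255.0


class FrameStack:
    """Keeps the last num_frames preprocessed frames as one state array."""

    def __init__(self, num_frames=4):
        self.frames = deque(maxlen=num_frames)

    def reset(self, frame):
        # At the start of an episode, fill the stack with copies of the first frame.
        processed = preprocess_frame(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(processed)
        return self.state()

    def push(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(preprocess_frame(frame))
        return self.state()

    def state(self):
        return np.stack(list(self.frames), axis=-1)  # shape (105, 80, num_frames)


# Synthetic frames stand in for emulator output here.
stack = FrameStack(num_frames=4)
first_state = stack.reset(np.full((210, 160, 3), 255, dtype=np.uint8))
next_state = stack.push(np.zeros((210, 160, 3), dtype=np.uint8))
```

Stacking several frames lets the network infer the ball's direction and speed, which a single frame cannot convey.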

Results from training Breakout agent

Plot of the rewards obtained per episode during training

Q-value histogram after each episode

Max Q-values after each episode

Use the trained model to generate dataset

python3 generate_dataset.py

Note:

You might get some "directory not found" errors (will fix it soon); until then, you will have to work around them yourself.
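One common workaround for missing-directory errors is to create each output directory before anything is written to it. The snippet below is only a sketch of that idea: ensure_dir is a hypothetical helper, and the directory names are placeholders rather than the paths generate_dataset.py actually uses.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# A temporary base directory keeps this example from touching the real Results folder.
base = tempfile.mkdtemp()
dataset_dir = ensure_dir(os.path.join(base, "Results", "dataset"))  # placeholder names

# Calling it a second time is harmless.
ensure_dir(dataset_dir)
```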

Training an action conditioned video prediction model

python3 generate_model_skip.py

Note:

This uses the adversarial action-conditioned video prediction model.
Run generate_model.py instead to use the architecture from [2].
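The "action conditioned" part can be illustrated with the multiplicative interaction described in [2], where encoded frame features and a one-hot action are projected into a shared factor space and multiplied element-wise before decoding. This is a hedged NumPy sketch of that single step with random weights; it covers neither the convolutional encoder and decoder nor the adversarial training, and every name and dimension here is an assumption rather than something taken from generate_model.py.

```python
import numpy as np


def action_conditioned_transform(h_enc, action, W_enc, W_act, W_dec, b):
    """Combine encoded frame features with a one-hot action multiplicatively."""
    num_actions = W_act.shape[1]
    one_hot = np.zeros(num_actions, dtype=np.float32)
    one_hot[action] = 1.0
    # Project features and action into the same factor space, then gate element-wise.
    factors = (W_enc @ h_enc) * (W_act @ one_hot)
    return W_dec @ factors + b


rng = np.random.default_rng(0)
enc_dim, factor_dim, dec_dim, num_actions = 16, 32, 16, 4  # illustrative sizes

W_enc = rng.standard_normal((factor_dim, enc_dim)).astype(np.float32)
W_act = rng.standard_normal((factor_dim, num_actions)).astype(np.float32)
W_dec = rng.standard_normal((dec_dim, factor_dim)).astype(np.float32)
b = np.zeros(dec_dim, dtype=np.float32)
h_enc = rng.standard_normal(enc_dim).astype(np.float32)

# The same encoded frame yields different decoder inputs for different actions.
h_a0 = action_conditioned_transform(h_enc, 0, W_enc, W_act, W_dec, b)
h_a1 = action_conditioned_transform(h_enc, 1, W_enc, W_act, W_dec, b)
```

Because the action gates the features instead of being concatenated to them, each action selects its own transformation of the current frame encoding.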

Results from action conditioned video prediction model

Playing Breakout using RAM content as input

python3 play_breakout_ram.py

To change the hyperparameters, modify mission_control_breakout_ram.py.
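With RAM input the state is the console's 128 bytes of memory instead of an image, so it can go to a small fully connected network after simple scaling. The sketch below shows one plausible way to prepare that vector; preprocess_ram is a hypothetical helper, and whether play_breakout_ram.py normalises its input this way is an assumption.

```python
import numpy as np

RAM_SIZE = 128  # the Atari 2600 exposes 128 bytes of RAM


def preprocess_ram(ram):
    """Scale a 128-byte RAM snapshot to float32 values in [0, 1]."""
    ram = np.asarray(ram, dtype=np.uint8)
    if ram.shape != (RAM_SIZE,):
        raise ValueError(f"expected shape ({RAM_SIZE},), got {ram.shape}")
    return ram.astype(np.float32) / 255.0


# A synthetic RAM snapshot stands in for emulator output here.
fake_ram = np.arange(RAM_SIZE, dtype=np.uint8)
state = preprocess_ram(fake_ram)
```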

Plot of the rewards obtained per episode during training

Note:

Each run generates the required TensorBoard files under the ./Results/<model>/<time_stamp_and_parameters>/Tensorboard directory.
Use tensorboard --logdir <tensorboard_dir> to look at loss variations, rewards and a whole lot more.
Windows raises an error when ":" appears in a folder name, and the folder created for each run has one in its time stamp. I suggest removing the time stamp from the folder_name variable in the form_results() function. Or just dual-boot Linux!
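An alternative to dropping the time stamp altogether is to format it without colons, which keeps run folders distinguishable on Windows too. This is a sketch of that idea only: safe_timestamp is a hypothetical helper, and how folder_name is actually assembled inside form_results() is not shown here.

```python
from datetime import datetime


def safe_timestamp(now=None):
    """Return a time stamp that contains no characters Windows rejects in folder names."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d_%H-%M-%S")


# A fixed datetime makes the result reproducible for this example.
stamp = safe_timestamp(datetime(2017, 5, 1, 13, 45, 9))
print(stamp)  # 2017-05-01_13-45-09
```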

References

[1] Human Level Control Through Deep Reinforcement Learning

[2] Action-Conditional Video Prediction using Deep Networks in Atari Games

