
PPO

PRs Welcome

Implementation of Proximal Policy Optimization (PPO) on Atari environments. All hyper-parameters follow the values reported in the paper.

For the continuous domain (MuJoCo), look at this.
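
At the core of PPO is the clipped surrogate objective from the paper. Below is a minimal sketch of that loss in PyTorch; the function and argument names are illustrative and are not taken from this repository's code.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.1):
    """Clipped surrogate objective (Schulman et al., 2017).

    All arguments are 1-D tensors of equal length. clip_eps = 0.1 is the
    Atari value used in the paper (where it is annealed over training).
    """
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The surrogate is maximized, so the loss is its negation.
    return -torch.min(unclipped, clipped).mean()
```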

Dependencies

  • gym == 0.17.2
  • numpy == 1.19.1
  • opencv_contrib_python == 3.4.0.12
  • torch == 1.4.0
  • tqdm == 4.47.0
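
The opencv dependency above is presumably used for the standard Atari frame preprocessing (grayscale conversion and downsampling). The following is a rough sketch of what that step typically looks like, not the repository's actual code.

```python
import cv2
import numpy as np

def preprocess(frame, size=84):
    """Convert an RGB Atari frame to an 84x84 grayscale array in [0, 1].
    Illustrative only; the repository's preprocessing may differ."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0
```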

Installation

pip3 install -r requirements.txt

Usage

Training requires a GPU-enabled machine (not necessarily a 1080 Ti or an RTX 2080 😁). Google Colab provides enough to train this algorithm, but if you want a more powerful free online GPU provider, take a look at: paperspace.com.

  • To run the code:
python3 main.py
  • If you want to continue a previous training run, set LOAD_FROM_CKP to True; otherwise, training restarts from scratch.
  • If you want to test the agent, set the Train flag to False. There is a pre-trained model in the Pre-trained models directory that you can use to watch the agent play. A sketch of these flags follows after this list.
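
The names Train and LOAD_FROM_CKP come from the instructions above; everything else below is an illustrative sketch of how such flags are typically wired up in main.py, not the file's actual contents.

```python
# Illustrative only: how the flags described above might be laid out in main.py.
Train = True            # set to False to evaluate a saved model instead of training
LOAD_FROM_CKP = False   # set to True to resume training from the last checkpoint

if Train:
    if LOAD_FROM_CKP:
        pass  # e.g. restore network weights and optimizer state from a checkpoint
    # ... run the training loop ...
else:
    pass  # e.g. load a model from the "Pre-trained models" directory and render episodes
```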

Environments tested

  • Pong
  • Breakout
  • MsPacman

Demo

Result

  • The following graphs show the results on the Breakout environment.

Reference

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv:1707.06347.

Acknowledgement

@OpenAI for Baselines.
@higgsfield for his PPO code.
