Implementation of Proximal Policy Optimization (PPO) on Atari environments. All hyperparameters were chosen based on the paper.
For the continuous domain (MuJoCo), look at this.
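The core of PPO is the clipped surrogate objective from Schulman et al., 2017. A rough PyTorch sketch of that loss (the function and argument names here are illustrative, not this repository's API):

```python
# Minimal sketch of PPO's clipped surrogate objective.
# Names and shapes are illustrative, not taken from this repository.
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped policy loss: -E[min(r * A, clip(r, 1-eps, 1+eps) * A)]."""
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) objective, then negate for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps the new policy from moving too far from the policy that collected the data, which is what lets PPO reuse each batch for several epochs of updates.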
- gym == 0.17.2
- numpy == 1.19.1
- opencv_contrib_python == 3.4.0.12
- torch == 1.4.0
- tqdm == 4.47.0
`pip3 install -r requirements.txt`
Training requires a GPU-enabled machine (not necessarily a 1080 Ti or an RTX 2080 Nvidia GPU 😁). Google Colab provides enough to train such an algorithm, but if you want a more powerful free online GPU provider, take a look at paperspace.com.
- To run the code: `python3 main.py`
- If you want to continue a previous training run, set `LOAD_FROM_CKP` to `True`; otherwise, training restarts from scratch.
- If you want to test the agent, simply set the `Train` flag to `False`. There is a pre-trained model in the *Pre-trained models* directory that you can use to watch the agent play.
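A `LOAD_FROM_CKP`-style switch typically just restores saved model and optimizer state before training resumes. A hypothetical sketch of that logic (the file name and checkpoint keys here are assumptions, not this repo's actual format):

```python
# Hypothetical resume-from-checkpoint helper; the real repo's checkpoint
# file name and dictionary keys may differ.
import os
import torch

def load_checkpoint(model, optimizer, path="checkpoint.pth", load_from_ckp=True):
    """Restore training state if a checkpoint exists; otherwise start fresh."""
    start_iteration = 0
    if load_from_ckp and os.path.exists(path):
        ckpt = torch.load(path)
        model.load_state_dict(ckpt["model_state"])
        optimizer.load_state_dict(ckpt["optimizer_state"])
        start_iteration = ckpt["iteration"]  # resume the iteration counter too
    return start_iteration
```

Saving the optimizer state alongside the model matters for Adam-style optimizers, whose per-parameter moment estimates would otherwise reset on resume.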
- Pong
- Breakout
- MsPacman
- The following graphs show the Breakout environment's results.
Proximal Policy Optimization Algorithms, Schulman et al., 2017
@OpenAI for Baselines.
@higgsfield for his PPO code.