Implementation of Proximal Policy Optimization (PPO) on Atari environments. All hyperparameters were chosen based on the paper.
For the continuous domain (MuJoCo), look at this.
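The core of PPO is the clipped surrogate objective from Schulman et al., 2017. A rough PyTorch sketch of that loss (the function and argument names here are illustrative, not this repository's API):

```python
# Minimal sketch of PPO's clipped surrogate objective.
# Names and shapes are illustrative, not taken from this repository.
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped policy loss: -E[min(r * A, clip(r, 1-eps, 1+eps) * A)]."""
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) objective, then negate for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps the new policy from moving too far from the policy that collected the data, which is what lets PPO reuse each batch for several epochs of updates.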
- gym == 0.17.2
- numpy == 1.19.1
- opencv_contrib_python == 3.4.0.12
- torch == 1.4.0
- tqdm == 4.47.0
`pip3 install -r requirements.txt`
Training requires a GPU-enabled machine (not necessarily a 1080 Ti or an RTX 2080 Nvidia GPU 😁). Google Colab provides enough to train such an algorithm, but if you want a more powerful free online GPU provider, take a look at paperspace.com.
- To run the code: `python3 main.py`
- If you want to continue a previous training run, set `LOAD_FROM_CKP` to `True`; otherwise, training restarts from scratch.
- If you want to test the agent, simply set the `Train` flag to `False`. There is a pre-trained model in the *Pre-trained models* directory that you can use to watch the agent play.
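A `LOAD_FROM_CKP`-style switch typically just restores saved model and optimizer state before training resumes. A hypothetical sketch of that logic (the file name and checkpoint keys here are assumptions, not this repo's actual format):

```python
# Hypothetical resume-from-checkpoint helper; the real repo's checkpoint
# file name and dictionary keys may differ.
import os
import torch

def load_checkpoint(model, optimizer, path="checkpoint.pth", load_from_ckp=True):
    """Restore training state if a checkpoint exists; otherwise start fresh."""
    start_iteration = 0
    if load_from_ckp and os.path.exists(path):
        ckpt = torch.load(path)
        model.load_state_dict(ckpt["model_state"])
        optimizer.load_state_dict(ckpt["optimizer_state"])
        start_iteration = ckpt["iteration"]  # resume the iteration counter too
    return start_iteration
```

Saving the optimizer state alongside the model matters for Adam-style optimizers, whose per-parameter moment estimates would otherwise reset on resume.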
- Pong
- Breakout
- MsPacman
- The following graphs show the Breakout environment's results.
Proximal Policy Optimization Algorithms, Schulman et al., 2017
@OpenAI for Baselines.
@higgsfield for his PPO code.