This project implements the standard policy gradient algorithm (REINFORCE) and applies it to solve the lunar lander environment in OpenAI Gym.
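For readers new to REINFORCE, the core of the update can be sketched in a few lines. This is a minimal illustration, not the code in lunar_lander.py: the function names, the discount factor of 0.99, and the use of plain floats are assumptions made for the example. In a PyTorch training loop the log-probabilities would be tensors, so the loss could be backpropagated.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    gamma=0.99 is an assumed default, not a value taken from this project.
    """
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next step.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs: Sequence[float], returns: Sequence[float]) -> float:
    """Monte Carlo policy-gradient loss: -sum_t log pi(a_t | s_t) * G_t."""
    # Minimizing this loss increases the log-probability of actions
    # that were followed by high returns.
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

Many implementations also normalize the returns within an episode (subtract the mean, divide by the standard deviation) to reduce gradient variance. Whether this project does so is not stated here.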
As can be seen in the results plot, the agent shows signs of learning and is able to solve the lunar lander environment (score of 200 points).
The agent first solves the environment at around episode 300. However, learning is not very stable, and the agent's performance deteriorates afterward.
Early-stopping techniques can be implemented to save the best version of the agent while learning.
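One way to keep the best version of the agent is to track a moving average of episode scores and save a checkpoint whenever it reaches a new high. The sketch below is an assumption about how this could look, not part of the current project: the class name BestScoreTracker and the 100-episode window are hypothetical choices.

```python
from collections import deque
from typing import Deque, Optional


class BestScoreTracker:
    """Tracks a moving average of episode scores and flags new bests.

    Hypothetical helper for illustration; the window size is an assumption.
    """

    def __init__(self, window: int = 100) -> None:
        self.window = window
        self.scores: Deque[float] = deque(maxlen=window)
        self.best_average: Optional[float] = None

    def update(self, episode_score: float) -> bool:
        """Record a score; return True when the moving average is a new best."""
        self.scores.append(episode_score)
        # Wait until the window is full so one lucky episode does not count.
        if len(self.scores) < self.window:
            return False
        average = sum(self.scores) / len(self.scores)
        if self.best_average is None or average > self.best_average:
            self.best_average = average
            return True
        return False
```

In the training loop, a True result from update() would be the point at which to save the model, for example with torch.save(policy.state_dict(), "best_policy.pt"), where policy and the file name are placeholders for whatever the script actually uses.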
- Activate the conda environment that has the dependencies installed
- Run lunar_lander.py
The project requires PyTorch v1.4.0. Other dependencies include:
- os (part of the Python standard library)
- NumPy
- gym
- Matplotlib
- NumPy - Fundamental package for scientific computing with Python
- PyTorch - Deep learning framework used along with NumPy to build the policy network
- OpenAI Gym - Provides environments to test the agent's performance
This project was built with reference to research papers on applying Q-learning with deep neural networks.