jeffery1236/policy-gradient

Policy Gradient Methods on OpenAI LunarLander

This project implements the standard policy gradient algorithm (REINFORCE) and applies it to solve the lunar lander environment in OpenAI Gym.
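
For readers unfamiliar with REINFORCE, the sketch below illustrates its central computation: each action's log-probability is weighted by the discounted return that followed it. This is a minimal NumPy illustration, not the code in lunar_lander.py. The function names `discounted_returns` and `reinforce_loss` and the default `gamma=0.99` are assumptions made for the example.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for each step of one episode.

    gamma=0.99 is an assumed default, not a value taken from this project.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss -sum_t log pi(a_t | s_t) * G_t.

    Minimizing this with gradient descent follows the REINFORCE
    policy gradient estimate for a single episode.
    """
    log_probs = np.asarray(log_probs, dtype=np.float64)
    returns = np.asarray(returns, dtype=np.float64)
    return -np.sum(log_probs * returns)
```

In a PyTorch training loop the same loss would be built from tensors so that it can be backpropagated through the policy network. Many implementations also normalize the returns to reduce variance; whether this project does so is not stated here.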

Analysis of results

The results plot shows that the agent learns and reaches a score of 200 points, the threshold at which the lunar lander environment counts as solved.

The agent first solves the environment at around episode 300. However, learning is not very stable, and the agent's performance deteriorates afterwards.

Early stopping could be implemented to save the best-performing version of the agent during training.
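
One possible way to approach this is to track a running average of episode scores and save the agent whenever that average reaches a new best. The sketch below is an illustration under assumptions, not part of this project: the class name `BestAverageTracker` and the window size are hypothetical.

```python
from collections import deque


class BestAverageTracker:
    """Track a running average of scores and flag each new best.

    The window size is an assumed hyperparameter for this example.
    """

    def __init__(self, window=100):
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores
        self.best_average = float("-inf")

    def update(self, episode_score):
        """Record a score; return True if the running average is a new best."""
        self.scores.append(episode_score)
        average = sum(self.scores) / len(self.scores)
        if average > self.best_average:
            self.best_average = average
            return True
        return False
```

In a training loop, a True result from `update` would be the point at which to save the policy, for example with `torch.save(policy.state_dict(), path)`, where `policy` and `path` are placeholders for the project's own network and checkpoint location.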

Getting Started

  1. Activate a conda environment with the dependencies installed
  2. Run lunar_lander.py

Prerequisites

The project requires PyTorch v1.4.0. Other dependencies include:

  • os (part of the Python standard library)
  • NumPy
  • gym
  • Matplotlib

Built With

  • NumPy - Fundamental package for scientific computing with Python
  • PyTorch - Deep learning framework used along with NumPy to build the policy network
  • OpenAI Gym - Provides environments to test the agent's performance

Acknowledgments

This project was built with reference to research on policy gradient methods for reinforcement learning with function approximation:

https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf
