Policy Gradient Methods on OpenAI LunarLander

This project implements the standard policy gradient algorithm (REINFORCE) and applies it to solve the LunarLander environment in OpenAI Gym.
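As a rough illustration of the core of REINFORCE, the sketch below computes the discounted return for every step of one episode and the resulting loss, in plain Python. The function names, the discount factor `gamma=0.99`, and the return normalization step are assumptions made for illustration; they are not taken from this repository's code, which is built on PyTorch.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t of an episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance (a common variance-reduction step)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in values]


def reinforce_loss(log_probs, returns):
    """Loss whose gradient is the REINFORCE estimate: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Hypothetical three-step episode: rewards and log-probabilities of the chosen actions.
episode_rewards = [1.0, 0.0, -1.0]
episode_log_probs = [math.log(0.5), math.log(0.25), math.log(0.25)]
loss = reinforce_loss(episode_log_probs, normalize(discounted_returns(episode_rewards)))
```

In a PyTorch implementation the log-probabilities would be tensors that carry gradients, so that backpropagating through this loss updates the policy network.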

Analysis of results

As the results plot shows, the agent learns and is able to solve the LunarLander environment, reaching the target score of 200 points.

The agent first solves the environment at around episode 300. However, learning is not very stable, and the agent's performance deteriorates afterwards.
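One way to locate the first solving episode from a list of per-episode scores is sketched below. This README does not say whether the 200-point target refers to a single episode or to an average over several episodes, so the `window` parameter is an assumption: `window=1` checks single episodes, while a larger value checks a rolling mean. The function name is hypothetical.

```python
from collections import deque


def first_solved_episode(scores, threshold=200.0, window=100):
    """Return the 1-based episode at which the rolling mean first reaches threshold.

    Returns None if the threshold is never reached over a full window.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None
```

A rolling mean is less sensitive to a single lucky episode, which matters when training is as unstable as described above.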

Early-stopping techniques could be implemented to save the best-performing version of the agent during training.
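A minimal sketch of such a technique follows: it tracks a rolling average of episode scores and keeps a copy of the agent's parameters whenever that average reaches a new best. The class name `BestCheckpoint`, the window size, and the use of a plain dictionary for parameters are assumptions for illustration; nothing like this is currently part of the project.

```python
import copy
from collections import deque


class BestCheckpoint:
    """Keep a copy of the parameters that achieved the best rolling-average score."""

    def __init__(self, window=100):
        self.recent = deque(maxlen=window)
        self.window = window
        self.best_average = float("-inf")
        self.best_params = None
        self.best_episode = None

    def update(self, episode, score, params):
        """Record one episode; return True if a new best checkpoint was stored."""
        self.recent.append(score)
        if len(self.recent) < self.window:
            return False  # not enough episodes yet for a meaningful average
        average = sum(self.recent) / self.window
        if average > self.best_average:
            self.best_average = average
            # Deep copy so later training updates do not alter the saved snapshot.
            self.best_params = copy.deepcopy(params)
            self.best_episode = episode
            return True
        return False
```

With a PyTorch model, `params` could be the dictionary returned by the model's `state_dict()` method, and the stored snapshot could be written to disk with `torch.save` once training ends.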

Getting Started

Activate a conda environment with the dependencies installed
Run lunar_lander.py

Prerequisites

The project requires PyTorch v1.4.0. Other dependencies include:

os (part of the Python standard library)
NumPy
gym
Matplotlib

Built With

NumPy - Fundamental package for scientific computing with Python
PyTorch - Deep learning framework used along with NumPy to build the policy network
OpenAI Gym - Provides environments to test the agent's performance

Acknowledgments

This project was built with reference to research on policy gradient methods for reinforcement learning with function approximation:

https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf

jeffery1236/policy-gradient
