Udacity : Deep Reinforcement Learning Nanodegree

Project 2: Continuous Control (Reacher)

For this project, we had to build an AI agent to perform a continuous control task: keeping the robotic hand (blue knob) inside the zone of a moving target, with the aim of reaching an average score of at least 30 over 100 consecutive episodes. An episode consists of 1000 timesteps. At each timestep the agent receives a reward of +0.1 for staying in the zone. The robotic arm is moved by adjusting 4 continuous actions that determine the force exerted on the joints of the arm, thereby determining its location in space. Each action takes a value between -1 and +1. The environment selected for this project was the more challenging version with 20 robotic arms.
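The solve criterion can be sketched as follows. This is a minimal sketch that assumes the usual convention for the 20-arm version: the score of an episode is the mean of the 20 per-arm returns, and the environment counts as solved once the mean of the last 100 episode scores reaches the threshold. The function names are illustrative and are not taken from this repository.

```python
from collections import deque
from statistics import mean

TARGET_SCORE = 30.0  # threshold stated above
WINDOW = 100         # number of consecutive episodes considered


def episode_score(per_arm_returns):
    """Average the undiscounted returns of all arms for one episode."""
    return mean(per_arm_returns)


def episodes_to_solve(episode_scores, target=TARGET_SCORE, window=WINDOW):
    """Return the 1-based episode at which the rolling mean first reaches
    the target, or None if it never does."""
    recent = deque(maxlen=window)
    for i, score in enumerate(episode_scores, start=1):
        recent.append(score)
        if len(recent) == window and mean(recent) >= target:
            return i
    return None
```

Under this convention a run cannot count as solved before episode 100, because the window has to be full before the average is checked.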

Implemented Agent

This project uses D4PG (Distributed Distributional Deep Deterministic Policy Gradient), a state-of-the-art agent algorithm, to reach the target score. The agent is fairly sophisticated: it combines distributed training, distributional value estimation, n-step bootstrapping, and an Actor-Critic architecture to provide fast, stable training in a continuous action space.
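To illustrate what distributional value estimation means here, the sketch below assumes a categorical critic: instead of a single Q-value, the critic outputs probabilities over a fixed set of return "atoms", and the scalar value is their expectation. The atom range, the number of atoms, and the function names are hypothetical and not taken from this repository, and the projection of the shifted atoms back onto the fixed support is left out.

```python
def atom_support(v_min, v_max, num_atoms):
    """Evenly spaced return values (atoms) between v_min and v_max."""
    delta = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta for i in range(num_atoms)]


def expected_value(probs, support):
    """Scalar Q-value implied by a categorical distribution over the atoms."""
    return sum(p * z for p, z in zip(probs, support))


def shifted_atoms(n_step_reward, gamma, n_steps, support):
    """Atoms of the n-step bootstrapped target: R_n + gamma**n * z for each atom z.
    A full implementation would project these back onto the fixed support."""
    return [n_step_reward + (gamma ** n_steps) * z for z in support]
```

For example, `expected_value([0.2] * 5, atom_support(0.0, 1.0, 5))` averages five evenly spaced atoms with equal weight.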

The D4PG implementation in this project was influenced by the agent implemented by Mathew Doll (see the link at the end).

Additional tweaks were implemented, such as batch normalization before the Tanh action output and negative reward shaping, together with a modified n-step replay buffer that effectively handles the large number of experiences generated by the 20 robotic arms.
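One way an n-step replay buffer can cope with 20 arms stepping in parallel is to keep a separate short window per arm, so that trajectories from different arms never mix. The following is a minimal sketch under that assumption; the class name, arguments, and the handling of episode ends are illustrative and are not the code in memory.py.

```python
import random
from collections import deque


class NStepReplayBuffer:
    """Stores (state, action, n-step return, state n steps later, done) per arm."""

    def __init__(self, num_arms, n_steps, gamma, capacity):
        self.n_steps = n_steps
        self.gamma = gamma
        self.memory = deque(maxlen=capacity)
        # One sliding window per arm holding its most recent n transitions.
        self.windows = [deque(maxlen=n_steps) for _ in range(num_arms)]

    def add(self, states, actions, rewards, next_states, dones):
        """Add one timestep for every arm; each argument has one entry per arm."""
        steps = zip(states, actions, rewards, next_states, dones)
        for arm, step in enumerate(steps):
            window = self.windows[arm]
            window.append(step)
            if len(window) == self.n_steps:
                first_state, first_action = window[0][0], window[0][1]
                # Discounted sum of the n rewards currently in the window.
                n_step_return = sum(
                    self.gamma ** k * t[2] for k, t in enumerate(window)
                )
                self.memory.append(
                    (first_state, first_action, n_step_return, step[3], step[4])
                )
            if step[4]:
                # Simplification: drop the partial tail when an episode ends.
                window.clear()

    def sample(self, batch_size):
        """Draw a uniform random batch of stored n-step transitions."""
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

With 20 arms, every environment step can add up to 20 n-step transitions at once, which is why the buffer capacity and sampling cost matter more than in the single-arm version.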

Installing the Agent

The repository includes the environment build for Windows. The environment can also be downloaded for your operating system from the following links:
- Linux: click here
- Windows (64-bit): click here
Use a Python 3.6.8 environment, either natively or through Anaconda.
Install the dependencies listed in requirements.txt with pip (pip install -r requirements.txt).

Running the Agent

Open the notebook (Continuous_Control.ipynb) and run each cell in order.

Solving the Environment

A single agent using the D4PG architecture was implemented to solve the environment. The 33 state variables from each of the 20 agents (arms) are passed through the neural network together as one batch, which takes full advantage of the available GPU and lets the timesteps proceed faster.
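The batching idea can be sketched with NumPy. This is only an illustration: the single linear layer with a tanh output is a hypothetical stand-in for the real actor network, and only the sizes (20 arms, 33 state variables, 4 actions) come from the description above.

```python
import numpy as np

NUM_ARMS, STATE_SIZE, ACTION_SIZE = 20, 33, 4

rng = np.random.default_rng(0)
# Hypothetical one-layer policy standing in for the actor network.
weights = rng.normal(scale=0.1, size=(STATE_SIZE, ACTION_SIZE))
bias = np.zeros(ACTION_SIZE)


def act(states):
    """Map a (num_arms, state_size) batch of states to a
    (num_arms, action_size) batch of actions in [-1, 1]."""
    return np.tanh(states @ weights + bias)


# One forward pass produces the actions for all 20 arms at once.
states = rng.normal(size=(NUM_ARMS, STATE_SIZE))
actions = act(states)
```

Because the same weights are applied to every row, the batched result for an arm matches what a separate forward pass on that arm's state alone would give, but it needs only one matrix multiplication per timestep.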

The agent solves the environment very quickly, reaching a score greater than 30 in 16 episodes. To be honest, it's the fastest I have seen.


Sanuja91/Udacity-Deep-Reinforcement-Learning-Project-2-Continous-Control
