
Sanuja91/Udacity-Deep-Reinforcement-Learning-Project-2-Continous-Control


Udacity : Deep Reinforcement Learning Nanodegree

Project 2: Continuous Control (Reacher)

Trained Agent

For this project, we had to build an AI agent for a continuous control task: keeping the robotic hand (blue knob) inside the zone of a moving target, with the aim of reaching a score greater than 30 for 100 episodes in a row. An episode consists of 1000 timesteps. In each timestep, the agent receives a reward of +0.1 for staying in the zone. The robotic arm is moved by adjusting 4 continuous actions that determine the force exerted on the joints of the arm, thereby determining its location in space. Each action ranges between -1 and +1. The environment selected for this project was the more challenging version with 20 robotic arms.
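As a rough illustration of the scoring described above, the sketch below sums each arm's rewards over an episode, averages across arms, and then checks a 100-episode window. The function names, and the reading of the criterion as a mean over the window, are assumptions made for this example rather than code from the repository.

```python
from collections import deque


def episode_score(rewards_per_step):
    """Average, over all arms, of each arm's summed reward for one episode.

    rewards_per_step: list of per-timestep lists, one reward per arm.
    """
    num_arms = len(rewards_per_step[0])
    totals = [0.0] * num_arms
    for step_rewards in rewards_per_step:
        for arm, reward in enumerate(step_rewards):
            totals[arm] += reward
    return sum(totals) / num_arms


def is_solved(recent_scores, target=30.0, window=100):
    """True once the last `window` episode scores average above `target`."""
    last = list(recent_scores)[-window:]
    if len(last) < window:
        return False
    return sum(last) / len(last) > target


# Hypothetical usage: keep only the most recent 100 episode scores.
recent_scores = deque(maxlen=100)
```

After each episode, one would append `episode_score(...)` to `recent_scores` and stop training once `is_solved(recent_scores)` returns True.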

Implemented Agent

This project uses the state-of-the-art D4PG algorithm to reach the target score. The agent is fairly sophisticated: it combines distributed training, distributional value estimation, n-step bootstrapping, and an Actor-Critic architecture to provide fast, stable training in a continuous action space.
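To make the n-step bootstrapping component concrete, here is a minimal sketch of an n-step return: the first n rewards are discounted and summed, then the critic's estimate for the state reached after n steps is added with discount gamma^n. The function name and the default gamma are illustrative assumptions, and in D4PG the bootstrap term is a value distribution rather than the single number used here.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of `rewards` plus a bootstrapped tail value.

    rewards: the n rewards collected after the starting state.
    bootstrap_value: critic estimate for the state reached after n steps
        (use 0.0 if the episode ended within those n steps).
    """
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total + (gamma ** len(rewards)) * bootstrap_value
```

Larger n propagates reward information back faster, at the cost of relying on more steps gathered under an older policy.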

The implementation of the D4PG contained in this project was influenced by the agent implemented by Mathew Doll in the link at the end.

Additional tweaks, such as batch normalization before the Tanh action output and negative reward shaping, were implemented along with a tweaked n-step replay buffer to handle the large number of experiences generated by the 20 robotic arms.
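The effect of normalizing before the Tanh output can be sketched without any deep learning library. The example below standardizes each action dimension across a batch of pre-activations and then applies tanh, so that large raw outputs are less likely to saturate at -1 or +1. It omits the learned scale and shift and the running statistics that a real batch normalization layer keeps, and the function name and epsilon are assumptions for illustration only.

```python
import math


def batchnorm_then_tanh(pre_activations, eps=1e-5):
    """Standardize each column across the batch, then squash with tanh.

    pre_activations: list of rows, one row of raw action values per arm.
    """
    batch_size = len(pre_activations)
    num_actions = len(pre_activations[0])
    actions = [[0.0] * num_actions for _ in range(batch_size)]
    for j in range(num_actions):
        column = [row[j] for row in pre_activations]
        mean = sum(column) / batch_size
        var = sum((x - mean) ** 2 for x in column) / batch_size
        for i, x in enumerate(column):
            actions[i][j] = math.tanh((x - mean) / math.sqrt(var + eps))
    return actions
```

Without the normalization step, a raw value such as 50.0 would map to an action of essentially 1.0, leaving almost no gradient for the actor to learn from.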

Installing the Agent

  1. A Windows build of the environment is included in the repository. The environment can also be downloaded from the following links for your operating system:

  2. Set up a Python 3.6.8 environment, either natively or with Anaconda.

  3. Install the dependencies listed in requirements.txt using pip (pip install -r requirements.txt).

Running the Agent

Simply start up the notebook and run each cell.

Solving the Environment

A single agent was implemented using the D4PG architecture to solve the environment in parallel: the 33 state variables for each of the 20 arms are processed by the neural network as one batch. This takes full advantage of the available GPU power and allows the timesteps to proceed faster.
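The batching idea can be sketched in plain Python: one shared set of actor parameters maps all 20 observations of 33 values to 20 rows of 4 actions in a single call. The single linear layer, the random weights, and all names here are hypothetical stand-ins, not the network used in this project.

```python
import math
import random

NUM_ARMS, STATE_SIZE, ACTION_SIZE = 20, 33, 4


def batched_policy(states, weights):
    """Map a (NUM_ARMS x STATE_SIZE) batch to (NUM_ARMS x ACTION_SIZE) actions.

    A single shared linear layer followed by tanh stands in for the actor;
    on a GPU the loop over rows would be one matrix multiplication.
    """
    return [
        [
            math.tanh(sum(s * w for s, w in zip(state, column)))
            for column in weights
        ]
        for state in states
    ]


random.seed(0)
# Hypothetical stand-ins for the 20 observations and the actor's parameters.
states = [[random.uniform(-1, 1) for _ in range(STATE_SIZE)]
          for _ in range(NUM_ARMS)]
weights = [[random.uniform(-0.1, 0.1) for _ in range(STATE_SIZE)]
           for _ in range(ACTION_SIZE)]
actions = batched_policy(states, weights)  # one action row per arm
```

Because every arm shares the same parameters, each environment step yields 20 transitions for the replay buffer from a single forward pass.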

The agent solves the environment in record speed, reaching a score greater than 30 in 16 episodes. To be honest, it's the fastest I have seen.
