Reacher

Solution to the second project of Udacitys Reinforcement Learning Nanodegree

Quick Installation

To set up the python environment and install all requirements for using this repository on Linux OS, follow the instructions given below:

Create and activate a new environment with Python 3:

python3 -m venv /path/to/virtual/environment
source /path/to/virtual/environment/bin/activate

Clone the Udacity repository and navigate to the python/ folder to install the unityagents-package:

git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
pip install .
cd -

Download the Unity-environment from Udacity and unzip it:

wget https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/Reacher_Linux.zip
unzip Reacher_Linux.zip
rm Reacher_Linux.zip

For using this repository using Windows OS or Mac OSX, please follow the instructions given . Afterwards download the Unity-environment for , or and unzip it in this repository. If run under one of these operating systems, the path in the class Env in envwrapper.py has to be adjusted in such a way, that it points to the corresponding executable.

Quick Start

After installing all requirements and activating the virtual environment training the agent can be started by executing

python main.py

Configurations like enabling the visualization or adjustments to the architecture is possible through the dictionaries defined in main.py. During training a file called performance.log is created, which holds information about the current score and the average score of the last 100 episodes. Furthermore, if the agents variable save_after is set to a value larger than 0, after the given number of epochs the agents current parameters will be saved in a file with the agents name plus _parameters.dat, while the current weights of the agents models will be saved in a file with the agents name plus _actor.model and _actor_target.model as well as _critic.model and _critic_target.model.

Running

python evaluate.py

allows to evaluate the performance of the saved agent of the given name.

Background Information

In this environment the player (or agent) has to control double-jointed arms in such a way, that they follow a given goal location. The aim is to maintain the arms position at the target location for as long as possible. In every timestep in which the agent manages to do so, he gets a reward of +0.04. At every timestep, the player is provided a 33-dimensional state (with information about position, rotation, velocity and angular velocity of the arm) per arm and has to decide on the actions to take. For every arm, the corresponding action-vector consists of 4 real numbers in the range from -1 to +1. After a given time (1000 timesteps) the game is over, hence making this task an episodic one. The environment is considered solved when the player achieves an average score of +30 over 100 consecutive episodes, where the score is given as the average of the scores achieved by all arms in a single iteration.

For more information on the approach that was used to solve this environment, see Report.md.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
README.md		README.md
Report.md		Report.md
Results.png		Results.png
agent.py		agent.py
envwrapper.py		envwrapper.py
evaluate.py		evaluate.py
main.py		main.py
model.py		model.py
performance.log		performance.log
reacher_actor.model		reacher_actor.model
reacher_actor_target.model		reacher_actor_target.model
reacher_critic.model		reacher_critic.model
reacher_critic_target.model		reacher_critic_target.model
reacher_parameters.dat		reacher_parameters.dat
utils.py		utils.py
video.gif		video.gif

fberressem/Reacher

Folders and files

Latest commit

History

Repository files navigation

Reacher

Quick Installation

Quick Start

Background Information

About

Resources

Stars

Watchers

Forks

Languages