(Adaptive) Neural Hierarchical Reinforcement Learning

This project presents an implementation of a Hierarchical Reinforcement Learning (HRL) algorithm, in which state-action value functions are learned by a Long Short-Term Memory (LSTM) Artificial Neural Network (ANN).
The proposed architecture tries to adaptively fit (online) a hierarchy of abstract actions, completely bottom-up, without the need for any additional background information. This allows for fast and simple use of HRL, with the added possibility of re-using the learned procedural knowledge.

The branch paper_code contains the code in the same state it was in when the paper related to this project was written.
In <repository>/testing/tests/ you can find the data collected and used in the paper. Finally, the script <repository>/resultsTesting/extractData.r contains the data manipulation procedure used to analyze the collected results and perform statistical tests on them.

Python

This project was built on Python 3.5.2. In order to run it, the following packages are needed:

  • numpy 1.14.15
  • tensorflow 1.10.0
  • matplotlib 2.2.2 (optional, for GUI)
  • pip (for package checking)
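To verify which versions of these packages are installed, one option (just a quick sanity check, not a step required by the project) is to query pip directly:

python3 -m pip show numpy tensorflow matplotlib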

Running the project

Before running any code from the project, please make use of the module import helper in <repository>/import_dir. Go to <repository>/import_dir and run source ./addPaths.sh. The changes made by this script hold as long as the terminal is not closed and the command source ./removePaths.sh is not executed. To learn more about its usage, you can visit its original repo.
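For example, from the repository root:

cd import_dir
source ./addPaths.sh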

Agent

To run the agent (manually), go to <repository>/simulation/simModel/agent, then run the script load_agent.py. Pass the -i flag to Python to keep the interpreter open after execution, otherwise it will close as soon as the agent is loaded.

python3 -i load_agent.py

You can also add a series of parameters to specify how the agent should be loaded. Each parameter is given to the script as a key value pair:

python3 -i load_agent.py {key value}

The following options are available:

  • key: loop, values: True, False, default: False
    if this option is set to True, the script will ask you to input the number of time-steps for which the agent should act. Once these have been executed, the program will keep asking for a new number of time-steps (submitting nothing re-uses the previous value). To stop the loop, input 0.
    This option is most useful if you just want to let the agent act and observe its behavior.
  • key: GUI, values: True, False, default: False
    if this option is set to True, the GUI will also be loaded. With the GUI you can observe the maze, the agent's current position, and its previously visited locations.
  • key: noPrint, values: True, False, default: False
    if this option is set to True, the agent object will print no statements (no output).
  • key: path, values: path/to/file, default: None
    if this option is left at None, the default environment will be loaded. If a path to a file is specified, that file will be used as the maze (environment) for the agent. The file must be a CSV file without header tags (see <repository>/simulation/files/maze.txt for reference).

These options can be used in any order and number (e.g. python3 -i load_agent.py loop False GUI True).
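For instance, to let the agent act in a loop with the GUI enabled, using the sample maze referenced above (this combination is purely illustrative; substitute <repository> with the actual repository path):

python3 -i load_agent.py loop True GUI True path <repository>/simulation/files/maze.txt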

Running the experiment

Average R per time-step

To run this experiment, go to <repository>/testing/ and run the Python script testing_avgR.py.

In this experiment, a number n of time-steps is specified. The agent then performs n actions, and the average reward at each time-step is recorded.

This script will require some parameters to be specified:

  • key: name, values: file/name
    the name of the folder in which to store the experiment's results. The results will be found in <repository>/testing/tests/<name>.
  • key: n, values: <integer>
    the number of time-steps, that is, the number of actions the agent is left to perform
  • key: e, values: <integer>
    the number of times the experiment has to be repeated.
  • key: maze, values: path/to/file
    the path to the file defining the environment (maze map). Set to def to use the default maze.
  • key: pars, values: path/to/file
    path to a JSON file containing the parameters the agent should use (such as the learning rate)
  • key: origin, values: path/to/file
    path to a JSON file containing an experiment's parameter settings (n, e, name, etc.)

If any of these parameters is not given when launching the script, it will be asked for (as input) while the script runs.
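As a purely illustrative example (the key value argument format below mirrors the one used by load_agent.py and is not confirmed by this README; the values are placeholders):

python3 testing_avgR.py name experiment_1 n 10000 e 5 maze def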

After giving the experiment a name of your liking, say experiment_1, you will find the results in the folder <repository>/testing/tests/experiment_1_n, where n is the lowest integer such that the folder experiment_1_n does not already exist in <repository>/testing/tests.

Restart experiment

To run this experiment, go to <repository>/testing/ and run the Python script testing_restart.py.

The experiment is very simple. Given a maze, the agent starts from the middle and has to reach one of the two exits. An iteration consists of the agent reaching an exit from the starting point. After an iteration is complete, the agent has a visa, that is, a given number of time-steps, before it is pulled back to the starting point. The number of time-steps required to complete an iteration is what is recorded.

This script will require some parameters to be specified:

  • key: name, values: file/name
    the name of the folder in which to store the experiment's results. The results will be found in <repository>/testing/tests/<name>.
  • key: v, values: <integer>
    the visa size, that is, how long to wait (in time-steps) before the agent is pulled back to the starting point (after the exit is found).
  • key: n, values: <integer>
    the number of iterations, that is, the number of times the agent has to find the exit, and thus the number of times it is pulled back to the starting point.
  • key: e, values: <integer>
    the number of times the experiment has to be repeated.
  • key: maze, values: path/to/file
    the path to the file defining the environment (maze map). Set to def to use the default maze.
  • key: pars, values: path/to/file
    path to a JSON file containing the parameters the agent should use (such as the learning rate)
  • key: origin, values: path/to/file
    path to a JSON file containing an experiment's parameter settings (n, e, name, etc.)

If any of these parameters is not given when launching the script, it will be asked for (as input) while the script runs.
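Again as a purely illustrative example (same caveat as above: the key value format and the values are placeholders, not taken from this README):

python3 testing_restart.py name experiment_1 v 50 n 20 e 5 maze def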

After giving the experiment a name of your liking, say experiment_1, you will find the results in the folder <repository>/testing/tests/experiment_1_n, where n is the lowest integer such that the folder experiment_1_n does not already exist in <repository>/testing/tests.
