RLSim is a reinforcement learning simulator created specifically for comparing agent behavior to animal behavior. The simulator runs RL agents through maze environments and, given animal behavior data as input, trains the agent according to that data.
This project was developed to compare the behavior of learning agents in a maze environment to that of animals, using data from neuroscience research. The simulator attempts to recreate the animal's situation for the learning agent; for example, attached neurological equipment can make certain movements more difficult for the animal, and the simulator can penalize those movements accordingly.
```
git clone git@github.com:v2tamprateep/RLSim.git
```
To run an agent through a maze and output the path taken in each episode:

```
python learn.py --algo <agent> --mazes <maze_file> --output <file_path>
```
To train an agent using animal behavioral data:

```
python tether.py --algo <agent> --mazes <config_file> --input <file> --output <file_path>
```
Examples of maze layout, MDP, and maze config files can be found in their respective folders.
Automated Q-value resets on indicated episodes
- reset episodes can be specified manually
- Q-values can be reset every n episodes, where n is a parameter
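The reset behavior can be sketched as follows. This is a hypothetical illustration, not RLSim's actual code; the function name, arguments, and the dictionary-based Q-table are all assumptions made for the example.

```python
from collections import defaultdict

def run_episodes(num_episodes, reset_interval=None, reset_episodes=()):
    """Illustrative sketch of periodic and manual Q-value resets."""
    q_values = defaultdict(float)
    reset_log = []
    for episode in range(1, num_episodes + 1):
        # ... run one episode; this line stands in for the learning updates ...
        q_values[("state", "action")] += 1.0
        manual = episode in reset_episodes
        periodic = reset_interval is not None and episode % reset_interval == 0
        if manual or periodic:
            q_values.clear()  # wipe everything the agent has learned so far
            reset_log.append(episode)
    return reset_log

# Reset every 5 episodes, plus a manual reset at episode 7:
print(run_episodes(12, reset_interval=5, reset_episodes={7}))  # → [5, 7, 10]
```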
Maze swapping
- the user can specify a series of mazes (domains) for the agent to act in, along with the number of episodes the agent spends in each maze
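A maze schedule of this kind amounts to iterating over (maze, episode count) pairs. The sketch below is illustrative only; the maze names and the pair-based schedule format are assumptions, not RLSim's actual config format.

```python
def maze_schedule(schedule):
    """Yield (maze_name, episode_index) pairs for a list of (maze, n_episodes)."""
    for maze_name, n_episodes in schedule:
        for episode in range(n_episodes):
            yield maze_name, episode

# Agent spends 2 episodes in maze_A, then 1 episode in maze_B:
print(list(maze_schedule([("maze_A", 2), ("maze_B", 1)])))
# → [('maze_A', 0), ('maze_A', 1), ('maze_B', 0)]
```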
Update functions
- the simulator supports four variations of the Q-value update function, taking into account exploration bonuses and diminishing rewards
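For reference, here is the standard Q-learning update alongside one possible exploration-bonus variant. This is a generic sketch of the technique, not RLSim's four actual variants; the count-based bonus term and all parameter defaults are assumptions.

```python
def q_update(q, reward, next_q_values, alpha=0.5, gamma=0.9):
    """Standard Q-learning update: move q toward reward + gamma * max future value."""
    target = reward + gamma * max(next_q_values)
    return q + alpha * (target - q)

def q_update_with_bonus(q, reward, next_q_values, visit_count,
                        alpha=0.5, gamma=0.9, bonus=1.0):
    """Variant with a count-based exploration bonus (illustrative only):
    rarely tried state-action pairs receive an inflated reward."""
    reward += bonus / (1 + visit_count)
    return q_update(q, reward, next_q_values, alpha, gamma)

print(q_update(0.0, 1.0, [0.0, 2.0]))  # → 1.4
```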
Agent orientation
- the simulator tracks the agent's orientation, which allows backward movement to be penalized more heavily than forward movement (to simulate movements of differing difficulty)
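An orientation-dependent cost might look like the sketch below. The heading encoding, function name, and default costs are hypothetical; RLSim's actual implementation (driven by the `--back_cost` flag) may differ.

```python
# Compass headings in clockwise order, so the opposite heading is two steps away.
HEADINGS = ["N", "E", "S", "W"]

def move_cost(orientation, direction, back_cost=2.0, forward_cost=1.0):
    """Charge more for moving opposite to the agent's current heading."""
    opposite = HEADINGS[(HEADINGS.index(orientation) + 2) % 4]
    return back_cost if direction == opposite else forward_cost

print(move_cost("N", "S"))  # → 2.0  (backward move is costlier)
print(move_cost("N", "E"))  # → 1.0
```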
Flag | Description
---|---
--algo | name of the RL algorithm
--mazes | maze config file
-s, --samples | number of samples (learn.py only)
--mdp | name of the .mdp file
-a, --alpha | learning rate
-g, --gamma | discount factor
-e, --epsilon | epsilon-greedy exploration parameter
-l, --learning | agent's update function
-b, --back_cost | penalty for backward movement
-R, --reward | reward for finishing the maze
-d, --deadend_cost | penalty for reaching a dead end
-q, --Qreset | interval at which Q-values are reset (learn.py only)
-o, --output | output file