pMADI

Markov decision process and reinforcement learning

Introduction

In this project we consider an agent that has to move through a labyrinth. The labyrinth is composed of different tiles, which are described in the following table.

Tile name	Tile symbol	Description
Starting position	●
Empty room	(blank)
Wall	∎	Cannot move there
Enemy	E	The agent kills the enemy with probability 0.7, otherwise it dies
Trap	▲	The agent dies (probability 0.1), goes back to the starting position (probability 0.3), or nothing happens (remaining probability 0.6)
Crack	C	Immediate death
Treasure	T	Can be picked up
Sword	†	Can be picked up
Key	K	Can be picked up, opens the treasure
Portal	◯	The agent is teleported to a random room of the labyrinth
Moving Platform	-	The agent is moved to one of the adjacent tiles
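
The random tile effects in the table can be read as small probability distributions. The snippet below is a minimal sketch of that reading, not the code used in this repository: the names `TILE_OUTCOMES` and `sample_outcome` and the outcome labels are hypothetical, and the 0.6 for "nothing happens" on a trap is simply the probability left over after 0.1 and 0.3. The portal and the moving platform are left out because their outcomes depend on the layout of the labyrinth.

```python
import random

# Outcome distributions read off the tile table (labels are illustrative).
TILE_OUTCOMES = {
    "enemy": [("kill_enemy", 0.7), ("die", 0.3)],
    # 0.6 is the remaining probability: 1 - 0.1 - 0.3.
    "trap": [("die", 0.1), ("back_to_start", 0.3), ("nothing", 0.6)],
    "crack": [("die", 1.0)],
}


def sample_outcome(tile, has_sword=False, rng=random):
    """Draw one outcome for stepping on `tile` (hypothetical helper)."""
    if tile == "enemy" and has_sword:
        # With the sword, the agent kills enemies instantly.
        return "kill_enemy"
    outcomes = TILE_OUTCOMES.get(tile)
    if outcomes is None:
        # Tiles not modeled in this sketch have no random effect here.
        return "nothing"
    labels, weights = zip(*outcomes)
    return rng.choices(labels, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when comparing learning runs.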

Example of an instance of the labyrinth (see labyrinth.png):

To complete the labyrinth, the agent has to find a key, find the treasure that this key opens, and finally go back to its starting position. In addition, the agent can pick up a sword, which allows it to kill enemies instantly.

The goal of this project is to find the optimal policy that lets the agent achieve its objective. To do so, we modeled the game as a Markov decision process whose states are tuples (pos_x, pos_y, has_key, has_sword, has_treasure) and whose possible actions are to go up, down, left or right. The transition probabilities between states are computed from the tiles the agent can end up on after a given movement.
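
As an illustration of this state representation, here is a minimal sketch of the state tuple and of the deterministic part of a move. It is not taken from game.py: the `State` class, the `ACTIONS` table, the `move` helper, the `"#"` wall character and the axis convention (x grows to the right, y grows downward) are all assumptions made for the example. Staying in place when a move is blocked is also an assumption; the table only says that the agent cannot move onto a wall.

```python
from typing import NamedTuple


class State(NamedTuple):
    pos_x: int
    pos_y: int
    has_key: bool
    has_sword: bool
    has_treasure: bool


# Assumed convention: x grows to the right, y grows downward.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def move(state, action, grid):
    """Return the state after the movement alone, before any tile effect."""
    dx, dy = ACTIONS[action]
    x, y = state.pos_x + dx, state.pos_y + dy
    inside = 0 <= y < len(grid) and 0 <= x < len(grid[0])
    # Assumption: a blocked move (outside the grid or into a wall) keeps the agent in place.
    if not inside or grid[y][x] == "#":
        return state
    return state._replace(pos_x=x, pos_y=y)
```

Because the inventory flags are part of the state, the same position counts as a different state before and after the key, the sword or the treasure has been picked up, which is what lets a policy send the agent back to the start only once it carries the treasure.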

The different methods used for the optimisation are the following:

Value Iteration
Policy Iteration
Q-learning
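
To give an idea of the first method in the list, the following is a minimal sketch of value iteration on a generic finite MDP. It is not the content of value_iteration.py: the function signature, the `transitions(s, a)` callback returning `(probability, next_state, reward)` triples, the discount factor `gamma` and the stopping tolerance `tol` are assumptions chosen for the example, and the reward scheme is left to the caller since it is not described here.

```python
def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-6):
    """Value iteration sketch.

    transitions(s, a) must return a list of (probability, next_state, reward).
    """

    def q_value(s, a, values):
        # Expected return of taking action a in state s, then following `values`.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions(s, a))

    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, values) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update of the value estimate
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: q_value(s, a, values)) for s in states}
    return values, policy
```

Policy iteration alternates a policy evaluation step with the same greedy improvement step, while Q-learning estimates the action values from sampled transitions and does not need the transition probabilities.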

The descriptions of the algorithms used to implement these methods can be found in the full report for this project, madi.pdf (French only).

Usage:

Run main.py; all the features are available from there. The only requirement is numpy.

Original project by:

Gualtiero Mottola
Alexandre Bontems
