Reinforcement Learning Algorithms Implementations

KTH Reinforcement Learning (EL2805) 2019 coding assignments. Like all my other repos, this is more an exercise for me to understand the algorithms than useful code. Hope it also helps you!

LAB 1

Dynamic Programming in a finite-horizon, fully observable stochastic MDP

Agent (green) escaping a maze with walls (black) through the exit (blue), while a monster (red) follows a uniform random walk and can cross walls: code
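For readers who want the idea without opening the lab code, here is a minimal sketch of finite-horizon dynamic programming (backward induction) on a tabular MDP. It is not the repository's implementation: the function name `backward_induction`, the `transitions`/`rewards` layout, and the 3-state corridor are all assumptions made for illustration, and the maze and monster are not modelled.

```python
def backward_induction(transitions, rewards, horizon):
    """Solve a finite-horizon tabular MDP by backward induction.

    transitions[s][a] -> list of (probability, next_state) pairs
    rewards[s][a]     -> immediate reward for taking action a in state s
    Returns (values, policy), indexed as values[t][s] and policy[t][s].
    """
    n_states = len(transitions)
    # values[horizon] is the terminal value, assumed to be zero here.
    values = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]
    for t in reversed(range(horizon)):
        for s in range(n_states):
            # One-step lookahead using the values already computed for t + 1.
            q_values = [
                rewards[s][a]
                + sum(p * values[t + 1][s2] for p, s2 in transitions[s][a])
                for a in range(len(transitions[s]))
            ]
            best = max(range(len(q_values)), key=q_values.__getitem__)
            policy[t][s] = best
            values[t][s] = q_values[best]
    return values, policy


# Hypothetical 3-state corridor: state 2 is an absorbing "exit" paying 1 per step.
# Action 0 = stay, action 1 = try to move right (succeeds with probability 0.8).
TRANSITIONS = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],
    [[(1.0, 1)], [(0.8, 2), (0.2, 1)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
REWARDS = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]

values, policy = backward_induction(TRANSITIONS, REWARDS, horizon=5)
print("policy at t=0:", policy[0])
```

Note that the policy depends on the time step: close to the horizon there is no time left to reach the exit, so the choice of action stops mattering.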

Value Iteration in an infinite-horizon, fully observable stochastic MDP

Agent (green) robbing banks (blue) while evading a police officer (red) that follows a random walk but never moves away from the agent: code
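As a rough illustration of value iteration for a discounted infinite-horizon problem, the sketch below repeats the Bellman optimality backup until the values stop changing. It is an assumption-laden toy, not the lab solution: `value_iteration`, the data layout, the discount `gamma=0.9`, and the 3-state example are hypothetical, and the police dynamics are not modelled.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Discounted value iteration on a tabular MDP.

    transitions[s][a] -> list of (probability, next_state) pairs
    rewards[s][a]     -> immediate reward for taking action a in state s
    Returns (values, policy) with one entry per state.
    """
    n_states = len(transitions)

    def q_value(s, a, values):
        return rewards[s][a] + gamma * sum(
            p * values[s2] for p, s2 in transitions[s][a]
        )

    values = [0.0] * n_states
    while True:
        new_values = [
            max(q_value(s, a, values) for a in range(len(transitions[s])))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:  # stop once a full sweep barely changes anything
            break

    # Greedy policy with respect to the final value estimates.
    policy = []
    for s in range(n_states):
        actions = range(len(transitions[s]))
        policy.append(max(actions, key=lambda a: q_value(s, a, values)))
    return values, policy


# Hypothetical 3-state corridor: state 2 is an absorbing "bank" paying 1 per step.
# Action 0 = stay, action 1 = try to move right (succeeds with probability 0.8).
TRANSITIONS = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],
    [[(1.0, 1)], [(0.8, 2), (0.2, 1)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
REWARDS = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]

values, policy = value_iteration(TRANSITIONS, REWARDS)
print("values:", values)
print("policy:", policy)
```

Unlike the finite-horizon case, the resulting policy is stationary: one action per state, independent of time.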

SARSA (following an epsilon-greedy policy) in an infinite-horizon, non-observable stochastic MDP

Policy learned by the agent for every police (red) position: code
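The SARSA update with an epsilon-greedy behaviour policy can be sketched as follows. This is a simplified stand-in, not the code from the lab: the 4-cell corridor, the deterministic `step` function, the hyperparameters, and every name below are assumptions, and there is no police in this toy.

```python
import random

N_STATES, BANK = 4, 3  # hypothetical corridor; the "bank" is the last cell
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic toy dynamics: reward 1 for every step spent at the bank."""
    reward = 1.0 if state == BANK else 0.0
    move = -1 if action == LEFT else 1
    next_state = min(max(state + move, 0), N_STATES - 1)
    return next_state, reward


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, else greedy (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    best = max(q[state])
    return rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])


def sarsa(n_steps=20000, alpha=0.1, gamma=0.8, epsilon=0.1, seed=0):
    """Run SARSA on one continuing trajectory and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    state = 0
    action = epsilon_greedy(q, state, epsilon, rng)
    for _ in range(n_steps):
        next_state, reward = step(state, action)
        next_action = epsilon_greedy(q, next_state, epsilon, rng)
        # On-policy target: bootstrap from the action actually taken next.
        target = reward + gamma * q[next_state][next_action]
        q[state][action] += alpha * (target - q[state][action])
        state, action = next_state, next_action
    return q


def greedy_policy(q):
    """Greedy action per state, preferring LEFT on exact ties."""
    return [LEFT if row[LEFT] >= row[RIGHT] else RIGHT for row in q]


q = sarsa()
print("greedy policy:", greedy_policy(q))
```

Because the target uses the action the agent really takes next, the learned values reflect the exploratory policy itself rather than the optimal one.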

Q-Learning (from a uniform policy) in an infinite-horizon, non-observable stochastic MDP

Agent (green) again robbing banks (blue) while evading a police officer (red) that follows a random walk: code
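Q-learning is off-policy, so it can learn a greedy policy while acting uniformly at random. The sketch below illustrates that on a toy problem. It is not the lab's implementation: the 4-cell corridor, the deterministic `step` function, the hyperparameters, and all names are assumptions, and the police is left out.

```python
import random

N_STATES, BANK = 4, 3  # hypothetical corridor; the "bank" is the last cell
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic toy dynamics: reward 1 for every step spent at the bank."""
    reward = 1.0 if state == BANK else 0.0
    move = -1 if action == LEFT else 1
    next_state = min(max(state + move, 0), N_STATES - 1)
    return next_state, reward


def q_learning(n_steps=20000, alpha=0.1, gamma=0.8, seed=0):
    """Q-learning driven by a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(2)  # behaviour policy: uniform over actions
        next_state, reward = step(state, action)
        # Off-policy target: bootstrap from the best next action,
        # regardless of what the behaviour policy will actually do.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Greedy action per state, preferring LEFT on exact ties."""
    return [LEFT if row[LEFT] >= row[RIGHT] else RIGHT for row in q]


q = q_learning()
print("greedy policy:", greedy_policy(q))
print("Q at the bank:", q[BANK])
```

In this toy the optimal value of staying at the bank is 1 / (1 - gamma) = 5, so `q[BANK][RIGHT]` should approach 5 as the number of steps grows.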

tvjoseph/RL-algorithms
