Implementations to supplement my reading of "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto.
Ch 2 - Multi-armed Bandits
Ch 4 - Dynamic Programming
Ch 5 - Monte Carlo Methods
Ch 6 - Temporal-Difference Learning
Ch 7 - Multi-step Bootstrapping
Ch 8 - Planning and Learning with Tabular Methods
dqn - My own experiments using Q-networks to generalize across more complex environments
environments - A collection of finite MDP (fMDP) environment implementations. These environments extend OpenAI Gym's Env class.
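
As a rough illustration of what extending Gym's Env class can look like, here is a minimal sketch of a small finite MDP. It is not taken from this repository: `CorridorEnv`, its `n_states` parameter, the reward of -1 per step, and the `run_episode` helper are all hypothetical names and choices made for this example. The sketch assumes the classic Gym interface, where `reset()` returns a state and `step()` returns a `(state, reward, done, info)` tuple (newer Gym releases use a five-element tuple instead), and it falls back to a plain base class if Gym is not installed.

```python
try:
    import gym
    from gym import spaces
    _Base = gym.Env
except ImportError:
    # Assumption for this sketch: fall back to a plain class so it runs without Gym.
    gym = None
    spaces = None
    _Base = object


class CorridorEnv(_Base):
    """Hypothetical finite MDP: states 0..n_states-1 in a row; the last state is terminal."""

    LEFT, RIGHT = 0, 1

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0
        if spaces is not None:
            # Two discrete actions, one discrete observation per state.
            self.action_space = spaces.Discrete(2)
            self.observation_space = spaces.Discrete(n_states)

    def reset(self):
        # Every episode starts at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # Move one cell left or right, clipped to the corridor bounds.
        move = 1 if action == self.RIGHT else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # Classic 4-tuple: (state, reward, done, info); each step costs -1.
        return self.state, -1.0, done, {}


def run_episode(env, policy, max_steps=1000):
    """Run one episode with `policy` (a function state -> action); return the total reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, done, _ = env.step(policy(state))
        total += reward
        if done:
            break
    return total
```

For example, `run_episode(CorridorEnv(n_states=5), lambda s: CorridorEnv.RIGHT)` walks straight to the terminal state in four steps. Keeping the environment behind `reset` and `step` means the same tabular agents from the chapter directories could, in principle, be run against any environment that follows this interface.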