Skip to content
/ tdttt Public

Temporal difference reinforcement learning for the game of Tic-Tac-Toe

Notifications You must be signed in to change notification settings

fumin/tdttt

Repository files navigation

Reinforcement learning AI for the game of tic-tac-toe

This project implements the SARSA and Q-learning algorithms. The training method is to bootstrap two random agents playing against themselves. During game play, the agents do not know nor do they need the rules and heuristics of tic-tac-toe. For example, these agents do not know that states with only one move left is equivalent to an end game. In fact, throughout the training process the only factors affecting the agents are:

  • the definition of win and lose of tic-tac-toe, i.e. the existence of horizontal, vertical, or diagonal lines
  • and the hyperparameters of the learning algorithm

The following graph shows the reward per episodes of both SARSA and Q-learning algorithms during training:

Running this code

To run this code, run python main.py --algo sarsa to start a SARSA training session, and python main.py --algo qlearning for a Q-learning one. For both these cases, after training, a terminal interface will be provided to play with these trained agents.

Run tests

To run the tests of this code, run python testing.py.

About

Temporal difference reinforcement learning for the game of Tic-Tac-Toe

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages