TicTacToe ML

Simple pure python q-learning algorithm implementation

Disclaimer

This repository just a sample was written to my presentation at ML Meetup. You may take ideas, but it's not intended to run in production environment.

Reinforcement Learning

Is a kind of unsupervised ML where agent communicating with environment and learns to take actions to earn best result Optionally some curiosity behavour may be implemented to give agent a chance to explore best strategy.

Q-Learning is a alghoritm (aka function) creating policy which tells agent what action can be performed in each environment state. Because agent can't get score (and recaculate policy) immediatly after each action (move) it should "remember" all moves and states in the round and calculate gradient of scores

About TicTacToe implementation

TicTacToe is a pure python implementation of Q-Learnig alghoritm. Used just for POC.
Because of not big number of unique combinations it stores all seen "positions" in python dictionary.
The key is a features vector build out of 3x3 squire board and packed to single integer.
The value is a gradiented score related to board state
Some curiosity implemented, so algoritm can "improve" it's strategy during learning

Installing and testing

To test algorithm simple game.py script included in this project
Installing as simple as install new uwsgi process if you know exactly what you do. Check ttt.ini for more info.

Warning change user/group of running process. Running as root may make you system potencially vulnerable

Clone repository
Point nginx /tictactoe location to the root of cloned repository
Configure /tictactoe/ai location as proxy_pass to http://127.0.0.1:9092
Make data directory writebale for the script user
Run run.sh in tmux session or any other way you like
Open in browser http://[yourdomain|localhost]/tictactoe

Note: Google login may not work from your domain till you change google-signin-client_id meta in index.html

Nginx configuration example

    location ~/tictactoe
    {
    	root /var/www;
	    index index.html;
    }
    
    location ~/tictactoe/ai*
    {
        error_log  /var/log/nginx/tictactoe.error.log error;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_pass http://127.0.0.1:9092;
    }

Results

We played with this alghoritm couple of hours. It was very impressive to see how algo learns and becomes a real player from kiddy randomizer After about 1000 rounds the algo plays good enouph but still has many loses.

Now it plays much better. You can try it by your self and Play DEMO

License

MIT license . You may use it for any purpose without warranty

Further learning

wikipedia Reinforcement_learning

wikipedia Q-learning

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
lib		lib
translations		translations
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
game.py		game.py
index.html		index.html
run.sh		run.sh
tto.png		tto.png
ttt-screen.png		ttt-screen.png
ttt.ini		ttt.ini
ttx.png		ttx.png

License

vt77/tictactoe

Folders and files

Latest commit

History

Repository files navigation

TicTacToe ML

Disclaimer

Reinforcement Learning

About TicTacToe implementation

Installing and testing

Results

License

Further learning

About

Resources

License

Stars

Watchers

Forks

Languages