DRLGP

Deep Reinforcement Learning Game Playground

Deterministic vs. nondeterministic In a deterministic game, the course of the game depends only on the players' decisions. In a nondeterministic game, an element of randomness is involved, such as shuffling cards. Go game is deterministic,typically
Perfect information vs. hidden information In perfect information games, both players can see the full game state at all times; the whole board is visible. In hidden information games,each player can see only part of the game states. JunQi, a popular Chinese board game, is typical for opponent'pieces are kept unkonwn in the game process.

DRLGP is a platform to help developing board game agents strengthened with tree search methods. As one of examples, AlphaGo,which defeated human top Go player in 2016,is a reinforcement learning and tree search based bot.

On DRLGP, deterministic and perfect information board game bots will be developed first, and then hidden information game bot.

Basic Algorithms

Minimax

Minimax is a decision rule used in artificial intelligence, decision theory, game theory, statistics and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. When dealing with gains, it is referred to as "maximin"—to maximize the minimum gain. Originally formulated for two-player zero-sum game theory,covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision-making in the presence of uncertainty.

Alpha-Beta pruning

Alpha–beta pruning is a search algorithm that seeks to decrease the number of nodes that are checkd by the minimax algorithm in its search tree. It is an adversarial search algorithm used commonly for machine playing of two-player games (Tic-tac-toe, Chess, Go, etc.). It stops evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be checkd further. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision

Monte Carlo Tree Search

In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in game play. MCTS has been used for decades in computer Go programs.It has been used in other board games like chess and shogi,games with incomplete information such as bridge and poker,as well as in real-time video games (such as Total War: Rome II's implementation in the high level campaign AI.

UCT
ResNet

Implementation notes

pytorch 1.0+

Usage

To run the train pipeline, modify the config.ini to setup your favoriate parameters and then launch the entry point in the root folder:
```
 python   main.py
```
To play the game, run the following command to launch the web server
```
 python   ./web/server.py 
```
and then input the 127.0.0.1:5000/static/index.html into the web browser to open the game page.

References

[1] Silver et al. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.”

[2] Silver et al. “Mastering the game of go without human knowledge.” Nature 550.7676 (2017): 354–359.

[3] Max Pumperla and Kevin Ferguson."Deep Learning and the Game of Go"

[4] Maxim Lapan."Deep Reinforcement Learning Hands-On"

[5] 宋俊潇,AlphaZero实战：从零学下五子棋

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 356 Commits
.vscode		.vscode
agent		agent
boardencoder		boardencoder
common		common
config		config
game		game
models		models
web		web
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
datacollector.py		datacollector.py
main.py		main.py
muzero.py		muzero.py
policychecker.py		policychecker.py
policyimprover.py		policyimprover.py
requirements.txt		requirements.txt
simpleMuZero.py		simpleMuZero.py
test.py		test.py

License

luxunxiansheng/DRLGP

Folders and files

Latest commit

History

Repository files navigation

DRLGP

Table of Contents

Introduction

Basic Algorithms

Implementation notes

Usage

References

License

About

Resources

License

Stars

Watchers

Forks

Languages