big-pgn-analyzer

Process extremely large chess PGN files

Beware, this project is very rough at the moment!

Working, sort of:

Split arbitrarily large PGN files into smaller files based on moves played, player rating, time control, etc.
Create a nested directory structure containing the sub-PGNs, with the directory structure representing the tree of moves played.
Find and merge transposing positions.
Create Sankey flow diagrams of chess openings.
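As a rough illustration of the splitting idea above, here is a minimal standard-library sketch. It is not the code in split.py. The names iter_games, first_moves, split_by_opening and the games.pgn filename are hypothetical. It assumes tag lines start with "[", movetext lines do not, and variations are not nested. It streams one game at a time and appends each game to a directory named after its first few moves.

```python
import io
import re
import tempfile
from pathlib import Path
from typing import Iterator, List, TextIO

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def iter_games(handle: TextIO) -> Iterator[str]:
    """Yield one game's text at a time so the whole file never sits in memory."""
    lines: List[str] = []
    seen_moves = False
    for line in handle:
        # Assumption: a tag line that follows movetext starts the next game.
        if line.startswith("[") and seen_moves:
            yield "".join(lines).strip() + "\n"
            lines, seen_moves = [], False
        if line.strip() and not line.startswith("["):
            seen_moves = True
        lines.append(line)
    if any(item.strip() for item in lines):
        yield "".join(lines).strip() + "\n"


def first_moves(game: str, depth: int) -> List[str]:
    """Return up to `depth` SAN moves (plies) from the start of the game."""
    text = " ".join(l for l in game.splitlines() if not l.startswith("["))
    text = re.sub(r"\{[^}]*\}", " ", text)     # drop brace comments
    text = re.sub(r"\([^()]*\)", " ", text)    # drop non-nested variations
    text = re.sub(r"\d+\.+|\$\d+", " ", text)  # drop move numbers and NAGs
    return [tok for tok in text.split() if tok not in RESULTS][:depth]


def split_by_opening(handle: TextIO, out_dir: Path, depth: int) -> None:
    """Append each game to out_dir/<move1>/<move2>/.../games.pgn."""
    for game in iter_games(handle):
        target = out_dir.joinpath(*first_moves(game, depth))
        target.mkdir(parents=True, exist_ok=True)
        with open(target / "games.pgn", "a", encoding="utf-8") as out:
            out.write(game + "\n")


if __name__ == "__main__":
    sample = '[Event "a"]\n\n1. e4 e5 2. Nf3 1-0\n\n[Event "b"]\n\n1. e4 c5 0-1\n'
    with tempfile.TemporaryDirectory() as tmp:
        split_by_opening(io.StringIO(sample), Path(tmp), depth=2)
        for path in sorted(Path(tmp).rglob("games.pgn")):
            print(path.relative_to(tmp))
```

Reopening the output file for every game keeps the sketch short but is slow on very large inputs. Directory names are raw SAN tokens, so this layout does not merge transpositions.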

Known issues:

Goals:

Clean up ugly and/or inefficient code.
Improve workflow/usability.
Add engine analysis to positions.
Integrate engine analysis with data from the PGN such as win probability to generate opening repertoires.

Current Workflow:

(Currently optional) Use sanitize.py to "clean" a large PGN database. This removes annotations, commentary, engine analysis, etc., which greatly reduces file size and makes processing easier.
If a random-ish sample from a large PGN database is desired, create the sample with sample.py.
If one opening in particular is of interest, create a PGN of games with only that opening using get_opening.py.
Use split.py to iteratively split a PGN out by opening moves.
If needed to free up storage space, remove the sub-PGNs with del_pgns.py.
Create charts with sankey.py.
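The first two workflow steps can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code in sanitize.py or sample.py. The names strip_annotations and reservoir_sample are hypothetical. Reservoir sampling is one possible way to draw a sample in a single pass, and tag lines are assumed to be the only lines that start with "[".

```python
import random
import re
from typing import Iterable, List, Optional


def strip_annotations(game: str) -> str:
    """Remove comments, variations, NAGs and !/? marks from one game's text."""
    lines = game.strip().splitlines()
    tags = [l for l in lines if l.startswith("[")]  # tag pairs are kept as-is
    moves = "\n".join(l for l in lines if not l.startswith("["))
    moves = re.sub(r"\{[^}]*\}", " ", moves)   # brace comments, incl. engine evals
    moves = re.sub(r";[^\n]*", " ", moves)     # rest-of-line comments
    while True:                                # peel nested variations inside-out
        trimmed = re.sub(r"\([^()]*\)", " ", moves)
        if trimmed == moves:
            break
        moves = trimmed
    moves = re.sub(r"\$\d+|[!?]+", "", moves)  # NAGs and suffix annotations
    return "\n".join(tags) + "\n\n" + " ".join(moves.split()) + "\n"


def reservoir_sample(games: Iterable[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Pick up to k games uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[str] = []
    for i, game in enumerate(games):
        if i < k:
            sample.append(game)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = game
    return sample


if __name__ == "__main__":
    raw = '[Event "x"]\n\n1. e4 {book} e5 $1 (1... c5) 2. Nf3!? Nc6 1-0\n'
    print(strip_annotations(raw))
    print(reservoir_sample((f"game {n}" for n in range(100)), k=3, seed=0))
```

Because reservoir_sample accepts any iterable, it can be fed a generator that reads games lazily, so memory use stays proportional to k rather than to the size of the database.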

chicknblender/big-pgn-analyzer