This project is continued at https://github.com/OpenJ92/online-Starcraft
A study of Starcraft 2 replay data that attempts to identify strategies independently of expert knowledge
see: WebS_.py and Sc.py (lines 19-60)
While the Starcraft 2 replay files are not included in this repository, the means by which one can retrieve them are. In these scripts, I use BeautifulSoup, selenium and requests, among many other libraries, to pull replays from https://gggreplays.com/ and https://lotv.spawningtool.com/ (see the import statements at the top of each file for further details).
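The link-extraction part of the scraping step can be sketched as follows. This is a minimal illustration, not the author's WebS_.py or Sc.py: it assumes the listing page links directly to files ending in `.SC2Replay`, which may not match the real layouts of either site, and it uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra dependencies. The HTML would typically be fetched with `requests.get(url).text`; the names `ReplayLinkParser` and `replay_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReplayLinkParser(HTMLParser):
    """Collect <a href> targets that end in .SC2Replay (assumed page layout)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lower-cased
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".sc2replay"):
            # resolve relative links against the page URL
            self.links.append(urljoin(self.base_url, href))


def replay_links(html, base_url):
    """Return absolute replay-file URLs found in one listing page."""
    parser = ReplayLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Each returned URL could then be downloaded with `requests`; pages rendered by JavaScript are presumably why the original scripts also rely on selenium.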
In total, roughly 120,000+ replay files spanning 5 years and 7 leagues were collected, along with many other metrics. For the purposes of this project, a subset of replays, particularly those belonging to professional players, was used, not only because of time constraints but also because of the diverse range of player strategies displayed at that level.
see: (./ORM/ETL_.py and ./ORM/models.py) or (ETL.py)
Using the Flask-SQLAlchemy Python framework, a cyclic model (./ORM/models.py) of replay elements was constructed:
- (Players) have many (Events, Games)
- (Games) have many (Players, Events)
- (Events) have one (Player, Game)
Ultimately, this construction was a great hindrance to the project: navigating such a graph was cumbersome at best and confusing at worst. In light of this burden, I decided (post submission) to reconstruct the model in a star topology, as seen in the current (./ORM/models.py).
- (Users) have many (Participants)
- (Participants) have many (Events) and have one (Game, User)
- (Games) have many (Participants)
- (Events) have one (Participant)
This intermediary object (Participant) simplifies queries and relationships between (Games, Users, Events) and holds the additional per-player variables that previously lived in the Players class.
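A minimal sketch of the star topology, assuming plain SQL tables in Python's built-in `sqlite3` module rather than the Flask-SQLAlchemy models in ./ORM/models.py (so it runs without a Flask app). Table and column names, including the `race` column standing in for the variables moved off the old Players class, are hypothetical. The query shows what the hub buys: reaching one user's events in one game takes two joins, both through `participants`.

```python
import sqlite3

# Hypothetical star schema: participants is the hub joining users, games and events.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE games (id INTEGER PRIMARY KEY);
CREATE TABLE participants (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    game_id INTEGER REFERENCES games(id),
    race    TEXT  -- example of a per-player variable formerly on Players
);
CREATE TABLE events (
    id             INTEGER PRIMARY KEY,
    participant_id INTEGER REFERENCES participants(id),
    sequence       INTEGER,
    name           TEXT
);
"""


def build_db(path=":memory:"):
    """Create an (in-memory by default) database with the star schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def events_for(conn, user_name, game_id):
    """Events one user produced in one game, reached only through participants."""
    return conn.execute(
        """
        SELECT e.sequence, e.name
        FROM events AS e
        JOIN participants AS p ON p.id = e.participant_id
        JOIN users AS u ON u.id = p.user_id
        WHERE u.name = ? AND p.game_id = ?
        ORDER BY e.sequence
        """,
        (user_name, game_id),
    ).fetchall()
```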
With the subset of replays committed to a SQLite3 database, the raw information was then transformed into a sequential aggregate form. For example, the raw event rows (game, participant, sequence, train_Marine, train_Marauder, build_Barracks, ...)
- a_1 = (100, 4, 0, 0, 0, 0, 0, ...)
- a_2 = (100, 4, 1, 0, 1, 0, 0, ...)
- a_3 = (100, 4, 2, 1, 0, 0, 0, ...)
- a_4 = (100, 4, 3, 0, 1, 0, 0, ...)
were accumulated into running totals (game, participant, action, train_Marine, train_Marauder, build_Barracks, ...)
- a_1 = (100, 4, 0, 0, 0, 0, 0, ...)
- a_2 = (100, 4, 1, 0, 1, 0, 0, ...)
- a_3 = (100, 4, 2, 1, 1, 0, 0, ...)
- a_4 = (100, 4, 3, 1, 2, 0, 0, ...)
so that each row reflects the current state of the game for one of the two participants. Notice that, with (game, participant, action) removed, each game's rows can be considered a one-dimensional curve in $\mathbb{R}^n$ whose increments with respect to the order of actions are nonzero and lie in the unit hypercube $[0,1]^n$ (in the example above, each step adds exactly one standard basis vector), so that $\lVert a_k \rVert < \lVert a_{k+m} \rVert$ for all $k, m \in \mathbb{N}$ with $m \geq 1$.
Figure: all Terran professional games (buildings constructed). Notice the clear directionality of the tendrils.
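The accumulation step amounts to a running sum per (game, participant). This is a minimal standard-library sketch, not the author's ./ORM/ETL_.py; it assumes rows arrive as tuples laid out like the example (game, participant, sequence, then one count column per action), and `aggregate_events` is a hypothetical name.

```python
from itertools import accumulate, groupby
from operator import itemgetter


def aggregate_events(rows):
    """Turn per-event rows into running totals per (game, participant).

    rows: tuples (game, participant, sequence, count_1, ..., count_n).
    Returns tuples (game, participant, sequence, total_1, ..., total_n).
    """
    # sort so each (game, participant) block is contiguous and in event order
    ordered = sorted(rows, key=itemgetter(0, 1, 2))
    out = []
    for (game, participant), group in groupby(ordered, key=itemgetter(0, 1)):
        group = list(group)
        counts = (row[3:] for row in group)
        # element-wise running sum of the action-count vectors
        running = accumulate(counts, lambda acc, c: tuple(a + b for a, b in zip(acc, c)))
        for row, totals in zip(group, running):
            out.append((game, participant, row[2], *totals))
    return out
```

Applied to the four raw rows above (truncated to four action columns), this reproduces the four aggregate rows; dropping the first three fields of each output leaves the points of the curve.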
see: ./ORM/PCA_ETL.py
For each game's sequence of events, I performed a Principal Component Analysis (PCA) dimensionality reduction to $\mathbb{R}^1$ as a means to extract the first singular vector. By the definition of the first singular vector (link_to_paper, Section 1.1), this vector spans the line that best fits the game's aggregate event rows in the least-squares sense, so it is a regression-like representation of the direction in which the game's events propagate. It was chosen not only for its speed and interpretability, but also for its ability to capture the events of each game in their totality in a single vector. The approach has clear disadvantages, namely a loss of information (high bias) and a lack of invertibility, but it suited the project goal well enough. I intend to construct a function that measures an 'arc' residual sum as a means of assessing the 'goodness of fit' of each singular vector to its corresponding aggregate game events.
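A minimal NumPy sketch of the reduction, assuming one game's aggregate rows (with game, participant and action columns already dropped) form the matrix `game_rows`; it is not the author's ./ORM/PCA_ETL.py, and `first_singular_vector` is a hypothetical name. Like scikit-learn's `PCA`, it centers the columns before the SVD, so the result should match `PCA(n_components=1).components_[0]` up to sign. Because an SVD fixes the vector only up to sign, the sketch orients it along the game's net progress (last row minus first row); that convention is an assumption, but some fixed convention is needed before comparing games by inner product.

```python
import numpy as np


def first_singular_vector(game_rows):
    """Unit direction of the best-fit line through one game's aggregate rows."""
    X = np.asarray(game_rows, dtype=float)
    X = X - X.mean(axis=0)  # PCA centers each column first
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[0]  # first right singular vector = first principal axis
    # resolve the sign ambiguity: point along the direction the counts grow
    if np.dot(v, X[-1] - X[0]) < 0:
        v = -v
    return v
```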
Figure: inner products between the singular vectors of Terran, Protoss and Zerg professional games, as a measure of directional similarity. Notice that there are regions of orthogonal singular vectors, reflecting the different units and structures of each race.
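Because each singular vector has unit length, the inner product of two of them is their cosine similarity, so the whole similarity figure is a single matrix product. A minimal sketch, assuming the per-game singular vectors are stacked as the rows of an array `V` (a hypothetical name); rows are re-normalized defensively. Games whose actions fall in disjoint columns, as presumably happens for different races' units and structures, give an inner product of exactly 0.

```python
import numpy as np


def similarity_matrix(V):
    """Pairwise inner products (cosine similarities) of per-game singular vectors.

    V: array of shape (n_games, n_features), one singular vector per row.
    Entries lie in [-1, 1]; 0 means orthogonal directions.
    """
    U = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ U.T
```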
see: ./ORM/unsupervised.py
Equipped with a singular vector representation for each game's events, I carried out unsupervised clustering of these singular vectors with KMeans (using a Euclidean metric) and GaussianMixture. With these algorithms, we were attempting to identify a collection of naturally occurring strategies in the game of Starcraft. I intend to try several additional methods, including a cosine similarity metric, which I believe will separate the singular vectors best according to an adjusted silhouette score metric.
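One way to realize the planned cosine-similarity variant is spherical k-means: k-means whose centroids are re-normalized to unit length, so that assigning each vector to the centroid with the largest inner product is the same as assigning it to the nearest centroid in Euclidean distance. This is a NumPy sketch of that swapped-in technique, not the author's ./ORM/unsupervised_cos.py; `spherical_kmeans` is a hypothetical name, and the deterministic farthest-first initialization is an assumption chosen in place of k-means++.

```python
import numpy as np


def spherical_kmeans(V, k, n_iter=100, seed=0):
    """Cluster vectors by cosine similarity.

    V: array (n_games, n_features) of singular vectors (no zero rows).
    Returns (labels, centroids); centroids have unit norm.
    """
    X = V / np.linalg.norm(V, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    # farthest-first init: start from a random row, then repeatedly add the
    # row least similar to every centroid chosen so far
    chosen = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        sims = np.max(X @ np.array(chosen).T, axis=1)
        chosen.append(X[np.argmin(sims)])
    centroids = np.array(chosen)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # assignment: highest cosine similarity to a unit centroid
        new_labels = np.argmax(X @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # update: mean direction of each cluster, projected back to the sphere
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the old centroid if a cluster empties
                centroids[j] = total / norm
    return labels, centroids
```

The resulting labels could then be scored with a silhouette score computed under a cosine metric, which is the comparison the plan above calls for.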
Currently, the only clear assertion I can make is that I will continue to work on this project. Below is a to-do list of goals for the coming weeks. I did not achieve what I set out to do in the problem statement of the project, but I am confident that with diligent work I will be able to.
see: ./ORM/unsupervised_cos.py
see: under construction
see: ML.py and ML_Sc.py and TreeBot.py
Weighted vector space over strategies, used to construct a 'Newtonian gravitational field' for the decision function f: player_state -> action (in progress)
see: under construction
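One possible reading of this item, offered only as an assumption about an unfinished design: treat each strategy (for example, a cluster centroid $c_i$ from the clustering step) as a point mass with weight $w_i$, and let a player's current aggregate state $s$ feel the Newtonian-style field

$$
g(s) = \sum_{i=1}^{k} w_i \, \frac{c_i - s}{\lVert c_i - s \rVert^{3}}
$$

The direction of $g(s)$ would indicate which strategies the player is being pulled toward; how $f$ turns that field into a concrete action is left open here.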
A way to capture known partial information about player B's strategy (an estimate of player B's strategy) and adjust one's own strategy accordingly, that is, change the weighted vector space.
see: under construction