Allen Chen, Jonathan Ko, Young Kim
We obtain our dataset by scraping the game results listings from teamliquid. A script is provided in scraper/main.rb that parses the listings pages and outputs CSV.
Data collection is time-consuming because the teamliquid site throttles requests: too many requests in a short time result in a temporary (~1 minute) IP ban. To avoid this, we introduce an inter-request delay of 10 seconds.
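The inter-request delay can be illustrated with a minimal sketch. This is not the code in scraper/main.rb; the helper name each_page_with_delay and its sleeper parameter are hypothetical, introduced only to show the idea of pausing between consecutive requests.

```ruby
# Hypothetical helper (not taken from scraper/main.rb): calls the given block
# once per page and pauses for `delay` seconds between consecutive calls.
# No pause is made before the first page or after the last one.
# `sleeper` is injectable so the pause can be stubbed out in tests.
def each_page_with_delay(pages, delay, sleeper: ->(seconds) { sleep(seconds) })
  pages.each_with_index.map do |page, index|
    sleeper.call(delay) if index > 0
    yield page
  end
end

# Example use, assuming a fetch_listing(page) method exists elsewhere:
#   rows = each_page_with_delay(1..50, 10) { |page| fetch_listing(page) }
```

With a 10 second delay, N pages take at least 10 * (N - 1) seconds, which is why a full scrape is slow.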
Usage: ./main.rb [options] [section]
-p, --page N Start at page N
-d, --delay D Inter-request delay (seconds)
-h, --help Display this message
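For readers who want to see how options like these are typically handled, here is a rough sketch using Ruby's standard OptionParser. It is an illustration and may differ from what scraper/main.rb actually does; the parse_args name and the default values (page 1, delay 10) are assumptions.

```ruby
require 'optparse'

# Hypothetical parser mirroring the usage text above.
# Defaults are assumptions: start at page 1, wait 10 seconds between requests.
def parse_args(argv)
  options = { page: 1, delay: 10 }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: ./main.rb [options] [section]'
    opts.on('-p', '--page N', Integer, 'Start at page N') { |n| options[:page] = n }
    opts.on('-d', '--delay D', Float, 'Inter-request delay (seconds)') { |d| options[:delay] = d }
    opts.on('-h', '--help', 'Display this message') do
      puts opts
      exit
    end
  end
  # parse (without !) leaves argv untouched and returns the non-option arguments.
  remaining = parser.parse(argv)
  options[:section] = remaining.first
  options
end
```

Under these assumptions, parse_args(%w[-p 5 sc2-korean]) would start at page 5 of the sc2-korean section with the default delay.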
We use data from two sections, sc2-international and sc2-korean.