Datasets and code for a scientific paper about video game text corpora. Datasets contain text from Star Wars: Knights of the Old Republic (BioWare), Torchlight II (Runic Games) and The Elder Scrolls (Bethesda Softworks).

hmi-utwente/video-game-text-corpora

video-game-text-corpora

Data and code for a paper about video game text corpora.

Datasets

  • Torchlight II quest texts: quest dialogue, main quest summaries and GUI text in CSV format.
  • Star Wars: Knights of the Old Republic: branching player and NPC dialogue in CSV format.
  • The Elder Scrolls (Arena, Daggerfall, Morrowind, Oblivion, Skyrim and The Elder Scrolls Online): in-game books in JSON format.
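
As a starting point for working with the datasets, the following is a minimal sketch of how the CSV and JSON files could be loaded with the Python standard library. The helper names `load_csv_rows` and `load_json`, the example paths, and the UTF-8 encoding are assumptions made for illustration; check the actual file names, column headers and encoding in each game folder.

```python
import csv
import json
from pathlib import Path


def load_csv_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of dicts keyed by the header row.

    Assumes the first row of the file contains column names.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with Path(path).open(newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))


def load_json(path, encoding="utf-8"):
    """Parse a JSON file and return the resulting Python object."""
    with Path(path).open(encoding=encoding) as handle:
        return json.load(handle)


# Hypothetical usage; replace the paths with real files from the repository:
#
#   rows = load_csv_rows("path/to/dialogue.csv")
#   print(len(rows), "rows with columns", list(rows[0].keys()))
#
#   books = load_json("path/to/books.json")
#   print(type(books).__name__, len(books))
```

Inspecting the column names and the top-level JSON type first is a cheap way to learn the structure of each file before writing any analysis code.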

Code

Each game folder has a src/ folder that contains the code used to create the dataset. It should give some insight into how the data was extracted.

For Torchlight II and SW:KOTOR: before you can run the code, you need access to the original game files from which the data was extracted.

Scientific paper

This repository accompanies the research paper Fantastic Strings and Where to Find Them: The Quest for High-Quality Video Game Text Corpora, to appear in the proceedings of INT 2020. A preprint version of the paper is available. If you use the data or code, please cite the following paper:

@inproceedings{vanstegeren2020fantastic,
    title = "{Fantastic Strings and Where to Find Them: The Quest for High-Quality Video Game Text Corpora}",
    author = {van Stegeren, Judith and Theune, Mari{\"e}t},
    booktitle = "Intelligent Narrative Technologies Workshop",
    month = oct,
    year = "2020",
    publisher = {AAAI Press},
}

Games

The corpora were extracted from three commercial video games. The games and the game assets are copyrighted by their respective publishers and developers. If you use the datasets, don't forget to cite the games too!

@misc{game:starwarsknightsoftheoldrepublic,
title = {\emph{Star Wars: Knights of the Old Republic}},
year = {2003},
organization = {LucasArts},
publisher = {LucasArts},
author = {{BioWare}},
Howpublished = {Game [PC]},
Note = {LucasArts, San Francisco, US},
}

@misc{game:torchlight2,
title = {\emph{Torchlight II}},
year = {2012},
organization = {Runic Games},
publisher = {Runic Games},
author = {{Runic Games}},
Howpublished = {Game [PC]},
Note = {Runic Games, Seattle, Washington, US},
}

@misc{gamesseries:tes,
title = {\emph{The Elder Scrolls I-V} and \emph{The Elder Scrolls Online}},
date = {1994/2014},
year = {1994--2014},
organization = {Bethesda Softworks},
publisher = {Bethesda Softworks},
author = {{Bethesda Softworks}},
Howpublished = {Game series [PC]},
Note = {Bethesda Softworks, Rockville, Maryland, US},
}
