This repository contains the parsing code described in Section 3.1 of Three Cheers For Partisanship [1]. It is used to parse the raw transcripts of presidential primary debates available from the American Presidency Project [2].
To get the code, simply clone the repo into a local directory:
git clone https://github.com/ethanroday/three-cheers/
To fetch and parse the debate transcripts, run the following from the src/ directory:
python TranscriptFetcher.py
python TranscriptParser.py
The parsed transcripts will be written to data/debates/parsedTranscripts.
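The two steps above can also be driven from a single Python script. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the scripts take no command-line arguments and must run with src/ as the working directory, as the instructions above suggest.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the instructions above. The fetcher is listed
# first on the assumption that the parser consumes what the fetcher downloads.
PIPELINE_SCRIPTS = ["TranscriptFetcher.py", "TranscriptParser.py"]


def build_commands(python_exe=sys.executable):
    """Return the commands to run, in order, as argument lists."""
    return [[python_exe, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(src_dir="src"):
    """Run each script with src_dir as the working directory."""
    src_path = Path(src_dir)
    for command in build_commands():
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the parser never runs after a failed fetch.
        subprocess.run(command, cwd=src_path, check=True)
```

Calling `run_pipeline("src")` from the repository root should then be equivalent to running the two commands by hand.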
This code depends on the following libraries, all of which can be installed using pip:
requests (pip install requests)
BeautifulSoup (pip install beautifulsoup4)
jsonschema (pip install jsonschema)
nltk (pip install nltk)
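To confirm that the dependencies are available before running the scripts, a short check along these lines may help. It is a sketch rather than part of the repository: `REQUIRED` and `missing_packages` are hypothetical names, and the only mapping that differs from the pip name is beautifulsoup4, which is imported as `bs4`.

```python
import importlib.util

# Maps each pip package name to the module name it is imported as.
# beautifulsoup4 is imported as bs4; the others share their package name.
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "jsonschema": "jsonschema",
    "nltk": "nltk",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None
    ]
```

An empty list from `missing_packages()` means all four libraries can be imported; otherwise each returned name can be passed to pip install.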
[1] Roday, Ethan. Three Cheers For Partisanship: Lexical Framing and Applause in U.S. Presidential Primary Debates. Master's thesis, University of Washington, 2017.
[2] Peters, Gerhard and Woolley, John T. The American Presidency Project, 1999-2017. http://www.presidency.ucsb.edu/debates.php.