
Regime classification using a topic modeling approach


ahalterman/regimeClassif

 
 


Mining Texts to Efficiently Generate Global Data on Political Regime Types

Authors

Shahryar Minhas, Jay Ulfelder, & Michael D. Ward

Abstract

We describe the design and results of an experiment in using text-mining and machine-learning techniques to generate annual measures of national political regime types. Valid and reliable measures of countries' forms of national government are essential to cross-national and dynamic analysis of many phenomena of great interest to political scientists, including civil war, interstate war, democratization, and coups d'état. Unfortunately, traditional measures of regime type are very expensive to produce, and observations for ambiguous cases are often sharply contested. In this project, we train a series of support vector machine (SVM) classifiers to infer regime type from textual data sources. To train the classifiers, we use vectorized textual reports from Freedom House and the State Department as features, paired with a training set of pre-labeled regime type data. To validate the SVM classifiers, we evaluate their predictions out of sample; performance is very high across several metrics (accuracy, precision, and recall). The results of this project highlight the ability of these techniques to help produce real-time data sources for use in political science that can also be routinely updated at much lower cost than human-coded data. To this end, we set up a text processing pipeline that pulls updated textual data from selected sources, conducts feature extraction, and applies supervised machine learning methods to produce measures of regime type.
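As a rough illustration of the out-of-sample validation metrics named in the abstract, the following minimal sketch computes accuracy, precision, and recall for a single binary label. It is not the authors' code: the function name `binary_metrics`, the 0/1 label encoding, and the toy label vectors are all assumptions made for this example, and the paper's actual label scheme may differ.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for 0/1 labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = len(y_true)
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the classifier recovered.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy held-out labels, invented for illustration only (not data from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when one regime type is rare, since a classifier that always predicts the majority class can still score high accuracy.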

Replication Instructions

All data needed to replicate the study are stored in a Dropbox folder. The output data used to generate the plots shown in the paper are stored in this Dataverse.
