
philshem/zuerich_speaks


Text mining 100+ years of Kanton Zürich's referenda and initiatives

TWIST2018 project

team

*Peter has some nice papers with previous research

main data sources:

  • https://opendata.swiss/de/dataset/abstimmungsarchiv-des-kantons-zurich

  • The Kanton-level CSV contains URLs to machine-readable PDFs with voting information

  • The Gemeinde-level CSV contains per-Gemeinde historical voting records

  • CSVs are joined by unique vote ID (STAT_VORLAGE_ID)

  • PDFs are converted to TXT with pdftotext and can be joined to the CSV files by the field ABSTIMMUNGSTAG
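
The join on STAT_VORLAGE_ID described above can be sketched as follows. This is a minimal illustration, not the project's code: it uses Python 3 and only the standard library, and the sample rows and every column name other than STAT_VORLAGE_ID and ABSTIMMUNGSTAG are made-up placeholders.

```python
import csv
import io

# Tiny in-memory stand-ins for the two CSV files. Every column except
# STAT_VORLAGE_ID and ABSTIMMUNGSTAG is a hypothetical placeholder, and
# the values are invented for illustration only.
KANTON_CSV = """STAT_VORLAGE_ID,ABSTIMMUNGSTAG,TITLE
1,2000-01-01,Example measure A
2,2000-01-01,Example measure B
"""
GEMEINDE_CSV = """STAT_VORLAGE_ID,GEMEINDE,YES_SHARE
1,Gemeinde X,55.0
1,Gemeinde Y,45.0
2,Gemeinde X,60.0
"""


def join_by_vote_id(kanton_file, gemeinde_file):
    """Attach the Kanton-level metadata to every Gemeinde-level row."""
    # Index the Kanton-level rows by their unique vote ID.
    kanton = {row["STAT_VORLAGE_ID"]: row for row in csv.DictReader(kanton_file)}
    joined = []
    for row in csv.DictReader(gemeinde_file):
        meta = kanton.get(row["STAT_VORLAGE_ID"])
        if meta is None:
            continue  # no matching Kanton-level record, skip this row
        merged = dict(meta)
        merged.update(row)
        joined.append(merged)
    return joined


rows = join_by_vote_id(io.StringIO(KANTON_CSV), io.StringIO(GEMEINDE_CSV))
```

In this sample, two measures share one ABSTIMMUNGSTAG. A join on the date alone therefore cannot tell them apart, which is why a text file for a voting day may need to be split per ballot measure.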

using the code and data

(mostly Python 2.7 or bash)

  • get_pdfs.py scrapes the URLs from the Kantonal CSV file and saves the PDFs locally. (Actually we got the PDFs from the organizers on a USB stick, because the scraper was getting IP blocked.) Note that the Bundesamt.pdf files are not linked by URL in the CSV files.

  • convert_pdf_to_txt.sh loops over the PDFs and converts them to TXT with pdftotext.

  • read_txt.py reads the individual TXT files, cleans up the text a bit, and writes a CSV file with some keys for joining later: full_text.csv (zipped).

  • vote_mapping.py (experimental) reads the combined text from full_text.csv and the metadata from the Kantonal CSV file. It attempts to split each TXT file into multiple elements, one per ballot measure, using some file-specific keywords. The code then maps each element to a ballot measure based on its rank in this split array. Output file is full_text_mapped.csv.

  • sentiment.py reads full_text_mapped.csv and calculates the polarity (-1 to 1) and the subjectivity (0 to 1) with textblob_de, as well as the readability. Output file is full_text_mapped_sentiment.csv, with the three scores added as the last 3 columns.
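
The split-and-map-by-rank idea behind vote_mapping.py can be sketched like this. It is an assumption-laden illustration (Python 3, standard library only), not the actual script: the function names, the "Vorlage N:" keyword pattern, the sample text, and the vote IDs are all hypothetical.

```python
import re


def split_into_measures(text, keyword_pattern):
    """Split text at each keyword match, keeping the keyword with its chunk.

    Anything before the first match (e.g. an introduction) is dropped.
    """
    starts = [m.start() for m in re.finditer(keyword_pattern, text)]
    if not starts:
        return [text.strip()]  # no keyword found: treat as one measure
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def map_by_rank(chunks, vote_ids):
    """Pair the i-th chunk with the i-th vote ID for the same voting day.

    Returns None when the counts differ, since a rank-based mapping
    would then be unreliable.
    """
    if len(chunks) != len(vote_ids):
        return None
    return dict(zip(vote_ids, chunks))


# Hypothetical sample: one voting day with two ballot measures.
sample = "Intro text. Vorlage 1: Text about A. Vorlage 2: Text about B."
chunks = split_into_measures(sample, r"Vorlage \d+:")
mapping = map_by_rank(chunks, ["101", "102"])
```

Refusing to map when the number of chunks differs from the number of vote IDs is a design choice of this sketch. It trades coverage for fewer silent mismatches, which seems reasonable for a step the list above marks as experimental.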

voting
