GitHub - jonathandunn/political_classification: Code from "Profile-based authorship analysis"

This repository contains feature extraction code for the paper "Profile-based Authorship Analysis."

The dataset is provided here: https://s3.amazonaws.com/jonathandunn/Legislative_Texts.zip

The vectorizers in the 'data' folder were trained on speeches from the US House, US Senate, Canadian House, and European Parliament, along with miscellaneous political speeches (all included in the dataset).

The code produces X, y feature vectors with or without part-of-speech tags. The "ITFIDF" file produces TF-IDF transforms, while the "RAW" file produces frequency counts.
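To make the difference between the two outputs concrete, here is a minimal standard-library sketch of raw frequency counts versus a textbook TF-IDF weighting (count times log(N / document frequency)). It is not the repository's code: the function names `raw_counts` and `tfidf`, the whitespace tokenization, and the toy speeches are all assumptions made for illustration, and the repository's vectorizers may use a different weighting variant.

```python
import math
from collections import Counter


def raw_counts(docs):
    """Return one Counter of token frequencies per document."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf(docs):
    """Weight each raw count by log(N / document frequency)."""
    counts = raw_counts(docs)
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each token.
    doc_freq = Counter(token for c in counts for token in c)
    return [
        {token: freq * math.log(n_docs / doc_freq[token]) for token, freq in c.items()}
        for c in counts
    ]


# Hypothetical toy speeches, not taken from the dataset.
speeches = [
    "we must cut taxes",
    "we must fund schools",
]
raw = raw_counts(speeches)
weighted = tfidf(speeches)
```

Under this particular weighting, a token that occurs in every speech (here "we" and "must") gets weight zero, while the raw counts keep it. Library implementations often add smoothing or normalization, so their values will generally differ from this sketch.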

Repository contents:

!OLD
data
modules
README.md
create_df.py
fit_vectorizer.py
