debate-parser

Contains Python code to do the following:

parse debate transcripts,
create structured datasets,
generate word clouds from semi-structured debate transcripts,
score the sentiment of the debate text, and
produce a divergent chart summarizing the sentiments.

The accompanying articles, including the full output, are published on the following blog: http://ml4ma.blogspot.com/

The raw data used for each of the debates are from the following sources:

https://www.washingtonpost.com/news/the-fix/
http://www.cbsnews.com/news/transcript-sixth-republican-top-tier-debate-2016/

The individual links to each original data source and to the created datasets are listed in the blog posts.

Run the following command to get the list of speakers from the raw debate transcript:

python get_speakers.py <input raw data> <output csv file>
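
As a rough illustration of what speaker extraction involves, here is a minimal sketch. It is not the code in get_speakers.py: the regular expression, the function names, and the assumption that each turn starts with an upper-case name followed by a colon are all hypothetical and chosen only for the example.

```python
import csv
import io
import re

# Assumed transcript convention (not confirmed by the repository): a turn
# starts with an upper-case name followed by a colon, e.g. "CLINTON: Thank you."
SPEAKER_RE = re.compile(r"^([A-Z][A-Z .'\-]+):")


def get_speakers(lines):
    """Return the distinct speaker names in order of first appearance."""
    seen = []
    for line in lines:
        match = SPEAKER_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def write_speakers_csv(speakers, handle):
    """Write one speaker per row, with a header, to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["speaker"])
    for name in speakers:
        writer.writerow([name])


# Made-up sample lines standing in for a raw transcript file.
sample = [
    "COOPER: Welcome to the debate.",
    "CLINTON: Thank you, Anderson.",
    "(APPLAUSE)",
    "SANDERS: Let me be clear.",
    "CLINTON: I agree.",
]
speakers = get_speakers(sample)

buffer = io.StringIO()
write_speakers_csv(speakers, buffer)
```

Stage directions such as "(APPLAUSE)" are skipped because they do not match the assumed name-plus-colon pattern; transcripts from other sources may need a different pattern.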

To generate the word clouds, run the following command:

python main.py data/dem_debate1 dem_debate.csv

The word clouds are written to the images directory.
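
A word cloud is driven by word frequencies, so the core step can be sketched with the standard library alone. This is an illustrative sketch, not the logic of main.py: the stop-word list, the tokenizing pattern, and the function name are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "we", "is", "in", "i"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words in text, ignoring the given stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


# Made-up utterance standing in for one speaker's combined text.
freqs = word_frequencies("We need jobs, and we need the economy to create jobs.")
```

A frequency mapping like this could then be handed to a rendering library; for example, the wordcloud package accepts one through WordCloud.generate_from_frequencies, although whether this repository uses that package is not stated here.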

To generate sentiment scores with the PatternAnalyzer and Naive Bayes analyzers, run the following command:

python combineDebateTranscripts.py
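
For orientation, the sketch below shows how a single utterance might be scored with both analyzers through TextBlob. It assumes TextBlob is the library behind the two analyzers (the repository contains TextBlobSentiment.py, but the exact calls used there are not shown in this README). The label_polarity helper and its 0.1 threshold are hypothetical.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity in [-1, 1] to a coarse label.

    The threshold is an assumption made for this example.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def score_text(text):
    """Score one utterance with both TextBlob analyzers."""
    # Imported lazily so label_polarity works without TextBlob installed.
    from textblob import TextBlob
    from textblob.sentiments import NaiveBayesAnalyzer

    # PatternAnalyzer is TextBlob's default analyzer.
    pattern = TextBlob(text).sentiment
    # NaiveBayesAnalyzer needs the NLTK movie_reviews corpus
    # (python -m textblob.download_corpora).
    bayes = TextBlob(text, analyzer=NaiveBayesAnalyzer()).sentiment
    return {
        "pa_polarity": pattern.polarity,
        "pa_label": label_polarity(pattern.polarity),
        "nb_label": bayes.classification,  # "pos" or "neg"
        "nb_p_pos": bayes.p_pos,
    }
```

Calling score_text on each row of a structured transcript would give one record per utterance. Constructing a new NaiveBayesAnalyzer on every call retrains it each time, so a real script would likely create the analyzer once and reuse it.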

Acknowledgements go to nmoya, who wrote a WhatsApp parser (https://github.com/nmoya/whatsapp-parser) that helped me quickly put together a modified parser for other kinds of semi-structured data.

The divergent chart is inspired by the following chart about fact checking: http://www.datarevelations.com/all-politicians-lie-some-more-than-others
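
The data preparation behind such a chart can be sketched as follows. This is an assumed layout, not the author's charting code: each speaker's utterance labels are turned into percentages, and the bar is shifted left so that the neutral segment is centred on zero, with negative to its left and positive to its right. The function and field names are hypothetical.

```python
from collections import Counter


def diverging_rows(labels_by_speaker):
    """Return per-speaker percentages and the left edge of each stacked bar."""
    rows = {}
    for speaker, labels in labels_by_speaker.items():
        if not labels:
            continue  # skip speakers with no scored utterances
        counts = Counter(labels)
        total = len(labels)
        negative = 100.0 * counts["negative"] / total
        neutral = 100.0 * counts["neutral"] / total
        positive = 100.0 * counts["positive"] / total
        rows[speaker] = {
            "negative": negative,
            "neutral": neutral,
            "positive": positive,
            # Start left of zero so the neutral segment straddles the axis.
            "left": -(negative + neutral / 2.0),
        }
    return rows


# Made-up labels for two placeholder speakers.
rows = diverging_rows({
    "SPEAKER_A": ["positive", "positive", "negative", "neutral"],
    "SPEAKER_B": ["negative", "negative"],
})
```

Each row can be drawn as three stacked horizontal segments starting at "left", which makes the balance between negative and positive statements comparable across speakers at a glance.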
