
thinkmpink/disaster-tracker

 
 


disaster-tracker

Study print media rhetoric relating to the Syrian refugee crisis. Run one of the _ex.py files to get started.

  • Corpus Parser: Saves JSON articles in a specified directory as Strings.

  • Unigram Stats: Counts the number of appearances of each word in a corpus.

    • unigram_stats_ex.py path/to/archive/ path/to/stop-word-list
  • N-gram Stats: Counts the number of appearances of any phrase in a corpus.

    • ngram_stats_ex.py path/to/archive/ path/to/stop-word-list (n-gram length)
  • Proximate Unigrams: Lists the words near a given word in a corpus.

    • proximate_words_ex.py path/to/archive/ path/to/stop-word-list (offset distance) (word to look for)
  • Proximate N-grams: Lists the phrases near a given word in a corpus in order of frequency.

    • proximate_ngrams_ex.py path/to/archive/ path/to/stop-word-list (n-gram length) (word to look for) (offset)
  • Naive Sentiment:

    • naive_sentiment_ex.py path/to/archive/ path/to/stop-word-list (n-gram length) (word to look for) (offset)
  • Aggregate Attributes: Lists the number of words with a given attribute in the corpus in order of frequency.

    • attribute_agg_ex.py path/to/archive/ path/to/stop-word-list
  • Attribute Dictionary: For each attribute, lists the number of words in a corpus with that attribute.

    • attribute_agg_ex.py path/to/archive/ path/to/stop-word-list attribute
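The counting and proximity tools listed above can be illustrated with a minimal sketch. It is not the repository's code: the function names `tokenize`, `ngram_counts`, and `proximate_words` are hypothetical, and the choice to remove stop words before building n-grams is an assumption.

```python
from collections import Counter
import re


def tokenize(text, stop_words):
    """Lowercase the text, keep runs of letters/apostrophes, drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stop_words]


def ngram_counts(tokens, n=1):
    """Count every run of n consecutive tokens (n=1 gives unigram counts)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def proximate_words(tokens, target, offset):
    """Count the words within `offset` positions of each occurrence of target."""
    nearby = Counter()
    for i, word in enumerate(tokens):
        if word == target:
            lo = max(0, i - offset)
            # Words before and after the target, excluding the target itself.
            window = tokens[lo:i] + tokens[i + 1:i + 1 + offset]
            nearby.update(window)
    return nearby


# Hypothetical sample data standing in for a parsed corpus and stop-word list.
stop_words = {"the", "a", "of", "to", "in"}
text = "The refugees fled to the border. Refugees waited in the border camps."

tokens = tokenize(text, stop_words)
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
near = proximate_words(tokens, "refugees", 1)

print(unigrams.most_common(2))
print(near)
```

Because stop words are removed first in this sketch, an n-gram can join words that were separated by a stop word in the original text; filtering after n-gram construction would give different counts.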

A few notes:

  • The files ./lexicons/positive-words.txt and ./lexicons/negative-words.txt must exist to run Naive Sentiment.
  • A stop word list must be present. Many such lists are available online.
  • The Harvard Inquirer Excel file must be saved as a .csv file at ./lexicons/inquirerbasic.csv.
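The README does not say how Naive Sentiment uses the two lexicon files, so the following is only a sketch of one common approach: count how many tokens appear in each word list. It assumes one word per line and treats lines starting with `;` as comments; the names `load_lexicon` and `naive_sentiment` are hypothetical.

```python
def load_lexicon(lines):
    """Build a set of words from lexicon lines, skipping blanks and ';' comments."""
    words = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith(";"):
            words.add(entry)
    return words


def naive_sentiment(tokens, positive, negative):
    """Return (positive hits, negative hits, net score) for a list of tokens."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos, neg, pos - neg


# With real files, pass an open file handle instead of a list, e.g.
#   with open("./lexicons/positive-words.txt") as f:
#       positive = load_lexicon(f)
positive = load_lexicon(["; sample header", "", "safe", "welcome"])
negative = load_lexicon(["crisis", "threat"])

tokens = ["refugees", "welcome", "despite", "crisis", "threat"]
result = naive_sentiment(tokens, positive, negative)
print(result)
```

Passing lines rather than a path keeps the loader usable with both open files and in-memory lists.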

About

track and analyze natural and political crisis on social media and news
