Statistics

This repo contains statistics computed from Twitter data.

The statistics will be used for visualizations, further analysis, and machine learning models.

dataExtraction.py: run this on your JSON files (the responses you got from the API calls). It extracts the relevant info from the JSON tweets.

    python dataExtraction.py [JSON_FILE] > [my_file]

The output for each tweet is a JSON-encoded list of this form:

    [user_id, text, fav_count, retweets, index, date, hashtags]
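
As an illustration only, here is a minimal sketch of how one such record might be built from a tweet object. This is not the code in dataExtraction.py. The field names (`user.id`, `text`, `favorite_count`, `retweet_count`, `created_at`, `entities.hashtags`) are assumed from the Twitter REST API v1.1 tweet format, and `index` is assumed to be the tweet's position in the input; the real script may define it differently.

```python
import json

def extract_record(tweet, index):
    """Build [user_id, text, fav_count, retweets, index, date, hashtags].

    Field names are assumptions based on the Twitter API v1.1 tweet object.
    """
    hashtags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
    return [
        tweet["user"]["id"],
        tweet["text"],
        tweet.get("favorite_count", 0),
        tweet.get("retweet_count", 0),
        index,  # assumption: position of the tweet in the input
        tweet.get("created_at"),
        hashtags,
    ]

# Hypothetical sample tweet, trimmed to the fields used above.
sample = {
    "user": {"id": 42},
    "text": "hello #world",
    "favorite_count": 3,
    "retweet_count": 1,
    "created_at": "Mon Jan 05 10:00:00 +0000 2015",
    "entities": {"hashtags": [{"text": "world"}]},
}
print(json.dumps(extract_record(sample, 0)))
```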

segregatingByUsers.py and segregatingByDate.py run on [my_file] and group the tweets by user and by date, respectively.
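
The grouping step could look roughly like the sketch below. It assumes [my_file] holds one JSON-encoded record per line and that the date field is already a plain day string; both are assumptions, and `group_records` is a hypothetical helper, not a function from this repo. A full timestamp would have to be reduced to a day before grouping by date.

```python
import json
from collections import defaultdict

def group_records(lines, key_index):
    """Group JSON-encoded records by the value at position key_index."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        groups[record[key_index]].append(record)
    return dict(groups)

# Hypothetical records in the
# [user_id, text, fav_count, retweets, index, date, hashtags] layout.
lines = [
    json.dumps([1, "first", 0, 0, 0, "2015-01-05", []]),
    json.dumps([2, "second", 1, 0, 1, "2015-01-05", ["a"]]),
    json.dumps([1, "third", 2, 1, 2, "2015-01-06", []]),
]
by_user = group_records(lines, 0)  # position 0 is user_id
by_date = group_records(lines, 5)  # position 5 is date
print(len(by_user[1]), len(by_date["2015-01-05"]))
```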

top_hashtags.py runs on [JSON_FILE] and computes the top hashtags in the file.
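
A minimal sketch of hashtag counting, assuming each tweet is a dict with hashtags under `entities.hashtags` (as in the Twitter API v1.1 format). Lowercasing the tags is a choice made for this sketch; top_hashtags.py may count them differently.

```python
from collections import Counter

def top_hashtags(tweets, n=10):
    """Return the n most frequent hashtags as (tag, count) pairs."""
    counts = Counter()
    for tweet in tweets:
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1  # assumption: case-insensitive
    return counts.most_common(n)

# Hypothetical tweets, trimmed to the hashtag entities.
tweets = [
    {"entities": {"hashtags": [{"text": "Python"}, {"text": "data"}]}},
    {"entities": {"hashtags": [{"text": "python"}]}},
    {"entities": {"hashtags": []}},
]
print(top_hashtags(tweets, 2))
```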

pop_words.py runs on [my_file] and computes the n most frequent words in the file.
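
Word counting can be sketched along the same lines. The tokenization here (lowercase runs of letters and apostrophes) is an assumption made for this example, and no stop words are filtered; pop_words.py may handle both differently.

```python
import re
from collections import Counter

def top_words(texts, n=10):
    """Return the n most frequent words across texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # assumption: a word is a run of letters or apostrophes
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

# Hypothetical tweet texts.
texts = ["The cat and the dog", "the end"]
print(top_words(texts, 1))
```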
