TwitterTrendDectection/TwitterTrendDetection

TwitterTrendDetection

Twitter Trend Detection

Project Structure

./codes/

./codes/pipeline.py

The pipeline Python script that runs the project

./codes/modules/

Modules responsible for the different stages of personalized trend generation

  • config.py (all parameters, such as the input and output file names, are set here)
  • data_frame_preprocess.py (converts the JSON file to a data frame and filters out non-English tweets)
  • preprocess_nlp.py (all NLP methods used to preprocess tweets)
  • time_explore.py (detects the time span covered by the tweet data, in hours)
  • background_model.py (generates the statistical model for the training data and testing data)
  • hot_words_generator.py (training and testing steps that generate hot words from the training and testing data)
  • hotwords_statistic.py (generates pairs of hot words and their corresponding tweets)
  • group_burst.py (generates trends from hot words and their corresponding tweet IDs)
  • personalize.py (LDA algorithm that extracts topics from the user profiles and the test data)
  • recommend_tweets.py (recommends tweets to a specific user based on the similarity between their LDA results)
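
As a rough illustration of the idea behind the background model and hot-word steps, the sketch below flags words whose smoothed frequency in the test tweets is much higher than in the training tweets. This is a minimal sketch and not the code in hot_words_generator.py: the function names, the whitespace tokenizer, the add-one smoothing, and the `min_ratio` and `min_count` thresholds are all assumptions made for the example.

```python
from collections import Counter


def word_counts(tweets):
    """Count lower-cased whitespace tokens across a list of tweet strings."""
    counts = Counter()
    for text in tweets:
        counts.update(text.lower().split())
    return counts


def hot_words(train_tweets, test_tweets, min_ratio=3.0, min_count=2):
    """Return (word, ratio) pairs for words that burst in the test tweets.

    Hypothetical scoring: ratio of add-one-smoothed relative frequencies
    (test vs. training). The real project may score words differently.
    """
    background = word_counts(train_tweets)
    current = word_counts(test_tweets)
    bg_total = sum(background.values())
    cur_total = sum(current.values())
    vocab_size = len(set(background) | set(current))

    scored = []
    for word, count in current.items():
        if count < min_count:
            continue  # ignore words that are too rare to call a burst
        bg_rate = (background[word] + 1) / (bg_total + vocab_size)
        cur_rate = (count + 1) / (cur_total + vocab_size)
        ratio = cur_rate / bg_rate
        if ratio >= min_ratio:
            scored.append((word, ratio))
    # Strongest bursts first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


train = ["good morning everyone", "good coffee this morning", "nice morning"]
test = ["earthquake downtown", "earthquake felt here this morning", "huge earthquake"]
result = hot_words(train, test)
print([word for word, _ in result])
```

Words that are common in both periods (such as "morning" here) get a ratio near or below 1 and are dropped, while a word that appears only in the later period stands out.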

./codes/generateCSV

Java code used to flatten the crawled data from the web

./file/

Training data (CSV), testing data (CSV), and all generated files. The user profiles we used are @JayZClassicBars, @KeyAndPeele, @realDonaldTrump, and @taylorswift13.

The training and testing data we used are mainly extracted from 2011. When splitting the data, the training data must come before the testing data in time, because trends and events occur in order.
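
The ordering requirement can be sketched as a split on a cutoff timestamp rather than a random split. This is only an illustration: the `chronological_split` helper, the dictionary layout, and the `created_at` key holding a `datetime` are assumptions, not the project's actual data format.

```python
from datetime import datetime


def chronological_split(tweets, cutoff):
    """Split tweets so that every training tweet precedes every testing tweet.

    `tweets` is a list of dicts with a hypothetical "created_at" datetime key.
    """
    ordered = sorted(tweets, key=lambda t: t["created_at"])
    train = [t for t in ordered if t["created_at"] < cutoff]
    test = [t for t in ordered if t["created_at"] >= cutoff]
    return train, test


tweets = [
    {"id": 3, "created_at": datetime(2011, 9, 5, 8, 30)},
    {"id": 1, "created_at": datetime(2011, 3, 1, 12, 0)},
    {"id": 2, "created_at": datetime(2011, 5, 20, 17, 45)},
    {"id": 4, "created_at": datetime(2011, 11, 2, 9, 15)},
]
train, test = chronological_split(tweets, cutoff=datetime(2011, 6, 1))
print([t["id"] for t in train], [t["id"] for t in test])
```

A random split would let tweets from later events leak into the training set, so the background statistics would already contain the words that are supposed to look new in the test period.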

./file/tweets

The original tweets, crawled using the Twitter API

To run the whole program, specify the file names in config.py, and specify the training data and test data file names in pipeline.py.

The whole program runs for several minutes. You can also choose which parts to run in pipeline.py.
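
One way to make individual parts optional is a table of stages with on/off switches, sketched below. This is a hypothetical layout, not the actual contents of pipeline.py: the stage names, the `ENABLED` dictionary, and `run_pipeline` are invented for the example, and the real script may simply comment function calls in or out.

```python
def run_pipeline(stages, enabled):
    """Run each (name, function) stage in order, skipping disabled ones.

    Stages missing from `enabled` run by default. Returns the names that ran.
    """
    ran = []
    for name, func in stages:
        if enabled.get(name, True):
            func()
            ran.append(name)
    return ran


# Placeholder stage functions; real stages would call into ./codes/modules/.
log = []
STAGES = [
    ("preprocess", lambda: log.append("preprocess done")),
    ("hot_words", lambda: log.append("hot words done")),
    ("personalize", lambda: log.append("personalize done")),
]

# Hypothetical switches: skip the personalization stage on this run.
ENABLED = {"preprocess": True, "hot_words": True, "personalize": False}

ran = run_pipeline(STAGES, ENABLED)
print(ran)
```

Skipping a stage only makes sense when the files it would have produced already exist from an earlier run, since later stages read those generated files.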

Contributors 4