A novel segment-based event detection system for tweets.
- install MongoDB on the GPU server
- load the tweet data into MongoDB
- get the alpha result using only the Microsoft N-gram service
- get a Microsoft N-gram user token
- add the length normalization factor
- use multiple threads for the computation and the HTTP requests
- the tokenizer only uses str.split, which is not flexible; find a good tokenizer for tweets
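
For the tokenizer item, a minimal regex-based sketch is shown below. It is not the project's tokenizer: the name `tokenize`, the `lowercase` flag, and the token classes (URLs, mentions, hashtags, words, single symbols) are assumptions chosen for illustration. Unlike `str.split`, it keeps URLs, @mentions, and #hashtags whole and separates punctuation from words.

```python
import re

# Alternatives are tried left to right, so URLs, mentions and hashtags
# are matched before the generic word pattern can break them apart.
TOKEN_RE = re.compile(
    r"""
    https?://\S+              # URLs (kept whole, up to the next whitespace)
    | @\w+                    # user mentions
    | \#\w+                   # hashtags
    | \w+(?:['’\-]\w+)*       # words, allowing internal apostrophes or hyphens
    | [^\w\s]                 # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text, lowercase=False):
    """Split a tweet into tokens (hypothetical helper, not part of the project)."""
    tokens = TOKEN_RE.findall(text)
    if lowercase:
        # URLs are left untouched because their paths can be case-sensitive.
        tokens = [t if t.startswith("http") else t.lower() for t in tokens]
    return tokens


if __name__ == "__main__":
    print(tokenize("RT @user: check http://t.co/abc #NLP it's great!"))
```

One known limitation of this sketch: punctuation directly after a URL (for example a trailing period) is absorbed into the URL token. NLTK's `TweetTokenizer` is an off-the-shelf alternative worth comparing against.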