sinjax/trendminer-python


To run on a file:

    cat input | python twokenize.py | python langid.py | python stemming.py > output

and you will get the same tweets with some extra fields in the JSON:

tokens - list of tokens
tok_lang - string with proper words separated by whitespace
lang_det - the detected language of the tweet
stemming - list of stems
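The scripts themselves are not shown here, so the following is only a sketch of how one stage of such a pipe might look. It assumes the input is one JSON tweet per line with a "text" field; the names run_stage and add_tokens are hypothetical, and the whitespace split is a stand-in for whatever tokenizer twokenize.py really uses.

```python
import json
import sys


def add_tokens(tweet):
    """Add a 'tokens' field to the tweet (hypothetical stand-in tokenizer)."""
    # Assumption: the tweet text lives under "text". A plain whitespace
    # split is used here only for illustration.
    tweet["tokens"] = tweet.get("text", "").split()
    return tweet


def run_stage(transform, infile, outfile):
    """Read one JSON tweet per line, apply transform, write one per line."""
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip malformed lines instead of aborting the pipe
        if not isinstance(tweet, dict):
            continue  # skip valid JSON that is not an object
        outfile.write(json.dumps(transform(tweet)) + "\n")


# In a script, finish with:
#     run_stage(add_tokens, sys.stdin, sys.stdout)
```

Because each stage only adds fields and passes the rest of the tweet through unchanged, stages written this way can be chained with shell pipes in any order that respects their dependencies.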

Works at a rate of about 1 million tweets/hour (roughly 280 tweets per second), although it is likely to be faster in practice.

About

python trendminer code
