fernandopso/twitter-svm-tfidf.py

Twitter data mining with Python


Using a Support Vector Machine (SVM) and Term Frequency–Inverse Document Frequency (TF-IDF) in three steps:

  1. Collect many tweets from Twitter
  2. Classify some tweets as positive, negative, or neutral
  3. Predict the sentiment of the remaining tweets
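The three steps above can be sketched with scikit-learn (an assumption about this repo's stack; the tiny training set and its labels here are purely illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Step 2: a few hand-labelled tweets (hypothetical data)
tweets = ["i love this phone", "what a great day", "awful customer service",
          "worst movie ever", "it is an ordinary tuesday", "nothing special here"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# TF-IDF turns each tweet into a weighted term vector;
# LinearSVC learns a decision boundary over those vectors
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(tweets, labels)

# Step 3: predict the sentiment of unseen tweets
pred = model.predict(["this service is awful", "i love it"])
```

In the real project the labelled set would come from step 2's manual classification, not a hard-coded list.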

System dependencies

sudo apt-get install build-essential python-dev python-setuptools \
                     python-numpy python-scipy libblas-dev gfortran \
                     libatlas-dev libatlas3gf-base liblapack-dev \
                     libatlas-base-dev

If you use Python 3

sudo apt-get install python3-minimal

Install Packages

Use pip with a virtualenv:

pip install -r requirements.txt

Configuration

The Natural Language Toolkit (NLTK) provides human language data (over 50 corpora and lexical resources) in many languages and formats, such as Twitter samples, the RSLP Stemmer (Removedor de Sufixos da Língua Portuguesa, a stemmer for Portuguese), the complete works of Machado de Assis for Brazilian Portuguese, and much more.

To download all corpora:

python -m nltk.downloader all

Or download the corpora of your choice from the Python interpreter:

>>> import nltk
>>> nltk.download()

A new window should open, showing the NLTK Downloader.

Credentials

Set your Twitter credentials from the Twitter Application Manager in the variables CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET.
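One common pattern for wiring these in (an assumption; the repo may set them differently) is to read the four values from environment variables so they stay out of version control. The helper below is hypothetical:

```python
import os

def twitter_credentials():
    """Collect the four Twitter credentials from the environment.

    The variable names match the ones the README mentions; failing early
    on missing values gives a clearer error than a rejected API call.
    """
    names = ("CONSUMER_KEY", "CONSUMER_SECRET",
             "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    creds = {name: os.environ.get(name) for name in names}
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise RuntimeError("missing Twitter credentials: " + ", ".join(missing))
    return creds
```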

Run tests

python -m unittest discover
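`discover` picks up files named `test_*.py`; a minimal test case of the shape it expects (the class and method names here are illustrative) looks like:

```python
import unittest

class TestSmoke(unittest.TestCase):
    # unittest discovery collects methods whose names start with "test"
    def test_truth(self):
        self.assertTrue(True)

# run the case directly, as `python -m unittest discover` would
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSmoke)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```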

Start

Run the Human-Machine Interface

python hmi.py

Example

Collect

(screenshot: collecting tweets)

Listing collected tweets

(screenshot: listing tweets)

Classification

(screenshot: training)

Prediction

(screenshot: prediction)

Roadmap
