Word2Vec2NLP

Using Word2Vec for sentiment analysis

Requirements

pip install -r requirements.txt

If installation fails, or errors occur when running the application, follow this guide to install virtualenv and install the requirements inside a virtual environment (recommended).

Directory structure and data preparation
- Create a directory called save in the repository root.
- If you want to run similarity.py, download the Google News word vectors from this link and put them in the dataset directory.
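The directory preparation above can also be done from Python. This is a minimal sketch: `prepare_directories` is a hypothetical helper, not part of the repository, and it only creates the folders (the Google News download still has to be done by hand).

```python
from pathlib import Path


def prepare_directories(root="."):
    """Create the save and dataset directories under root if they are missing."""
    created = []
    for name in ("save", "dataset"):
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat on an existing checkout.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for directory in prepare_directories():
        print("ready:", directory)
```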
Run any of the following scripts, depending on what you want to do.

wordvector.py: uses Word2Vec to train and compute word vectors for our corpus
- The corpus file is in dataset/all
- The model is saved in save/model.bin
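For orientation, the training step might look roughly like the sketch below. It is not the code in wordvector.py. It assumes gensim 4.x is installed, that dataset/all holds one document per line, and that the model is written in the binary word2vec format; `read_corpus`, `train_word_vectors`, and the hyperparameter values are illustrative choices.

```python
def read_corpus(path):
    """Yield one lower-cased, whitespace-tokenized document per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tokens = line.lower().split()
            if tokens:
                yield tokens


def train_word_vectors(corpus_path="dataset/all", model_path="save/model.bin"):
    """Train Word2Vec on the corpus and save the vectors (assumes gensim >= 4.0)."""
    # Imported here so that read_corpus stays usable without gensim installed.
    from gensim.models import Word2Vec

    sentences = list(read_corpus(corpus_path))
    # Hyperparameters are illustrative assumptions, not the repository's settings.
    model = Word2Vec(sentences=sentences, vector_size=100, window=5,
                     min_count=1, workers=4)
    model.wv.save_word2vec_format(model_path, binary=True)
    return model
```

Note that gensim releases before 4.0 name the `vector_size` parameter `size`, so the call needs adjusting if requirements.txt pins an older version.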
docvector.py: extracts feature vectors using four algorithms and saves them to the save directory
- The four algorithms are:
  - Word Averaging
  - Word Averaging + SentiWordNet
  - NLPF (Feature sets extracted in assignment 5.4)
  - NLPF + SentiWordNet
- For each algorithm, six feature vectors are produced: we perform 3-fold cross-validation, and each fold has its own training feature set and test set.
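The Word Averaging idea and the three-fold split can be sketched in plain Python as follows. This is an illustration under assumptions, not the logic in docvector.py: word vectors are represented as a plain dict of lists, unknown words are skipped, and the folds are formed by striding through the documents.

```python
def average_vector(tokens, word_vectors, dim):
    """Return the mean of the vectors of known tokens, or a zero vector if none are known."""
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every vector together.
    return [sum(component) / len(known) for component in zip(*known)]


def three_fold_splits(items, k=3):
    """Yield (train, test) pairs; fold i holds out every k-th item starting at index i."""
    for i in range(k):
        test = items[i::k]
        train = [item for j, item in enumerate(items) if j % k != i]
        yield train, test
```

With three folds and one training set plus one test set per fold, each algorithm yields the six feature vector files mentioned above.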
classification.py: applies three machine learning classifiers to the feature vectors produced by docvector.py and prints performance metrics for each combination of classifier and feature vector
- Random Forest
- SVM
- Naive Bayes
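A loop over every classifier and feature set combination could be sketched as below. This assumes scikit-learn is installed. The specific estimators and settings (a linear-kernel `SVC`, `GaussianNB`, 100 trees) are guesses rather than what classification.py uses, and accuracy stands in for whichever metrics the script reports. `make_classifiers` and `evaluate` are hypothetical names.

```python
from itertools import product


def make_classifiers():
    """Build the three classifier families listed above (assumes scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "SVM": SVC(kernel="linear"),
        # GaussianNB accepts negative feature values, which averaged word vectors contain.
        "Naive Bayes": GaussianNB(),
    }


def evaluate(classifiers, feature_sets):
    """Fit each classifier on each feature set and return {(classifier, features): accuracy}.

    feature_sets maps a name to a tuple (X_train, y_train, X_test, y_test).
    """
    results = {}
    for (clf_name, clf), (feat_name, data) in product(classifiers.items(),
                                                      feature_sets.items()):
        x_train, y_train, x_test, y_test = data
        clf.fit(x_train, y_train)  # refitting discards any earlier training
        predictions = clf.predict(x_test)
        correct = sum(int(p == t) for p, t in zip(predictions, y_test))
        results[(clf_name, feat_name)] = correct / len(y_test)
    return results
```

Calling `evaluate(make_classifiers(), feature_sets)` returns one accuracy per combination, which can then be printed or tabulated.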
similarity.py: uses the word vectors computed by wordvector.py, as well as pretrained vectors trained on the Google News dataset, to find the 10 most similar words for each of the following keywords:
- apple
- iphone
- ipad
- ipod
- mac
- macbook
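"Most similar" here usually means highest cosine similarity between word vectors. The sketch below shows that ranking on a plain dict of vectors. It is an illustration, not the code in similarity.py; with gensim, the `most_similar(word, topn=10)` method of `KeyedVectors` would normally do this work instead.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, word_vectors, topn=10):
    """Return up to topn (word, similarity) pairs, most similar first, excluding the query."""
    target = word_vectors[word]
    scored = [(other, cosine(target, vector))
              for other, vector in word_vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

Running the same query against both the locally trained vectors and the Google News vectors makes it easy to compare the neighbours each model finds for a keyword such as apple.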
