- Clone the repository
git clone git@github.com:bgshin/cnntweets.git
- Create a virtual environment (mkvirtualenv is provided by virtualenvwrapper)
mkvirtualenv sent
- Dependencies
pip install -r requirements.txt
- Python 2.7
- requirements
- boto==2.40.0
- bz2file==0.98
- gensim==0.12.4
- numpy==1.11.0
- protobuf==3.0.0b2
- requests==2.10.0
- scipy==0.17.1
- six==1.10.0
- smart-open==1.3.3
- tensorflow==0.8.0
- WITHOUT pre-trained w2v
- Train
cd cnn
nohup python cnn_train.py > out.txt &
- Test
- Modify cnn/cnn_test.py
savepath = 'model_path/model-xxxx'
- Run test script
cd cnn
python cnn_test.py
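
The `savepath` placeholder above has to be replaced by hand with a real checkpoint prefix. As a rough sketch, and assuming the training script writes checkpoints named `model-<step>` into a single directory (an assumption based only on the `model-xxxx` pattern, not confirmed by the repository), a small helper could pick the newest one. `latest_checkpoint` and its arguments are hypothetical names.

```python
import os
import re


def latest_checkpoint(model_dir, names=None):
    """Return 'model_dir/model-<step>' for the largest step, or None.

    `names` defaults to the directory listing; passing a list makes the
    function usable without touching the file system.
    """
    if names is None:
        names = os.listdir(model_dir)
    # Assumption: checkpoint prefixes look like 'model-1234'. Companion
    # files such as 'model-1234.meta' are deliberately not matched.
    pattern = re.compile(r"^model-(\d+)$")
    steps = []
    for name in names:
        match = pattern.match(name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return os.path.join(model_dir, "model-%d" % max(steps))


# Usage with a made-up listing; 'model_path' is the placeholder from above.
listing = ["checkpoint", "model-100", "model-2000", "model-2000.meta"]
print(latest_checkpoint("model_path", listing))
```

The returned string could then be assigned to `savepath` in the test script instead of editing the step number manually.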
- WITH pre-trained w2v
- Download the compressed file and extract it to obtain the pre-trained w2v bin file
- Modify w2v_cnn/cnn_train.py
model_path = 'path_to_w2v_bin/word2vec_twitter_model.bin'
- Train
cd w2v_cnn
nohup python cnn_train.py > out.txt &
- Test
- Modify w2v_cnn/cnn_test.py
savepath = 'model_path/model-xxxx'
- Run test script
cd w2v_cnn
python cnn_test.py
- Dev (semeval16_T4A_devtest_npo)
- Number of examples: 1588
- Tst (semeval16_T4A_test_npo)
- Number of examples: 20632
- Trn (semeval13_T2B_16T4A_train_dev_npo)
- Number of examples: 15385
- Data files
- semeval13_T2B_16T4A_train_dev_devtest_npo = 16973 (1588 devtest + 15385 train_dev)
- semeval16_T4A_devtest_npo = 1588
- semeval13_T2B_16T4A_train_dev_npo = 15385
- semeval16_T4A_dev_npo = 1595
- semeval16_T4A_test_npo = 20632
- semeval16_T4A_train_npo = 4796
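
The counts above can be sanity-checked with a few lines of Python. This is only a sketch: it assumes each data file holds one example per line with no header row, which the README does not state, and `count_examples` and `matches_expected` are hypothetical helpers.

```python
# Example counts copied from the list above.
COUNTS = {
    "semeval13_T2B_16T4A_train_dev_devtest_npo": 16973,
    "semeval16_T4A_devtest_npo": 1588,
    "semeval13_T2B_16T4A_train_dev_npo": 15385,
    "semeval16_T4A_dev_npo": 1595,
    "semeval16_T4A_test_npo": 20632,
    "semeval16_T4A_train_npo": 4796,
}


def count_examples(lines):
    """Count non-empty lines (assumes one example per line)."""
    return sum(1 for line in lines if line.strip())


def matches_expected(name, lines):
    """Return True if the number of examples equals the listed count."""
    return count_examples(lines) == COUNTS[name]


# The combined file should be the sum of the devtest and train_dev files.
combined = (COUNTS["semeval16_T4A_devtest_npo"]
            + COUNTS["semeval13_T2B_16T4A_train_dev_npo"])
print(combined == COUNTS["semeval13_T2B_16T4A_train_dev_devtest_npo"])
```

Both helpers accept any iterable of lines, so an open file object can be passed directly, for example `matches_expected(name, open(path))` with `path` pointing at wherever the data file lives locally.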
- Format of data (TAB separated)

| no | sentiment | sentences |
|----|-----------|-----------|
| 1 | objective | I may be the ... |
| 2 | positive | TGIF folks! ... |
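
A line in this format can be split into its three fields as sketched below. The sketch assumes every line has exactly three TAB-separated columns with an integer in the first one. `Example` and `parse_line` are hypothetical names, not taken from the repository.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the column headers above.
Example = namedtuple("Example", ["no", "sentiment", "sentence"])


def parse_line(line):
    """Split one TAB-separated line into (no, sentiment, sentence)."""
    # Split at most twice so a TAB inside the tweet stays in the sentence.
    no, sentiment, sentence = line.rstrip("\r\n").split("\t", 2)
    return Example(int(no), sentiment, sentence)


sample = "2\tpositive\tTGIF folks! ..."
print(parse_line(sample))
```

Limiting the split to two separators is a defensive choice. Whether the data files ever contain a TAB inside the tweet text is not stated in the source.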
- Label definition (one-hot vector, class index)
- 'objective': [0, 1, 0], 1
- 'positive': [0, 0, 1], 2
- 'negative': [1, 0, 0], 0
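
The label definition above maps directly onto a lookup table. The following is a minimal sketch using the values exactly as listed. The helper names `encode` and `decode` are assumptions made for illustration.

```python
# One-hot vector and class index for each label, copied from the list above.
LABELS = {
    "objective": ([0, 1, 0], 1),
    "positive": ([0, 0, 1], 2),
    "negative": ([1, 0, 0], 0),
}


def encode(sentiment):
    """Return (one_hot, class_index) for a sentiment string."""
    one_hot, index = LABELS[sentiment]
    return list(one_hot), index


def decode(index):
    """Return the sentiment string for a class index."""
    for name, (_, idx) in LABELS.items():
        if idx == index:
            return name
    raise ValueError("unknown class index: %r" % (index,))


print(encode("positive"))
print(decode(0))
```

In each entry the class index is the position of the 1 in the one-hot vector, so either representation can be derived from the other.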
- Trained on 400 million tweets
- Resources