GitHub - yucca43/NNproject

data
trees
wordvectors
.gitignore
README
preprocess-twitter.rb
rnn.py
rnn2deep.py
rnn2tanh.py
rnn_changed.py
rntn.py
rntn.sh
run.sh
run2.sh
run2_35.sh
run2_45.sh
run2tanh.sh
runNNet.py
sgd.py
test.sh
transformToRnnTree.py
tree.py
wordMap.bin


DATA:
1. Sentiment 140 Data
	Only has sentence-level sentiment labels
2. SemEval 2013 from york.ac.uk
	Combine Task A and Task B to provide some phrase-level sentiment labels for the recursive NN

Possible RNN models:

RNN 1 baseline
RNN 2
RNN 2 with dropout
RNTN
Think about what improvements can be made to the RNN 2 model

Use:
Preprocess Twitter data
GloVe pretrained word vectors?


Installing jpype:
export JAVA_HOME="/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/"


Naive RNN
	Trees are binarized, but a node may have only a left child
	Also, only the sentence (root) node has a label
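
One possible way to deal with left-only nodes is to collapse them before training, so that every internal node has exactly two children. The sketch below is only an illustration: the `Node` class and the `collapse_unary` and `count_single_child` helpers are hypothetical and are not the classes or functions defined in tree.py.

```python
class Node:
    """Hypothetical tree node for illustration; not the class in tree.py."""

    def __init__(self, label=None, word=None, left=None, right=None):
        self.label = label  # sentiment label, or None if unlabeled
        self.word = word    # token for leaves, None for internal nodes
        self.left = left
        self.right = right

    def children(self):
        return [c for c in (self.left, self.right) if c is not None]


def collapse_unary(node):
    """Return a tree in which every internal node has exactly two children."""
    kids = node.children()
    if not kids:
        return node  # leaf: nothing to do
    if len(kids) == 1:
        # Single-child node: replace it with its (collapsed) child,
        # passing the label down if the child has none of its own.
        child = collapse_unary(kids[0])
        if child.label is None:
            child.label = node.label
        return child
    node.left = collapse_unary(node.left)
    node.right = collapse_unary(node.right)
    return node


def count_single_child(node):
    """Count internal nodes that have exactly one child."""
    kids = node.children()
    return (1 if len(kids) == 1 else 0) + sum(count_single_child(c) for c in kids)


# A labeled root with only a left child, which is an ordinary binary node.
inner = Node(left=Node(word="not"), right=Node(word="bad"))
root = collapse_unary(Node(label=1, left=inner))
```

Here the sentence label moves from the removed root onto the binary node that replaces it, so the label is still attached to the top of the tree.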

Done:
1. Built tree structure with Stanford Parser, used CollinsBinarization with careless NPCG

2. Separated train data into 4874 train and 1218 dev

3. Changed the activation from ReLU to tanh (rnn2tanh); it didn't work

4. Ran RNTN; still struggling

5. Loaded pretrained data
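
The train/dev split in item 2 could be reproduced along these lines. This is a minimal sketch, assuming a shuffled 80/20 split; the notes do not say how the split was actually made, and `split_train_dev`, `dev_fraction` and `seed` are hypothetical names. A 20% dev fraction on 6092 items is at least consistent with the 4874/1218 sizes above.

```python
import random


def split_train_dev(items, dev_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into (train, dev)."""
    items = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)  # fixed seed gives a repeatable split
    n_dev = int(len(items) * dev_fraction)
    return items[n_dev:], items[:n_dev]


# Stand-in for the 6092 parsed trees (4874 + 1218).
train, dev = split_train_dev(range(6092))
print(len(train), len(dev))
```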

Possible further improvements
-preprocess Twitter data & use pretrained word vectors
Question: preprocess before parsing or after parsing?
-incorporate Task A data
-run it on Sentiment140?
-treat words that are not in the pretrained vectors as UNK?
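
The UNK idea in the last item can be sketched as follows. This is only an illustration under assumptions: `build_word_map` and `words_to_indices` are hypothetical helpers, not functions from tree.py, and reserving index 0 for UNK is an arbitrary choice.

```python
UNK = "UNK"


def build_word_map(pretrained_vocab):
    """Map each pretrained word to an index, reserving index 0 for UNK."""
    word_map = {UNK: 0}
    for word in pretrained_vocab:
        if word not in word_map:
            word_map[word] = len(word_map)
    return word_map


def words_to_indices(words, word_map):
    """Look up each word, falling back to the UNK index when it is missing."""
    unk = word_map[UNK]
    return [word_map.get(w, unk) for w in words]


word_map = build_word_map(["good", "bad"])
indices = words_to_indices(["good", "meh", "bad"], word_map)
```

With this scheme every out-of-vocabulary word shares a single vector, which would then need its own row in the embedding matrix.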


Every time you get new train/dev trees:
	first, build a new word map by running python tree.py