Recurrent Neural Net Ensembles for Complex Word Identification (SemEval-2016)

# Install passage for easy NN manipulation
# See https://github.com/IndicoDataSolutions/Passage
sudo pip install passage

# Train your model. Usage:
# $ python train.py <input file> <label file> <embedding size> <size of gated recurrent layer> <no. of epochs>
python train.py cwi_inputs.txt cwi_labels.txt 10 10 10
# Two files will appear: (i) the RNN model and (ii) the tokenizer learnt from the training data.
ls stubborn_model.gridsearch.embedding10.gru10.epoch10.pkl
ls cwi_inputs.txt-tokenizer.pkl

# Best parameters from our experiments:
# (<embedding size>, <size of gated recurrent layer>, <no. of epochs>)
# [(10,10,10), (10,50,10), (50,200,10), (100,10,10), (200,20,10)]
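
To retrain all five settings without typing each command by hand, a small helper along these lines may be convenient. It is only a sketch: `train_command` and `model_filename` are hypothetical helper names, and the output filename pattern is an assumption inferred from the example model filenames in this README, not something confirmed against train.py.

```python
# The five (embedding size, GRU size, epochs) settings listed above.
BEST_PARAMS = [(10, 10, 10), (10, 50, 10), (50, 200, 10), (100, 10, 10), (200, 20, 10)]


def train_command(embedding, gru, epochs,
                  inputs="cwi_inputs.txt", labels="cwi_labels.txt"):
    """Build the train.py command line for one parameter setting."""
    return "python train.py {} {} {} {} {}".format(inputs, labels, embedding, gru, epochs)


def model_filename(embedding, gru, epochs):
    """Assumed naming pattern, inferred from the example filenames in this README."""
    return "stubborn_model.gridsearch.embedding{}.gru{}.epoch{}.pkl".format(
        embedding, gru, epochs)


if __name__ == "__main__":
    # Print each command and the model file we would expect it to produce.
    for params in BEST_PARAMS:
        print(train_command(*params))
        print("  expected model file: " + model_filename(*params))
```

The script only prints the commands; paste them into a shell, or pipe the output through a filter of your choice, to actually run the training.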

# To run a parameter search, see paramsearch.py
nohup python paramsearch.py cwi_inputs.txt cwi_labels.txt > paramsearch.log &
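
For readers unfamiliar with grid search, the idea is to train one model per combination of the three hyperparameters and keep the best ones. The sketch below only enumerates such a grid and prints the matching train.py commands. The candidate values are hypothetical, chosen for illustration; the grid that paramsearch.py actually explores may differ.

```python
from itertools import product

# Hypothetical candidate values; the real grid is defined in paramsearch.py.
EMBEDDING_SIZES = [10, 50, 100, 200]
GRU_SIZES = [10, 20, 50, 200]
EPOCHS = [10]


def grid(embedding_sizes, gru_sizes, epochs):
    """Return every (embedding, gru, epochs) combination as a list of tuples."""
    return list(product(embedding_sizes, gru_sizes, epochs))


if __name__ == "__main__":
    # One training command per point in the grid.
    for embedding, gru, n_epochs in grid(EMBEDDING_SIZES, GRU_SIZES, EPOCHS):
        print("python train.py cwi_inputs.txt cwi_labels.txt {} {} {}".format(
            embedding, gru, n_epochs))
```

With these illustrative values the grid has 16 points, so the cost of a full search grows quickly as more candidate values are added to any of the three lists.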

# Predict on the test data, using the trained model and the tokenizer learnt from the training data.
python test.py cwi_test.txt stubborn_model.gridsearch.embedding10.gru10.epoch10.pkl cwi_inputs.txt-tokenizer.pkl

Cite

Nat Gillin. 2016. Neural Nonsense Mangled in Ensemble Mess. In SemEval-2016. (forthcoming)
