RepEval-2016

This repository contains the scripts and code used in the RepEval 2016 paper:
Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance

API Package

word2vec: the original word2vec implementation by Mikolov et al. (https://code.google.com/archive/p/word2vec/)
wvlib: library for reading word2vec files (https://github.com/spyysalo/wvlib)
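
For orientation, the binary layout written by the original word2vec tool can be parsed with a few lines of standard-library Python. This is a minimal sketch, not wvlib's implementation: the function name `read_word2vec_bin` is hypothetical, and the code assumes little-endian 4-byte floats (word2vec writes floats in the byte order of the machine that trained the model).

```python
import io
import struct


def read_word2vec_bin(stream):
    """Read vectors from a binary stream in word2vec's .bin layout (sketch)."""
    # Header line: "<vocab_size> <dimension>\n" in ASCII.
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        # Each word is a byte string terminated by a single space.
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("stream ended inside a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline written after the previous vector
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="replace")
        # The vector follows as `dim` 4-byte floats (little-endian assumed).
        data = stream.read(4 * dim)
        if len(data) != 4 * dim:
            raise EOFError("stream ended inside a vector")
        vectors[word] = struct.unpack("<%df" % dim, data)
    return vectors


# Build a tiny in-memory file in the same layout and read it back.
demo = io.BytesIO()
demo.write(b"2 3\n")
for demo_word, demo_vec in [("cat", (1.0, 0.0, 0.5)), ("dog", (0.5, 0.25, 0.0))]:
    demo.write(demo_word.encode("utf-8") + b" " + struct.pack("<3f", *demo_vec) + b"\n")
demo.seek(0)
demo_vectors = read_word2vec_bin(demo)
print(sorted(demo_vectors))
```

For real models, open the file with `open(path, "rb")` and pass the file object to the function; wvlib remains the library this repository actually uses.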

Scripts

createRawText.sh: downloads the files used to build the raw corpus
createCorpus.sh: pre-processes the text (input: raw corpus directory)
createModel.sh: creates word2vec.bin files with different window sizes
intrinsicEva.sh: runs intrinsic evaluation on 8 benchmark datasets (input: directory of the vectors to test)
ExtrinsicEva.sh: runs extrinsic evaluation
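
As an illustration of what training with different window sizes involves, the sketch below builds one word2vec command line per window size. It is not a transcription of createModel.sh: the executable path, corpus name, output naming scheme, vector size, and window values are all hypothetical. Only the flags (`-train`, `-output`, `-size`, `-window`, `-binary`) come from the original word2vec command-line tool.

```python
def word2vec_commands(corpus, out_dir, windows, size=200,
                      executable="word2vec/word2vec"):
    """Return one word2vec argument list per window size (nothing is executed)."""
    commands = []
    for window in windows:
        # Hypothetical naming scheme: one .bin file per window size.
        output = "%s/vectors.win%d.bin" % (out_dir, window)
        commands.append([
            executable,
            "-train", corpus,
            "-output", output,
            "-size", str(size),
            "-window", str(window),
            "-binary", "1",  # write the binary .bin format
        ])
    return commands


# Placeholder corpus path and window sizes, for illustration only.
sweep = word2vec_commands("corpus.txt", "models", [1, 5, 20])
for cmd in sweep:
    print(" ".join(cmd))
```

To actually train, each argument list could be passed to `subprocess.run(cmd, check=True)` once the word2vec binary has been compiled.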

Code

Pre-processing:
tokenize_Text.py: tokenizes text (requires NLTK)
sentence_splitter.py: splits text into sentences
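
The two pre-processing steps can be sketched as follows. The repository's scripts rely on NLTK; this example deliberately swaps in simple regular expressions so that it runs without extra dependencies, so its output will differ from NLTK's on harder cases such as abbreviations. It also assumes the target format is one sentence per line with space-separated tokens, which is the plain-text input word2vec reads. All function names are hypothetical.

```python
import re

# Split after ., ! or ? when followed by whitespace and an uppercase letter,
# digit, or quote. A rough stand-in for a trained sentence splitter.
_SENT_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'])")
# A token is a word (optionally with internal hyphens/apostrophes)
# or a single punctuation character.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text):
    """Segment text into sentences with a simple punctuation heuristic."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


def preprocess(text):
    """Return text as one sentence per line with space-separated tokens."""
    return "\n".join(" ".join(tokenize(s)) for s in split_sentences(text))


sample = "Word vectors are useful. They don't always transfer, though!"
print(preprocess(sample))
```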

Intrinsic evaluation:
evaluate.py: performs intrinsic evaluation
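
A common protocol for intrinsic evaluation on word-similarity benchmarks is to score each word pair by the cosine similarity of its vectors and report the Spearman rank correlation with the human ratings. The sketch below assumes that protocol; it is not taken from evaluate.py, and the function names and toy data are hypothetical. Pairs with out-of-vocabulary words are skipped.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / math.sqrt(var_x * var_y)


def evaluate_similarity(vectors, gold_pairs):
    """Return (Spearman rho, number of pairs covered by the vocabulary)."""
    model_scores, gold_scores = [], []
    for w1, w2, gold in gold_pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            gold_scores.append(gold)
    return spearman(model_scores, gold_scores), len(gold_scores)


# Toy vectors and ratings, for illustration only.
toy_vectors = {"cat": (1.0, 0.0), "dog": (0.9, 0.1),
               "car": (0.0, 1.0), "bus": (0.2, 0.8)}
toy_gold = [("cat", "dog", 9.0), ("car", "bus", 8.5),
            ("cat", "car", 1.0), ("dog", "unicorn", 5.0)]
rho, covered = evaluate_similarity(toy_vectors, toy_gold)
print(rho, covered)
```

Reporting the number of covered pairs alongside the correlation matters because vocabularies differ between models, and a score computed on fewer pairs is not directly comparable.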

Extrinsic evaluation (keras folder; requires either TensorFlow or Theano):
mlp.py: simple feed-forward neural network
setting.py: parameters for the neural network

Remark

https://drive.google.com/open?id=0BzMCqpcgEJgic0ttWTlyLWZOSVk

jdeepee/RepEval-2016

Folders and files

Latest commit

History

Repository files navigation

RepEval-2016

API Package

Scripts

Code

Remark

About

Resources

Stars

Watchers

Forks

Languages