Dependency:

See requirements.txt for the dependencies.
Download the Google pretrained word2vec model from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM and unzip it into the data/ directory.

Data:

train.txt (test.txt): one sentence per line.
train_label.txt (test_label.txt): the label for each sentence in train.txt (test.txt).
*.p files are generated by the programs.

File:

preprocess_data.py:
- builds the corpus from train.txt and test.txt
- assigns an index to each word in the corpus
- converts each word in every sentence to its corresponding index
- output file: corpus.p
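The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code in preprocess_data.py: the function names `build_vocab` and `encode`, whitespace tokenization, and the dictionary layout of the pickled object are all assumptions made for the example.

```python
import pickle


def build_vocab(sentences):
    """Assign a unique integer index to every distinct word, in first-seen order."""
    word_to_index = {}
    for sentence in sentences:
        for word in sentence.split():  # assumption: whitespace tokenization
            if word not in word_to_index:
                word_to_index[word] = len(word_to_index)
    return word_to_index


def encode(sentences, word_to_index):
    """Replace each word in each sentence with its index."""
    return [[word_to_index[w] for w in s.split()] for s in sentences]


# Toy stand-ins for the lines of train.txt and test.txt.
train_sentences = ["the cat sat", "the dog ran"]
test_sentences = ["the cat ran"]

# The vocabulary covers both files, as described above.
word_to_index = build_vocab(train_sentences + test_sentences)

# Hypothetical layout for the pickled corpus; the real corpus.p may differ.
corpus = {
    "word_to_index": word_to_index,
    "train": encode(train_sentences, word_to_index),
    "test": encode(test_sentences, word_to_index),
}

# Round-trip through pickle in memory; pickle.dump(corpus, f) would write a file instead.
restored = pickle.loads(pickle.dumps(corpus))
```

Building the vocabulary from both files means every test word has an index, so no out-of-vocabulary handling is needed at encoding time.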
word_embed.py
- uses the Google pretrained word2vec model
- looks up the embedding vector for each word in corpus.p; if no vector is found, a random vector is generated for that word
- changes the key of the embedding mapping from the word to its index in corpus.p
- output file: word2vec.p (this embedding model contains only the words that appear in corpus.p)
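The lookup with a random fallback can be sketched as below. This is an illustration under stated assumptions rather than the code in word_embed.py: `build_index_embeddings` is a hypothetical helper, the pretrained model is stood in for by a plain dict, and the uniform range of ±0.25 for unknown words is an assumed choice. A model loaded with gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` supports the same `word in model` and `model[word]` operations used here.

```python
import random


def build_index_embeddings(word_to_index, pretrained, dim, seed=0):
    """Map each word index to a vector, drawing a random one for unknown words."""
    rng = random.Random(seed)  # fixed seed so the fallback vectors are reproducible
    embeddings = {}
    for word, index in word_to_index.items():
        if word in pretrained:
            vector = list(pretrained[word])
        else:
            # Assumption: small uniform noise for words missing from the model.
            vector = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
        embeddings[index] = vector  # keyed by index, not by word
    return embeddings


# Toy stand-in for the pretrained model (the real vectors have 300 dimensions).
pretrained = {"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}
word_to_index = {"cat": 0, "dog": 1, "zyzzyva": 2}

embeddings = build_index_embeddings(word_to_index, pretrained, dim=3)
```

Because the result holds only the words in the corpus, it stays small compared with the full pretrained model.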
basic_rnn.py
- basic RNN class
- adapted, with several changes, from the RNN model in https://github.com/dennybritz/rnn-tutorial-rnnlm
vanilla_rnn.py
- based on basic_rnn.py and the vanilla RNN model from https://github.com/gwtaylor/theano-rnn
- extends the BasicRNN model from basic_rnn.py with the following changes:
  - updates parameters with momentum
  - adds L1 and L2 regularization terms to the cost function
  - adds a bias term to the layer function
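The momentum update and the regularized cost can be illustrated with plain Python lists. This is a sketch of the general technique, not the Theano code in vanilla_rnn.py: `regularized_cost` and `momentum_step` are hypothetical names, and the learning rate, momentum, and penalty coefficients are arbitrary example values.

```python
def regularized_cost(loss, weights, l1_reg, l2_reg):
    """Add L1 (sum of |w|) and L2 (sum of w^2) penalties to a base loss."""
    l1 = sum(abs(w) for w in weights)
    l2_sqr = sum(w * w for w in weights)
    return loss + l1_reg * l1 + l2_reg * l2_sqr


def momentum_step(params, grads, velocity, learning_rate, momentum):
    """Update params in place using classical momentum."""
    for i, grad in enumerate(grads):
        # The velocity keeps a decaying sum of past gradient steps.
        velocity[i] = momentum * velocity[i] - learning_rate * grad
        params[i] += velocity[i]


cost = regularized_cost(loss=1.0, weights=[1.0, -2.0], l1_reg=0.1, l2_reg=0.01)

params = [1.0]
velocity = [0.0]
# Two steps with the same gradient: the second step is larger than the first.
momentum_step(params, [0.5], velocity, learning_rate=0.1, momentum=0.9)
first = params[0]
momentum_step(params, [0.5], velocity, learning_rate=0.1, momentum=0.9)
second = params[0]
```

With a constant gradient the velocity accumulates, so consecutive steps grow until the decay term balances them. Plain gradient descent would take equal steps.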
basicRNN_w2v.py
- an example of training the basic RNN model and saving the trained model under the ./data directory
model_test.py
- an example of loading a pretrained RNN model and testing it with the test data from the ./data directory
- generates an evaluation matrix for performance evaluation
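If the evaluation matrix is read as a confusion matrix (an assumption; the source does not specify its layout), it could be computed along these lines. The function names and the toy labels are illustrative only and are not taken from model_test.py.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels standing in for test_label.txt and the model's predictions.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
acc = accuracy(y_true, y_pred)
```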
gru_rnn.py
- GRU model
- supports mini-batch training
  - note: when loading data in mini-batches, handle the last batch carefully, since it may be smaller than the assigned batch size
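The last-batch caveat can be seen in a small batching sketch. `iter_minibatches` is a hypothetical helper written for this example, not a function from gru_rnn.py. It yields a shorter final batch when the data size is not a multiple of the batch size, so any downstream code that hard-codes the batch dimension would need to read the size from the batch itself.

```python
def iter_minibatches(inputs, labels, batch_size):
    """Yield (inputs, labels) slices; the final slice may be shorter."""
    if len(inputs) != len(labels):
        raise ValueError("inputs and labels must have the same length")
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size  # slicing past the end is safe in Python
        yield inputs[start:end], labels[start:end]


# 10 examples with a batch size of 4 give batches of 4, 4, and 2.
inputs = list(range(10))
labels = [i % 2 for i in inputs]

batches = list(iter_minibatches(inputs, labels, batch_size=4))
sizes = [len(batch_inputs) for batch_inputs, _ in batches]
```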
gruRNN_w2v.py
- an example of training the GRU RNN model with or without mini-batches

54wang17/rnn_w2v
