54wang17/rnn_w2v

Dependency:

Data:

  • train.txt (test.txt): one sentence per line
  • train_label.txt (test_label.txt): the label for the corresponding sentence in train.txt (test.txt)
  • *.p files are generated by the programs.
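
The pairing between the sentence files and the label files can be sketched as below. This is a minimal illustration and not code from the repository: the function name `load_labeled_sentences` is hypothetical, and it assumes the label file holds exactly one label per line, in the same order as the sentences.

```python
def load_labeled_sentences(text_path, label_path):
    """Pair each sentence with the label found on the same line number.

    Assumption: label_path has one label per line, aligned with text_path.
    """
    with open(text_path, encoding="utf-8") as f:
        sentences = [line.strip() for line in f]
    with open(label_path, encoding="utf-8") as f:
        labels = [line.strip() for line in f]

    # A length mismatch means the two files are out of sync.
    if len(sentences) != len(labels):
        raise ValueError(
            "%d sentences but %d labels" % (len(sentences), len(labels))
        )
    return list(zip(sentences, labels))
```

Calling `load_labeled_sentences("train.txt", "train_label.txt")` would then return a list of `(sentence, label)` tuples.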

File:

  • preprocess_data.py:

    • generates a corpus based on train.txt and test.txt
    • generates an index for each word in the corpus
    • transforms each word in a sentence to its corresponding index
    • output file: corpus.p
  • word_embed.py

    • uses Google's pretrained word2vec model
    • finds the word embedding vector for each word in the corpus.p file; if no vector is found, a random vector is generated for that word
    • changes the key of the word embedding mapping from the word to its index in the corpus.p file
    • output file: word2vec.p (this word embedding model only contains words that appear in corpus.p)
  • basic_rnn.py

  • vanilla_rnn.py

    • based on basic_rnn.py and the vanilla RNN model from https://github.com/gwtaylor/theano-rnn
    • extends the BasicRNN model from basic_rnn.py and makes the following changes:
      • changes the parameter update approach to momentum
      • adds L1 and L2 regularization to the cost function
      • adds a bias to the layer function
  • basicRNN_w2v.py

    • an example of training the basic RNN model and saving the trained model under the ./data directory
  • model_test.py

    • an example of loading a pretrained RNN model and testing it with the test data from the ./data directory
    • generates an evaluation matrix for performance evaluation
  • gru_rnn.py

    • GRU model
    • supports mini-batch training
      • note: when loading data in mini-batches, take care with the last batch; it may be smaller than the specified batch size
  • gruRNN_w2v.py

    • an example of training the GRU RNN model with or without mini-batches
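
The lookup with random fallback described for word_embed.py can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `build_index_embeddings` is a hypothetical name, a plain dict stands in for the pretrained word2vec model, and the dimension of 300 and the uniform range of ±0.25 for random vectors are assumed values.

```python
import numpy as np


def build_index_embeddings(word_to_index, pretrained, dim=300, seed=0):
    """Map each word index to an embedding vector.

    word_to_index: dict of word -> integer index (as stored in the corpus).
    pretrained:    mapping of word -> vector (stand-in for the word2vec model).
    Words missing from `pretrained` get a random vector (assumed range).
    """
    rng = np.random.default_rng(seed)
    embeddings = {}
    for word, idx in word_to_index.items():
        if word in pretrained:
            vector = np.asarray(pretrained[word], dtype=np.float32)
        else:
            # Fallback: random vector for out-of-vocabulary words.
            vector = rng.uniform(-0.25, 0.25, size=dim).astype(np.float32)
        # Key by index rather than by word, so the model can look up
        # embeddings directly from the indexed sentences.
        embeddings[idx] = vector
    return embeddings
```

Keying the result by index means the resulting mapping only contains words from the corpus, which keeps it much smaller than the full pretrained model.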

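The caveat about the last mini-batch noted for gru_rnn.py can be illustrated with a small generator. This is a sketch only; `iter_minibatches` is a hypothetical helper and not a function from the repository.

```python
def iter_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples` of at most `batch_size` items.

    The final slice is shorter when len(samples) is not a multiple of
    batch_size, so callers should use len(batch), not batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```

With 10 samples and a batch size of 4, this yields batches of sizes 4, 4, and 2. Any code that averages a cost over the batch, or allocates arrays shaped by the batch size, should read the size from the batch itself.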