xnlp/RNNsearch

 
 

RNNsearch

An implementation of RNNsearch using Theano. The implementation follows the one in GroundHog.

Usage

Data Preprocessing

  1. Build vocabulary
  • Build source vocabulary
python scripts/buildvocab.py --corpus zh.txt --output vocab.zh.pkl \
                             --limit 30000 --groundhog
  • Build target vocabulary
python scripts/buildvocab.py --corpus en.txt --output vocab.en.pkl \
                             --limit 30000 --groundhog
  2. Shuffle corpus (optional)
python scripts/shuffle.py --corpus zh.txt en.txt
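
For readers who want to see what these two preprocessing steps amount to, here is a minimal sketch. It is not the code in scripts/buildvocab.py or scripts/shuffle.py: the function names, the token-to-id layout, and the tie-breaking rule are assumptions made for illustration, and the effect of --groundhog is not modelled. It shows a frequency-limited vocabulary and a shuffle that applies one shared permutation to both sides so that sentence pairs stay aligned.

```python
import random
from collections import Counter


def build_vocab(lines, limit):
    """Map the `limit` most frequent tokens to integer ids (hypothetical layout)."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: idx for idx, (tok, _) in enumerate(ranked[:limit])}


def shuffle_parallel(src_lines, tgt_lines, seed=1234):
    """Shuffle two aligned corpora with one shared permutation."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("corpora must have the same number of lines")
    order = list(range(len(src_lines)))
    random.Random(seed).shuffle(order)
    # Index both sides with the same order so pair i stays pair i.
    return [src_lines[i] for i in order], [tgt_lines[i] for i in order]
```

Shuffling each file independently would break the source/target pairing, which is why the sketch permutes indices rather than the lists themselves.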

Build Dictionary (Optional)

To use the UNK replacement feature, build a dictionary from an alignment file:

python scripts/build_dictionary.py zh.txt en.txt align.txt dict.zh-en
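
As a rough illustration of how a dictionary can be derived from word alignments, the sketch below counts aligned word pairs and keeps the most frequent target word for each source word. The alignment format is an assumption (one line per sentence pair, links written as 0-based `source-target` index pairs such as `0-1 1-2`), as are the function name and the returned structure; the actual input and output formats of scripts/build_dictionary.py may differ.

```python
from collections import Counter, defaultdict


def build_dictionary(src_lines, tgt_lines, align_lines):
    """Return {source word: most frequently aligned target word}."""
    counts = defaultdict(Counter)
    for src, tgt, align in zip(src_lines, tgt_lines, align_lines):
        src_toks, tgt_toks = src.split(), tgt.split()
        for link in align.split():
            # Assumed link format: "<source index>-<target index>", 0-based.
            i, j = (int(x) for x in link.split("-"))
            counts[src_toks[i]][tgt_toks[j]] += 1
    # Keep the best target per source word; ties break alphabetically.
    return {
        s: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for s, c in counts.items()
    }
```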

Training

  python rnnsearch.py train --corpus zh.txt.shuf en.txt.shuf \
    --vocab vocab.zh.pkl vocab.en.pkl --model nmt --embdim 620 620 \
    --hidden 1000 1000 1000 --maxhid 500 --deephid 620 --maxpart 2 \
    --alpha 5e-4 --norm 1.0 --batch 128 --maxepoch 5 --seed 1234 \
    --freq 1000 --vfreq 1500 --sfreq 50 --sort 20 --validate nist02.src \
    --ref nist02.ref0 nist02.ref1 nist02.ref2 nist02.ref3

Decoding

  python rnnsearch.py translate --model nmt.best.pkl < input > translation

Sampling

  python rnnsearch.py sample --model nmt.best.pkl < input > examples

UNK replacement

  python rnnsearch.py replace --model nmt.best.pkl --text input translation \
    --dictionary dict.zh-en > newtranslation
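
The general idea behind UNK replacement can be sketched as follows. This is an illustration under stated assumptions, not the logic of rnnsearch.py: it assumes the unknown-word token is the literal string `UNK`, that an attention matrix is available with `attention[j][i]` giving the weight target position `j` places on source position `i`, and that the dictionary is a plain source-to-target mapping. The function name `replace_unk` is hypothetical.

```python
def replace_unk(src_toks, tgt_toks, attention, dictionary, unk="UNK"):
    """Replace each `unk` in tgt_toks using the most-attended source word."""
    out = []
    for j, tok in enumerate(tgt_toks):
        if tok != unk:
            out.append(tok)
            continue
        weights = attention[j]
        # Pick the source position with the largest attention weight.
        i = max(range(len(src_toks)), key=lambda k: weights[k])
        src_word = src_toks[i]
        # Fall back to copying the source word when it is not in the dictionary.
        out.append(dictionary.get(src_word, src_word))
    return out
```

Copying the source word as a fallback tends to suit names and numbers, which often carry over unchanged between languages.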

Resume training

  python rnnsearch.py train --model nmt.autosave.pkl

Convert Trained Models

Models trained with GroundHog can be converted to our format using convert.py. Only the RNNsearch architecture is supported.

python scripts/convert.py --state search_state.pkl --model search_model.npz \
                          --output nmt.pkl
