RNNsearch

An implementation of RNNsearch using Theano. The implementation follows the one in GroundHog.

Usage

Data Preprocessing

Build vocabulary

Build source vocabulary

python scripts/buildvocab.py --corpus zh.txt --output vocab.zh.pkl \
                             --limit 30000 --groundhog

Build target vocabulary

python scripts/buildvocab.py --corpus en.txt --output vocab.en.pkl \
                             --limit 30000 --groundhog
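
For readers who want to see what a frequency-limited vocabulary looks like, here is a minimal sketch. It is not the code in scripts/buildvocab.py: the helper name `build_vocab` is hypothetical, and the real script may reserve ids for special symbols and writes a pickle file whose layout is not described here. The sketch only shows the general idea behind a limit such as `--limit 30000`: count tokens and keep the most frequent ones.

```python
from collections import Counter


def build_vocab(lines, limit):
    """Map the `limit` most frequent tokens to integer ids (hypothetical helper)."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: idx for idx, (tok, _) in enumerate(ranked[:limit])}


# Toy corpus: "a" occurs 3 times, "b" twice, "c" and "d" once each.
corpus = ["a b a c", "b a d"]
vocab = build_vocab(corpus, limit=2)
```

Tokens that fall outside the limit are the ones a translation model would typically treat as unknown words.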

Shuffle corpus (Optional)

python scripts/shuffle.py --corpus zh.txt en.txt
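
The important property when shuffling a parallel corpus is that both sides are reordered with the same permutation, so that line i of the source still matches line i of the target. The sketch below illustrates this under stated assumptions: `shuffle_parallel` and its `seed` parameter are hypothetical and are not taken from scripts/shuffle.py.

```python
import random


def shuffle_parallel(src_lines, tgt_lines, seed=1234):
    """Shuffle two aligned lists of sentences with one shared permutation."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("corpora must have the same number of lines")
    order = list(range(len(src_lines)))
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(order)
    return [src_lines[i] for i in order], [tgt_lines[i] for i in order]


src = ["s0", "s1", "s2", "s3"]
tgt = ["t0", "t1", "t2", "t3"]
src_shuf, tgt_shuf = shuffle_parallel(src, tgt)
```

The training command in this README reads zh.txt.shuf and en.txt.shuf, which suggests that the script writes its output next to the input files with a .shuf suffix.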

Build Dictionary (Optional)

To use the UNK replacement feature, build a dictionary from a word alignment file:

python scripts/build_dictionary.py zh.txt en.txt align.txt dict.zh-en
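
As an illustration of how a dictionary can be derived from word alignments, here is a minimal sketch. It assumes that the alignment file holds one line per sentence pair in the common `i-j` format (source position, then target position, both 0-based), as produced by tools such as fast_align. Neither that format nor the function `build_dictionary` is confirmed by this README, and the format written to dict.zh-en is not shown here.

```python
from collections import Counter, defaultdict


def build_dictionary(src_lines, tgt_lines, align_lines):
    """Return, for each source word, its most frequently aligned target word."""
    counts = defaultdict(Counter)
    for src, tgt, align in zip(src_lines, tgt_lines, align_lines):
        src_toks, tgt_toks = src.split(), tgt.split()
        for pair in align.split():
            # Assumed format: "i-j" links source token i to target token j.
            i, j = (int(x) for x in pair.split("-"))
            counts[src_toks[i]][tgt_toks[j]] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Toy data, invented for illustration only.
src = ["wo ai ni", "wo kan shu"]
tgt = ["i love you", "i read books"]
align = ["0-0 1-1 2-2", "0-0 1-1 2-2"]
dictionary = build_dictionary(src, tgt, align)
```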

Training

  python rnnsearch.py train --corpus zh.txt.shuf en.txt.shuf \
    --vocab vocab.zh.pkl vocab.en.pkl --model nmt --embdim 620 620 \
    --hidden 1000 1000 1000 --maxhid 500 --deephid 620 --maxpart 2 \
    --alpha 5e-4 --norm 1.0 --batch 128 --maxepoch 5 --seed 1234 \
    --freq 1000 --vfreq 1500 --sfreq 50 --sort 20 --validate nist02.src \
    --ref nist02.ref0 nist02.ref1 nist02.ref2 nist02.ref3

Decoding

  python rnnsearch.py translate --model nmt.best.pkl < input > translation

Sampling

  python rnnsearch.py sample --model nmt.best.pkl < input > examples

UNK replacement

  python rnnsearch.py replace --model nmt.best.pkl --text input translation \
    --dictionary dict.zh-en > newtranslation
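
The general idea behind UNK replacement can be sketched as follows. Each UNK token in the translation is traced back to a source word, which is then translated with the dictionary or copied unchanged if the dictionary has no entry. This is a sketch of the common technique, not the code behind the replace command: `replace_unk` is a hypothetical helper, and the `alignment` list (one source position per target token) is assumed to be given, whereas the command presumably derives it from the model passed with `--model`.

```python
def replace_unk(src_tokens, tgt_tokens, alignment, dictionary, unk="UNK"):
    """Replace each UNK in the translation using its aligned source word.

    alignment[j] is assumed to be the source position linked to target token j.
    """
    out = []
    for j, tok in enumerate(tgt_tokens):
        if tok == unk:
            src_word = src_tokens[alignment[j]]
            # Translate via the dictionary, or copy the source word as a fallback.
            tok = dictionary.get(src_word, src_word)
        out.append(tok)
    return out


# Toy example, invented for illustration only.
src = "wo ai Beijing".split()
hyp = "i love UNK".split()
result = replace_unk(src, hyp, [0, 1, 2], {"wo": "i", "ai": "love"})
```

Copying the source word is a reasonable fallback for names and numbers, which often stay the same across languages.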

Resume training

  python rnnsearch.py train --model nmt.autosave.pkl

Convert Trained Models

Models trained with GroundHog can be converted to our format using scripts/convert.py. Only the RNNsearch architecture is supported.

python scripts/convert.py --state search_state.pkl --model search_model.npz \
                          --output nmt.pkl
