dsp6414/seq2seq
Seq2seq code in PyTorch

Built from Ruotian Luo's captioning code and Sandeep Subramanian's seq2seq code.

Data preprocessing:

The preprocessing steps are adapted from Alexandre Bérard's code:

> config/WMT14/download.sh    # download WMT14 data into raw_data/WMT14
> config/WMT14/prepare.sh     # preprocess the data, and copy the files to data/WMT14

Then run the following to save the preprocessed data in HDF5 files:

> python scripts/prepro_text.py 
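The core of this kind of text preprocessing is mapping words to integer indices before the arrays are written to HDF5. The sketch below illustrates that step only; the function names, special-token indices, and use of plain dictionaries are illustrative assumptions, not the actual prepro_text.py implementation.

```python
from collections import Counter

# Conventional special-token indices (an assumption, not taken from this repo).
PAD, UNK, BOS, EOS = 0, 1, 2, 3

def build_vocab(sentences, max_words=50000):
    """Count word frequencies and keep the most frequent words."""
    counts = Counter(w for s in sentences for w in s.split())
    vocab = {"<pad>": PAD, "<unk>": UNK, "<s>": BOS, "</s>": EOS}
    for word, _ in counts.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Map a sentence to index lists, wrapped in BOS/EOS markers."""
    return [BOS] + [vocab.get(w, UNK) for w in sentence.split()] + [EOS]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
print(encode("the cat ran", vocab))  # out-of-vocabulary "ran" maps to UNK
```

The resulting integer sequences are what would typically be padded to fixed length and stored as HDF5 datasets for training.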

Training:

Training requires directories for saving the model's snapshots and the TensorBoard events:

> mkdir -p save events

To train a model with the parameters defined in config.yaml:

> python nmt.py -c config.yaml 
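For orientation, a minimal config.yaml might look like the fragment below. Every key name here is a hypothetical placeholder for illustration; the authoritative option names and defaults are defined in options/opts.py.

```yaml
# Hypothetical example only -- see options/opts.py for the real option names.
modelname: save/wmt14_baseline   # where snapshots are written
batch_size: 64
learning_rate: 0.001
max_epochs: 20
```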

See options/opts.py for the full list of options.

To evaluate a model:

> python eval.py -c config

To submit jobs via OAR, use either train.sh or select_train.sh.
