dengwc/wmt16-scripts

 
 


Scripts for Edinburgh Neural MT systems for WMT 16

This repository contains the scripts and an example config used for the Edinburgh Neural MT submission (UEDIN-NMT) to the shared translation task at WMT16, the First Conference on Machine Translation (http://www.statmt.org/wmt16/).

The scripts are intended to make our results easier to reproduce and to serve as additional documentation alongside the system description paper.

OVERVIEW

SCRIPTS

  • preprocessing : preprocessing scripts for Romanian that we found helpful for translation quality. We used the Moses tokenizer and truecaser for all language pairs.

  • sample : sample scripts that we used for preprocessing, training and decoding. We used mostly the same settings for all translation directions, with small differences in vocabulary size. Dropout was enabled for EN<->RO, but disabled otherwise.

  • r2l : scripts for reranking the output of the (default) left-to-right decoder with a model that decodes from right-to-left.
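The right-to-left reranking idea can be illustrated with a minimal Python sketch. It is not the code in the r2l directory. It assumes that each left-to-right hypothesis comes with a log-probability, that the right-to-left model expects the target tokens in reversed order, and that the two scores are simply added. The names `reverse_tokens`, `rerank` and `r2l_score` are hypothetical, and `r2l_score` stands in for whatever function queries the right-to-left model.

```python
from typing import Callable, List, Tuple


def reverse_tokens(sentence: str) -> str:
    """Reverse the whitespace-separated tokens of a sentence."""
    return " ".join(reversed(sentence.split()))


def rerank(
    nbest: List[Tuple[str, float]],
    r2l_score: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Rerank (hypothesis, l2r_log_prob) pairs, best first.

    Assumption: the combined score is the plain sum of the left-to-right
    log-probability and the score that the (hypothetical) right-to-left
    scorer assigns to the reversed hypothesis.
    """
    rescored = [
        (hyp, l2r + r2l_score(reverse_tokens(hyp)))
        for hyp, l2r in nbest
    ]
    # Higher (less negative) log-probability means a better hypothesis.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-in for a right-to-left model: penalise each token by 0.5.
    def toy_r2l_score(reversed_sentence: str) -> float:
        return -0.5 * len(reversed_sentence.split())

    nbest = [("a b c", -1.0), ("a b", -1.2)]
    best_hypothesis, _ = rerank(nbest, toy_r2l_score)[0]
    print(best_hypothesis)
```

In this toy run the shorter hypothesis moves to the top, because the stand-in scorer penalises length. With a real right-to-left model, the scores would come from the model itself, and the two scores might be weighted or length-normalised rather than summed.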

LICENSE

The scripts are available under the MIT License.

PUBLICATIONS

The Edinburgh Neural MT submission to WMT 2016 is described in:

TBD

It is based on work described in the following publications:

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate. Proceedings of the International Conference on Learning Representations (ICLR).

Rico Sennrich, Barry Haddow, Alexandra Birch (2015): Neural Machine Translation of Rare Words with Subword Units. arXiv preprint.

Rico Sennrich, Barry Haddow, Alexandra Birch (2015): Improving Neural Machine Translation Models with Monolingual Data. arXiv preprint.
