Christof93/uzh-corpuslab-morphological-segmentation

 
 

Morphological Segmentation

This repository contains the source code for the canonical morphological segmentation system presented in: Tatyana Ruzsics and Tanja Samardzic, "Neural Sequence-to-sequence Learning of Internal Word Structure", in Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada.

Installation

The code uses the SGNMT framework and depends on the Blocks and srilm-swig libraries. Follow the SGNMT instructions to install these dependencies. The implementation also relies on an adapted version of Z-MERT. After installation, update the environment variables LD_LIBRARY_PATH, PYTHONPATH, and PATH in the header of the main executable, Main.sh, with the locations of swig and SRILM.
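
As a rough illustration of the last step, the header of Main.sh might end up looking something like the sketch below. The three directory variables and their paths are hypothetical placeholders, not values taken from this repository; point them at wherever swig, SRILM, and the srilm-swig Python bindings actually live on your machine.

```shell
# Hypothetical install locations (assumptions): replace with your own paths.
SWIG_BIN_DIR="$HOME/tools/swig/bin"      # directory containing the swig executable
SRILM_LIB_DIR="$HOME/tools/srilm/lib"    # directory containing the SRILM libraries
SRILM_SWIG_DIR="$HOME/tools/srilm-swig"  # directory containing the srilm-swig Python module

# Prepend each location; the ${VAR:+:$VAR} form avoids a stray
# trailing colon when the variable was previously empty or unset.
export PATH="$SWIG_BIN_DIR${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$SRILM_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PYTHONPATH="$SRILM_SWIG_DIR${PYTHONPATH:+:$PYTHONPATH}"
```

Prepending rather than overwriting keeps any entries that were already set in the calling environment.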

Running new experiments

The main executable is Main.sh:

Main.sh AbsolutePATHtoDATA AbsolutePATHtoWorkingDir ResultsFolderName NMT_ENSEMBLES BEAM USE_LENGTH_CONTROL

Running the experiments in the paper

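The data folder contains the datasets for canonical segmentation, one subfolder per language under data/canonical-segmentation/. Each command below passes results as ResultsFolderName, 5 as NMT_ENSEMBLES, 12 as BEAM, and -l as USE_LENGTH_CONTROL.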

Main.sh "Absolute path to /data/canonical-segmentation/indonesian/" "Absolute path to a working dir" results 5 12 -l
Main.sh "Absolute path to /data/canonical-segmentation/german/" "Absolute path to a working dir" results 5 12 -l
Main.sh "Absolute path to /data/canonical-segmentation/english/" "Absolute path to a working dir" results 5 12 -l
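
The three commands above differ only in the language subfolder, so they could be generated in a loop. The following is a minimal dry-run sketch, not part of the repository: DATA_ROOT and WORK_DIR are hypothetical placeholder paths, and the function only prints the commands. Remove the `echo` to actually invoke Main.sh.

```shell
# Hypothetical absolute paths (assumptions): replace with your own.
DATA_ROOT="/absolute/path/to/data/canonical-segmentation"
WORK_DIR="/absolute/path/to/working-dir"

# Print one Main.sh invocation per language, using the settings
# from the commands above: results folder "results", 5, 12, -l.
print_paper_runs() {
    for lang in indonesian german english; do
        echo Main.sh "$DATA_ROOT/$lang/" "$WORK_DIR" results 5 12 -l
    done
}

print_paper_runs
```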
