riyazbhat/rungsted

 
 

Rungsted structured perceptron sequential tagger

Building

Use

python setup.py build_ext --inplace

The command above builds in place, leaving the generated C and C++ files in the source directory for inspection. The build system does not pick up changes in dependent modules. Whenever you need to start from a clean slate, run the supplied clean.sh script to remove the generated files, then rebuild.

The build script requires a recent version of Cython. If you don't have Cython, install it with:

pip install cython

Demo

The repository contains a subset of the part-of-speech tagged Brown corpus. To run the structured perceptron labeler on this dataset, execute:

python src/labeler.py --train data/brown.train --test data/brown.test.vw

Rungsted's input format is closely modeled on the powerful and flexible format of Vowpal Wabbit, with the exception that Rungsted is perfectly fine with labels that are not integers.
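To make the format concrete, here is a minimal sketch of reading one Vowpal Wabbit-style line while keeping the label as a string. It is not Rungsted's own parser: the function name `parse_vw_line`, the sample line, and the assumed layout (`label ['tag]|namespace feature[:value] ...`) are illustrative assumptions based on the general Vowpal Wabbit conventions, and Rungsted may accept or reject details this sketch does not cover.

```python
def parse_vw_line(line):
    """Parse one VW-style line into (label, tag, features).

    Hypothetical helper, not part of Rungsted. Assumed layout:
        label ['tag]|namespace feature[:value] ...
    """
    header, *sections = line.rstrip("\n").split("|")
    header_tokens = header.split()
    if not header_tokens:
        raise ValueError("line has no label")

    # The label stays a string, so non-integer labels such as "NN" work.
    label = header_tokens[0]

    # An optional tag is written with a leading apostrophe.
    tag = None
    for token in header_tokens[1:]:
        if token.startswith("'"):
            tag = token[1:]

    features = {}
    for section in sections:
        tokens = section.split()
        if not tokens:
            continue
        # A namespace name follows the bar directly, with no space between.
        if section[0].isspace():
            namespace = ""
        else:
            namespace, tokens = tokens[0], tokens[1:]
        for token in tokens:
            name, sep, value = token.partition(":")
            # Features without an explicit value default to 1.0.
            features[(namespace, name)] = float(value) if sep else 1.0
    return label, tag, features


label, tag, features = parse_vw_line("NN 'brown-1|w dog suffix=og:0.5 |shape lower")
```

Keying features by `(namespace, name)` keeps identically named features in different namespaces apart, which is the usual reason for using namespaces in this format.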

Datasets

Provided you have a working installation of NLTK, you can recreate the Brown dataset with this command:

python rungsted/datasets/cr_brown_pos_data.py data/brown.train.vw data/brown.test.vw

There is also a script, rungsted/datasets/conll_to_vw.py, that converts CoNLL-formatted input to Rungsted's format.
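As an illustration of what such a conversion involves, the sketch below turns whitespace-separated CoNLL-style rows into Vowpal Wabbit-style lines. It is not the code in conll_to_vw.py: the function name `conll_to_vw_lines`, the column defaults, the single `w` namespace, the replacement strings, and the use of blank lines as sentence separators in the output are all assumptions made for the example.

```python
def conll_to_vw_lines(conll_text, word_col=0, label_col=-1):
    """Yield one VW-style line per token, with a blank line between sentences.

    Hypothetical helper, not the repository's converter. Assumes one token
    per row, whitespace-separated columns, and blank rows between sentences.
    """
    for row in conll_text.splitlines():
        fields = row.split()
        if not fields:
            # Preserve the sentence boundary.
            yield ""
            continue
        word = fields[word_col]
        label = fields[label_col]
        # '|' and ':' are structural characters in VW-style lines,
        # so they are replaced inside feature names.
        safe_word = word.replace("|", "_BAR_").replace(":", "_COLON_")
        yield f"{label} |w {safe_word}"


sample = "The DT\ndog NN\n\n3:15 CD\n"
for line in conll_to_vw_lines(sample):
    print(line)
```

A real converter would typically emit more features per token (for example affixes or neighboring words); the single word feature here only shows the line shape.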
