nikicc/slovene-nltk-tagger

About

This project implements NLTK taggers for the Slovene language.

##Requirements

The taggers require Python 2.7 and NLTK.

##Usage

Until these taggers are built into NLTK, you can download them from the slovene_taggers/ folder and use them in NLTK.

An example showing how to use the Slovene taggers is in example.py.

A Slovene-language explanation of the tags is in jos1M/josMSD-canon-sl.tbl.
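
For orientation, here is a minimal sketch of loading and applying a tagger. It assumes the files in slovene_taggers/ are pickled NLTK tagger objects, which is how the nltk-trainer tool normally saves its output; this README does not confirm it. The helper names `load_tagger` and `tag_tokens` are hypothetical, and so is the file name in the comment. See example.py for the authoritative usage.

```python
import pickle


def load_tagger(path):
    """Load a pickled tagger object from disk (assumed storage format)."""
    # NLTK must be installed when unpickling a real tagger, because the
    # pickle refers to NLTK tagger classes.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def tag_tokens(tagger, tokens):
    """Return (token, tag) pairs for an already tokenized sentence."""
    # NLTK taggers expose a tag() method that takes a list of tokens.
    return tagger.tag(tokens)


# Hypothetical usage; substitute a real file from slovene_taggers/:
#   tagger = load_tagger("slovene_taggers/some_tagger.pickle")
#   print(tag_tokens(tagger, ["To", "je", "stavek", "."]))
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.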

##Folder and file descriptions

  • evaluation/ : outputs from the evaluation script. graph.m is Octave code for plotting the evaluation results.

  • jos100k/ : Slovene corpus taken from the JOS project, with 100,000 tagged words.

  • jos1M/ : Slovene corpus taken from the JOS project, with one million tagged words.

  • paper : the LaTeX paper about this project.

  • pos/jos1M.pos : the input file for the trainer program in trainer/.

  • slovene_taggers/ : the result of this project. The Slovene taggers stored here can be used in NLTK.

  • slides/ : presentation slides, in Slovene.

  • trainer/ : code forked from https://github.com/japerk/nltk-trainer. This trainer is used to train the taggers.

  • evaluateTaggers.sh : commands for evaluating the accuracy of the taggers.

  • evaluateTaggersSpeed.py : commands for measuring the time spent on tagging.

  • example.py : an example showing how to use the Slovene taggers in NLTK.

  • generateTaggers.sh : commands for generating the taggers. Generation uses the data in pos/jos1M.pos and the program trainer/train_tagger.py.

  • transformJOS.py : code for transforming all .xml corpora in jos1M/ into pos/jos1M.pos.
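
The XML-to-.pos transformation listed above can be sketched roughly as follows. This is not the code in transformJOS.py, and it rests on several assumptions: that sentences are `<s>` elements, that words are `<w>` elements carrying their tag in an `msd` attribute, and that the .pos file holds one sentence per line as `word/TAG` pairs (the default format read by NLTK's `TaggedCorpusReader`). The function names are hypothetical, and only `<w>` elements are kept, so punctuation is dropped.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}w' down to 'w'."""
    return tag.rsplit("}", 1)[-1]


def sentences_to_pos_lines(xml_text, tag_attribute="msd"):
    """Convert XML text into a list of 'word/TAG word/TAG ...' lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        # Assumed layout: each sentence is an <s> element.
        if local_name(element.tag) != "s":
            continue
        pairs = []
        for token in element.iter():
            tag = token.get(tag_attribute)
            # Keep only <w> elements that have both text and a tag.
            if local_name(token.tag) == "w" and tag and token.text:
                pairs.append("%s/%s" % (token.text.strip(), tag))
        if pairs:
            lines.append(" ".join(pairs))
    return lines
```

Each returned line could then be written to a file, one sentence per line, for a trainer that reads slash-separated word/tag pairs.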
