DevilCry/chinesetokenization

 
 

Overview

This project implements Chinese tokenization (word segmentation) as a command-line script, segment_sentences.py.

Usage

  • Usage of segment_sentences:

      python segment_sentences.py [options] [arg]
    
  • Options:

      -h, --help            show this help message and exit
    
      -d, --debug           print debug information about the segmentation
                          (off by default)
    
      -f FILE, --file=FILE  segment sentences from the specified file
    
      -i, --interactive     go into interactive mode
    
      -o OUT, --out=OUT     write the segmentation result to the specified file
    
      -s SEPARATOR, --separator=SEPARATOR
                          specify the separator used in the segmentation result
    
      -t TRAIN, --train=TRAIN
                          use the training set to train the algorithm
    
      -v, --version         output version info and exit
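
The option list above is laid out the way Python's optparse module prints help text, so the following is a minimal sketch of how such a parser could be declared. It is not taken from segment_sentences.py: the function name build_parser, the dest names, the placeholder version string, and the choice of optparse itself are all assumptions made for illustration. Only the flag names and their meanings come from the usage text.

```python
from optparse import OptionParser


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the project's code)."""
    parser = OptionParser(usage="%prog [options] [arg]")
    # Placeholder: the real version string is not given in the README.
    parser.version = "%prog (version unknown)"

    parser.add_option("-d", "--debug", action="store_true", dest="debug",
                      default=False,
                      help="print debug information about the segmentation")
    parser.add_option("-f", "--file", dest="file", metavar="FILE",
                      help="segment sentences from the specified file")
    parser.add_option("-i", "--interactive", action="store_true",
                      dest="interactive", default=False,
                      help="go into interactive mode")
    parser.add_option("-o", "--out", dest="out", metavar="OUT",
                      help="write the segmentation result to the specified file")
    # The tool's real default separator is unknown, so none is set here.
    parser.add_option("-s", "--separator", dest="separator",
                      metavar="SEPARATOR",
                      help="specify the separator used in the segmentation result")
    parser.add_option("-t", "--train", dest="train", metavar="TRAIN",
                      help="use the training set to train the algorithm")
    parser.add_option("-v", "--version", action="version",
                      help="output version info and exit")
    return parser


if __name__ == "__main__":
    # Example invocation with hypothetical file names.
    options, args = build_parser().parse_args(
        ["-f", "input.txt", "-o", "output.txt", "-s", "/"])
    print(options.file, options.out, options.separator)
```

Under these assumptions, a call such as `python segment_sentences.py -f input.txt -o output.txt -s /` would read sentences from input.txt and write the segmented result to output.txt with "/" between words; the file names here are placeholders.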
    

