Overview

This project performs Chinese tokenization (word segmentation): it splits Chinese sentences into separate words.

Usage

Usage of segment_sentences:

  python segment_sentences.py [options] [arg]

Options:

  -h, --help            show this help message and exit
  -d, --debug           print debug information about the segmentation
                        (off by default)
  -f FILE, --file=FILE  segment sentences from the specified file
  -i, --interactive     enter interactive mode
  -o OUT, --out=OUT     write the segmentation result to the specified file
  -s SEPARATOR, --separator=SEPARATOR
                        specify the separator used in the segmentation result
  -t TRAIN, --train=TRAIN
                        train the algorithm on the specified training set
  -v, --version         output version info and exit
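
The help text above follows the layout produced by Python's standard optparse module, so the option set could be declared roughly as below. This is only a sketch under that assumption, not the project's actual code: the function name build_parser, the dest names, the None defaults, and the version string are all hypothetical.

```python
from optparse import OptionParser


def build_parser():
    """Return a parser whose options mirror the help text (illustrative only)."""
    parser = OptionParser(usage="python segment_sentences.py [options] [arg]")
    # Placeholder version string; the real one is not given in this README.
    parser.version = "segment_sentences (version unknown)"

    # -h/--help is added automatically by OptionParser.
    parser.add_option("-d", "--debug", action="store_true", dest="debug",
                      default=False,
                      help="print debug information about the segmentation")
    parser.add_option("-f", "--file", dest="file", metavar="FILE",
                      help="segment sentences from the specified file")
    parser.add_option("-i", "--interactive", action="store_true",
                      dest="interactive", default=False,
                      help="enter interactive mode")
    parser.add_option("-o", "--out", dest="out", metavar="OUT",
                      help="write the segmentation result to the specified file")
    parser.add_option("-s", "--separator", dest="separator",
                      metavar="SEPARATOR",
                      help="specify the separator used in the segmentation result")
    parser.add_option("-t", "--train", dest="train", metavar="TRAIN",
                      help="train the algorithm on the specified training set")
    parser.add_option("-v", "--version", action="version",
                      help="output version info and exit")
    return parser


if __name__ == "__main__":
    # "sentences.txt" is a placeholder file name used only for this demo.
    options, args = build_parser().parse_args(
        ["-f", "sentences.txt", "-s", "/", "-d"])
    print(options.file, options.separator, options.debug)
```

With a parser like this, options can be combined freely, for example reading from a file with -f while choosing the word separator with -s and writing the result with -o.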

DevilCry/chinesetokenization