hamedkalantari/TextClassification

Text-Classification

Python library for Persian text classification.

Text cleaning

Sentence and word tokenizer

Creating a vector index of each word in a sentence and in the whole text

Creating a frequency matrix

Creating a term frequency-inverse document frequency (TF-IDF) matrix

Applying L2 normalization so that the vectors in the matrices have unit length
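
The steps listed above can be sketched in plain Python to show what each matrix contains. This is only an illustration under stated assumptions, not the code in this repository: the function names are hypothetical, whitespace splitting stands in for the hazm tokenizer, and the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 is the one scikit-learn uses by default, which may differ from what run.py does.

```python
import math
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map each distinct word to a column index, in order of first appearance."""
    vocab = {}
    for tokens in tokenized_docs:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def frequency_matrix(tokenized_docs, vocab):
    """Return one row per document holding the raw count of every vocabulary word."""
    rows = []
    for tokens in tokenized_docs:
        row = [0] * len(vocab)
        for word, count in Counter(tokens).items():
            row[vocab[word]] = count
        rows.append(row)
    return rows


def tf_idf_matrix(freq):
    """Weight raw counts by a smoothed inverse document frequency (assumed formula)."""
    n_docs = len(freq)
    n_terms = len(freq[0]) if freq else 0
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = [sum(1 for row in freq if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1.0 for df in doc_freq]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in freq]


def l2_normalize(matrix):
    """Scale every non-zero row to unit Euclidean length; zero rows stay unchanged."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(value * value for value in row))
        normalized.append([value / norm for value in row] if norm > 0 else list(row))
    return normalized


# Hypothetical sample documents; split() is a stand-in for a real Persian tokenizer.
documents = ["کتاب خوب است", "فیلم خوب است", "کتاب جدید"]
tokenized = [doc.split() for doc in documents]

vocabulary = build_vocabulary(tokenized)
counts = frequency_matrix(tokenized, vocabulary)
unit_rows = l2_normalize(tf_idf_matrix(counts))

for row in unit_rows:
    print([round(value, 3) for value in row])
```

Each printed row is one document vector. Because of the L2 step, the dot product of two rows equals their cosine similarity, which is the usual reason to normalize before classification.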

Usage:

Pass the XML, in the expected format, to the create_tf_idf function in the run.py file.

The run.py file is the entry point for the project.

Dependencies:

hazm library:

pip install hazm

scikit-learn library:

pip install -U scikit-learn

SciPy library:

pip install scipy

NumPy library:

pip install numpy
