Python library for Persian text classification.
Text cleaning
Sentence and word tokenizer
Creating a vector index of each word in a sentence and in the whole text
Creating a frequency matrix
Creating a term frequency-inverse document frequency (TF-IDF) matrix
Applying L2 normalization so that the vectors in the matrices have unit length
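The steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code in run.py: the function names build_vocabulary, frequency_matrix and tf_idf_matrix are hypothetical, the input is assumed to be already cleaned and tokenized (for example with hazm), and the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 is one common variant that this project may or may not use.

```python
import math
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map each distinct word to a column index, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for word in doc:
            vocab.setdefault(word, len(vocab))
    return vocab


def frequency_matrix(tokenized_docs, vocab):
    """Build one row of raw word counts per document."""
    rows = []
    for doc in tokenized_docs:
        row = [0] * len(vocab)
        for word, count in Counter(doc).items():
            row[vocab[word]] = count
        rows.append(row)
    return rows


def tf_idf_matrix(counts):
    """Weight counts by a smoothed IDF, then L2-normalize each row."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Assumed smoothed IDF variant: ln((1 + n) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    result = []
    for row in counts:
        weighted = [row[j] * idf[j] for j in range(n_terms)]
        norm = math.sqrt(sum(v * v for v in weighted))
        # Leave all-zero rows untouched to avoid dividing by zero.
        result.append([v / norm for v in weighted] if norm else weighted)
    return result


# Hypothetical, already tokenized Persian documents.
docs = [["کتاب", "خوب", "است"], ["کتاب", "بد", "است"]]
vocab = build_vocabulary(docs)
tfidf = tf_idf_matrix(frequency_matrix(docs, vocab))
```

Each row of tfidf is one document and each column is one word from vocab. After L2 normalization every non-empty row has length 1, so the dot product of two rows is their cosine similarity.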
To use it, pass XML in the expected format to the create_tf_idf function in run.py.
run.py is the entry point for the project.
hazm library:
pip install hazm
scikit-learn library:
pip install -U scikit-learn
SciPy library:
pip install scipy
NumPy library:
pip install numpy