Here I build a Chinese text classifier, using the jieba package for tokenization and scikit-learn for the classification algorithms.
My code starts from the example at http://blog.sina.com.cn/s/blog_7e5f32ff0102w9ll.html, and I add more parts for the text classification (to be added):
- Chinese tokenization