- testing data: 10, 20, 30, ..., 190, 200 (article 180 is excluded: it is a "請益" (asking-for-advice) post, so it was removed)
- training data: all 193 articles in training/training_merge/ minus the testing data (193 - 19 = 174 articles)
- ngram/
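- total/1-7.txt : n-grams of each category, from all 193 articles in training/training_merge/
- total/1-7total.txt : all n-grams from 1-7.txt combined
- total/1-8total.txt : all n-grams from 1-8.txt combined
- train/1-7.txt : n-grams of each category, from the 193 - 19 = 174 training articles in training/training_merge/
- train/1-7total.txt : all n-grams from 1-7.txt combined
- train/1-8total.txt : all n-grams from 1-8.txt combined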
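- word_importance/ : a weight computed for the words in each category
- e.g. weight of "服務" in category 1 = (count of "服務" in category 1 / count of "服務" in the whole corpus (1-8total.txt)) * (count of "服務" in category 1 / total word count of category 1)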
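
The weight formula above can be sketched in Python as follows. This is only a minimal illustration under assumptions: the function name `word_weight` is hypothetical, the counts are made-up toy numbers (not taken from the real n-gram files), and loading the counts from the files under ngram/ is left out because their exact format is not described here.

```python
from collections import Counter


def word_weight(word, category_counts, corpus_counts):
    """Weight of `word` in one category.

    weight = (count in category / count in whole corpus)
           * (count in category / total word count of the category)
    """
    count_in_category = category_counts[word]   # Counter gives 0 if absent
    count_in_corpus = corpus_counts[word]
    category_total = sum(category_counts.values())
    # Guard against division by zero for unseen words or an empty category.
    if count_in_corpus == 0 or category_total == 0:
        return 0.0
    return (count_in_category / count_in_corpus) * (count_in_category / category_total)


# Toy counts (assumed for illustration only).
category_1 = Counter({"服務": 3, "價格": 1})          # stands in for category 1
corpus = Counter({"服務": 4, "價格": 5, "品質": 1})    # stands in for 1-8total.txt

print(word_weight("服務", category_1, corpus))  # (3/4) * (3/4) = 0.5625
```

The first factor favors words that occur mostly in this category, and the second favors words that are frequent within the category, so a word scores high only when both hold.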