Combine the following methods by giving them different weights:
- method 1: use vocabulary frequency
- method 2: use vocabulary type
- method 3: use the difference between the different txt files
- method 4: use tf-idf
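
A minimal sketch of the weighting idea, not the project's actual code. It assumes each method returns a score per candidate label and that the final score is a normalized weighted sum. The function name `combine_scores`, the method keys, and the weight values are all hypothetical.

```python
def combine_scores(method_scores, weights):
    """Return the weighted sum of per-method scores for each label.

    method_scores: {method_name: {label: score}}  (assumed layout)
    weights:       {method_name: weight}          (assumed, hand-tuned)
    """
    total_weight = sum(weights.values())
    combined = {}
    for method, scores in method_scores.items():
        # Normalize so the weights sum to 1 regardless of their scale.
        w = weights[method] / total_weight
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + w * score
    return combined


# Hypothetical scores from two of the four methods.
example_scores = {
    "frequency": {"a": 1.0, "b": 0.0},
    "tfidf": {"a": 0.0, "b": 1.0},
}
example_weights = {"frequency": 1, "tfidf": 3}
result = combine_scores(example_scores, example_weights)
best_label = max(result, key=result.get)
```

Normalizing the weights keeps the combined score on the same scale as the inputs, so the weights can be tuned without rescaling anything else.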
Some tricks:
- use human body parts as features; not completed
- remove all words that contain both numbers and letters;
- remove all single words;
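
The two removal tricks above could look roughly like the sketch below. It rests on two assumptions about the notes: a word with "number and alpha" means a token that mixes digits and letters (such as `x86`), and a "single word" means a one-character token. The names `keep_token` and `filter_tokens` are hypothetical.

```python
import re

HAS_DIGIT = re.compile(r"\d")
HAS_LETTER = re.compile(r"[^\W\d_]")  # any Unicode letter


def keep_token(token):
    """Return True if the token survives both filters."""
    # Assumption: drop tokens that mix digits and letters, e.g. "x86".
    if HAS_DIGIT.search(token) and HAS_LETTER.search(token):
        return False
    # Assumption: "single word" is read as a one-character token.
    if len(token) == 1:
        return False
    return True


def filter_tokens(tokens):
    """Apply the filters to a list of already-split tokens."""
    return [t for t in tokens if keep_token(t)]


cleaned = filter_tokens(["arm", "a", "x86", "2019", "leg", "b2b"])
```

Under this reading, purely numeric tokens such as `2019` are kept; only tokens mixing digits and letters are dropped.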