GitHub - tomhttp/keyword_extractor: keyword extractor (or tag extractor)

Algorithms for keyword extraction.

tfidf_rank
- features: TF, IDF (==1)
- ranking: TF * IDF
text_rank
- features: pos (for filtering), word neighbors
- ranking: TextRank, which works like PageRank on a relation matrix built from the words' positions.
- reference: http://www.cse.unt.edu/~rada/papers/mihalcea.emnlp04.pdf
glm_rank (TODO)
- features: word frequency (TF), word importance (IDF, part of speech, entity type), word position (such as whether the word appears in both title and body)
- ranking: train a regression model and rank by its predictions.
semantic_rank (TODO)
- features: such as TF, IDF, POS, entity type ...
- ranking: regression model plus SemanticRank, which works like PageRank on a relation matrix built from semantic similarity.
- re-ranking: adjustment based on document category, and task-dependent word adjustment.
topic_rank (TODO)
- ranking: topic model, such as LDA.
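
The `tfidf_rank` entry above can be illustrated with a minimal sketch. It is not the repository's code: the function name, its parameters, and the choice to normalize TF by document length are assumptions made for illustration. IDF defaults to 1 for every word, following the "IDF (==1)" note, so the ranking reduces to term frequency unless an IDF table is supplied.

```python
from collections import Counter


def tfidf_rank(tokens, idf=None, top_k=5):
    """Rank tokens by TF * IDF (hypothetical helper, not the repo's API).

    tokens: list of already-tokenized words.
    idf:    optional dict word -> IDF weight; missing words get 1.0.
    """
    idf = idf or {}
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    # TF is the relative frequency of the word in this document.
    scores = {w: (c / total) * idf.get(w, 1.0) for w, c in counts.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    words = "keyword extraction ranks keyword candidates".split()
    print(tfidf_rank(words))
```

Passing a real IDF table (for example one computed from a background corpus) would down-weight common words; with the default of 1 the most frequent word always ranks first.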
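
The `text_rank` entry above can be sketched in the same way. This is an illustrative version under stated assumptions, not the repository's implementation: it assumes the tokens have already been filtered by part of speech, it links words that co-occur within a small window (one common way to build the relation matrix from word positions), and it uses the unweighted update from the referenced TextRank paper. The function name and all parameters are hypothetical.

```python
from collections import defaultdict


def text_rank(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words with a PageRank-style iteration over a co-occurrence graph.

    tokens: list of words, assumed to be POS-filtered upstream.
    window: how many following positions count as neighbors.
    """
    # Build an undirected graph: words within `window` positions are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Iterate S(v) = (1 - d) + d * sum over neighbors u of S(u) / degree(u).
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    words = ["graph", "ranking", "graph", "keyword", "graph", "extraction"]
    for word, score in text_rank(words, window=1):
        print(word, round(score, 3))
```

A fixed iteration count keeps the sketch short; a production version would more likely stop once the scores change by less than a small threshold.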

## Requirements:
