Keywordtagger

Dependencies: scikit-learn

The pipeline is expected to work as follows:
1. unclean data ----> [TextCleaner.py] ----> clean data
2. clean data ----> [SplitTest.py] ----> [test, dev, train] datasets
3. Model: Bayes_unigram.py
3.1 [train dataset] ----> feature.py ----> [features for train set]
3.2 [features for train set] ----> [trained model]
3.3 [test dataset] ----> feature.py ----> [features for test set]
3.4 [features for test set] ----> classify ----> [predicted result]
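
The steps above can be sketched end to end with scikit-learn. This is only an illustration, not the code in this repository: the toy `texts` and `tags` are made up, and using `CountVectorizer` for the unigram features and `MultinomialNB` for the Bayes model is an assumption about what feature.py and the model script do.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the cleaned dataset.
texts = [
    "python list sort error",
    "python dict key lookup",
    "python loop over list",
    "python string format",
    "java class interface error",
    "java null pointer exception",
    "java array list loop",
    "java string builder",
]
tags = ["python"] * 4 + ["java"] * 4

# Step 2: split into train, dev and test sets (50/25/25 here, an assumed ratio).
train_x, rest_x, train_y, rest_y = train_test_split(
    texts, tags, test_size=0.5, random_state=0, stratify=tags
)
dev_x, test_x, dev_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, random_state=0, stratify=rest_y
)

# Steps 3.1 and 3.2: unigram count features for the train set, then fit the model.
vectorizer = CountVectorizer(ngram_range=(1, 1))
train_features = vectorizer.fit_transform(train_x)
model = MultinomialNB()
model.fit(train_features, train_y)

# Steps 3.3 and 3.4: reuse the fitted vocabulary on the test set, then classify.
test_features = vectorizer.transform(test_x)
predicted = model.predict(test_features)
print(list(zip(test_x, predicted)))
```

Note that the vectorizer is fitted on the train set only and then reused for the test set, so the test features share the vocabulary the model was trained on.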

Observations:

Tags are sparse (1000 tags), so the model finds it hard to predict them.
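
One way to check how sparse the tags are is to count how often each tag occurs. The sketch below uses a made-up `tag_column` and an arbitrary threshold of fewer than 2 occurrences for a "rare" tag; neither comes from this repository.

```python
from collections import Counter

# Hypothetical tag column; in practice this would be read from the dataset.
tag_column = ["python", "java", "python", "regex", "git", "python", "sql"]

# Count occurrences of each distinct tag.
tag_counts = Counter(tag_column)

# Tags seen fewer than 2 times (assumed threshold) give the model little to learn from.
rare_tags = sorted(tag for tag, count in tag_counts.items() if count < 2)
rare_share = len(rare_tags) / len(tag_counts)

print(f"{len(tag_counts)} distinct tags, {len(rare_tags)} rare ({rare_share:.0%})")
```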

## How to run

If necessary, change the input dataset in SplitTest.py, run it, and then run Bayes_unigram.py.

