Techniques to provide keyword tags for Stack Exchange questions

CS585NLP/Keywordtagger

Keywordtagger

Dependencies: scikit-learn

The pipeline is expected to work as follows:

1. Unclean data ----> [text_cleaner.py] ----> clean data
2. Clean data ----> [SplitTest.py] ----> [test, dev, train] datasets
3. Model: bayes_unigram.py
   - 3.1 [train dataset] ----> feature.py ----> [features for train set]
   - 3.2 [features for train set] ----> [trained model]
   - 3.3 [test dataset] ----> feature.py ----> [features for test set]
   - 3.4 [features for test set] ----> classify ----> [predicted result]
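
The steps above can be sketched end to end with scikit-learn. This is only an illustration of the general shape of a unigram Naive Bayes pipeline, not the code in `SplitTest.py`, `feature.py`, or `bayes_unigram.py`. The toy questions, the single tag per question, the split sizes, and the choice of `CountVectorizer` with `MultinomialNB` are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical cleaned questions with one tag each (not data from this repo).
questions = [
    "how to merge two dataframes in pandas",
    "pandas groupby sum of column",
    "read csv file into pandas dataframe",
    "pandas drop rows with missing values",
    "pandas rename dataframe columns",
    "filter pandas dataframe by column value",
    "regex match digits in string",
    "regex replace whitespace in text",
    "regex capture group example",
    "regex match start of line",
    "regex lookahead pattern syntax",
    "regex escape special characters",
]
tags = ["pandas"] * 6 + ["regex"] * 6

# Step 2: split into train, dev, and test sets (sizes are arbitrary here).
rest_x, test_x, rest_y, test_y = train_test_split(
    questions, tags, test_size=3, random_state=0
)
train_x, dev_x, train_y, dev_y = train_test_split(
    rest_x, rest_y, test_size=3, random_state=0
)

# Steps 3.1 and 3.3: unigram count features, vocabulary learned on train only.
vectorizer = CountVectorizer(ngram_range=(1, 1))
train_features = vectorizer.fit_transform(train_x)
test_features = vectorizer.transform(test_x)

# Step 3.2: train the Naive Bayes model on the training features.
model = MultinomialNB()
model.fit(train_features, train_y)

# Step 3.4: classify the test questions.
predicted = model.predict(test_features)
```

The dev set is left unused in this sketch; it would typically be used to tune choices such as the smoothing parameter before touching the test set.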

Observations:

Tags are sparse (1000 tags), so the model finds it hard to predict them.
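
One common way to look at tag sparsity is to count how often each tag occurs and, if desired, keep only the most frequent ones. The sketch below is an assumption about how that could be done and is not part of this repository; `keep_frequent_tags`, `top_k`, and the toy tag lists are hypothetical names and data.

```python
from collections import Counter


def keep_frequent_tags(tag_lists, top_k):
    """Drop every tag that is not among the top_k most frequent tags."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    frequent = {tag for tag, _ in counts.most_common(top_k)}
    return [[tag for tag in tags if tag in frequent] for tags in tag_lists]


# Hypothetical tag lists, one list per question.
tag_lists = [
    ["python", "pandas"],
    ["python", "regex"],
    ["java"],
    ["python", "java"],
]
filtered = keep_frequent_tags(tag_lists, top_k=2)
```

Restricting the label set this way trades coverage for more training examples per remaining tag; whether that helps here would need to be checked on the dev set.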

## How to run

If necessary, change the input dataset in SplitTest.py, run it, and then run bayes_unigram.py.
