The project aims to modify existing recommendation techniques based on collaborative filtering so that they also use content-based information from the data.
Dataset: Stack Overflow
Each data point consists of a question body and its associated tags.
- preprocessor.py: Extracts keywords from the question body into a SQLite database and calculates the tf-idf statistic for each keyword.
- stopwords.py: Outputs a CSV dataset with stopwords removed, based on tf-idf scores.
- tagcloud.py: Generates tag clouds from the keyword database.
- autoencoder.py:
- doc2vec.py:
- MatrixFactoriztion.py:
- testingtraining.py:
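
The tf-idf scoring and stopword filtering described for preprocessor.py and stopwords.py can be sketched as follows. This is a minimal illustration, not the code in those scripts: the function names (`tokenize`, `tf_idf`, `drop_low_scoring`), the tokenizer regex, the `log(N / df)` form of idf, and the threshold are all assumptions made for the example, and the sample questions are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, '+' and '#'.
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tf_idf(documents):
    """Return one {token: tf-idf score} dict per document.

    Assumes tf = count / document length and idf = log(N / df);
    the project's scripts may use a different variant.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            token: (count / total) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return scores


def drop_low_scoring(documents, threshold=0.0):
    """Keep only tokens whose tf-idf score exceeds the (assumed) threshold."""
    scores = tf_idf(documents)
    return [
        [tok for tok in tokenize(doc) if doc_scores[tok] > threshold]
        for doc, doc_scores in zip(documents, scores)
    ]


# Made-up question bodies, for illustration only.
questions = [
    "how to sort a list in python",
    "how to parse json in java",
    "how to join a thread in c",
]
scores = tf_idf(questions)
filtered = drop_low_scoring(questions)
```

With this idf variant, a word that occurs in every question scores zero and is dropped like a stopword, while words specific to one question keep a positive score.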