PROJECT DESCRIPTION: This project attempts to create a recommender system based on the research of Julian McAuley.
This is Julian McAuley's unmodified topic modeling code, written in C++. It takes as input a compressed (.gz) file containing a set of reviews and prints to the terminal the top words found in each of the 5 topics. Before running it, make sure to export the following: "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/liblbfgs-1.10/lib/.libs/". Compile with 'make'.
To run, use: ./train .
The input file is in .gz format. Each category node has a .gz file containing all the reviews of every product in that category.
Format of the .gz file: a) each line is a review for a product; b) each line contains several fields separated by spaces.
An example input is provided: Arts.votes.gz.
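The input format described above (one review per line, fields separated by spaces) can be inspected with a few lines of Python. This is only a minimal sketch: the helper names iter_reviews and count_reviews are hypothetical and not part of the project, and the meaning of each field is not assumed here.

```python
import gzip


def iter_reviews(path):
    """Yield each review line of a .gz file as a list of space-separated fields.

    Hypothetical helper for inspecting input files; not part of the project code.
    """
    # "rt" opens the compressed file in text mode; undecodable bytes are replaced.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                yield fields


def count_reviews(path):
    """Return the number of non-empty review lines in the file."""
    return sum(1 for _ in iter_reviews(path))
```

For example, count_reviews("Arts.votes.gz") would report how many reviews the example input contains, and looping over iter_reviews("Arts.votes.gz") gives the fields of each review.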
2. subCategoryFinder.py and its dependency jsonToHash.py in the directory 'scripts/subCategoryFinder':
This is the code that creates the subCategory tree structure. It takes as input a file of reviews, its metadata, the threshold for how many reviews a subset must contain to be considered a category, and how many sub-category files should be output from the tree. Note that these files are written to a folder named '/OUTPUT', which is created every time the code is run. If the code fails, it might be because you have to delete the '/OUTPUT' sub-folder from the directory first.
To run, use 'python subCategoryFinder.py <ReviewsToOpen.json.gz> <MetaDataToOpen.json.gz>'.
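If a leftover output folder from an earlier run is what makes the script fail, it can be removed before running again. The following is a minimal sketch, assuming the folder is named OUTPUT and sits in the directory the script is run from; the helper clear_output_dir is hypothetical and not part of subCategoryFinder.py.

```python
import shutil
from pathlib import Path


def clear_output_dir(output_dir="OUTPUT"):
    """Delete a leftover output folder and everything in it.

    Returns True if a folder was removed, False if there was nothing to remove.
    The default folder name is an assumption based on the description above.
    """
    path = Path(output_dir)
    if path.is_dir():
        shutil.rmtree(path)  # removes the folder and all files inside it
        return True
    return False
```

Calling clear_output_dir() from the script's directory before each run leaves a clean state; it deletes previous results, so copy anything worth keeping first.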
This is the Python code that takes the topic model output by language.cpp and applies it to a specific product in a set of reviews. Run it like so: python applyTopicModel.py <Reviews.gz> <model.txt>. It will print the detected topic distribution to the terminal.
This is the modified version of Julian McAuley's topic modeling code. It outputs a topic model file named savedModel.txt that can be read by 'applyTopicModel.py'.