Please fork this repository and paste the GitHub link of your fork on Microsoft CMT. Detailed instructions are on Coursera under Week 1: Course Project Overview/Week 9 Activities.
To run the PLSA algorithm, the NYT corpus must follow the directory structure '../CourseProject/data//'.
Save the stock prices from May 2000 to Oct 2000 as htm files, named in the format '_2000.htm', in a folder called 'stock_data'. The htm files can be saved from https://iemweb.biz.uiowa.edu/pricehistory/pricehistory_SelectContract.cfm?market_ID=29
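Before running anything, it can help to confirm that the stock files are where the code expects them. The following is a minimal sketch, not part of this repository: the helper name `find_stock_files` is hypothetical, and it only assumes what is stated above (a 'stock_data' folder containing files whose names end in '_2000.htm').

```python
from pathlib import Path


def find_stock_files(folder="stock_data", suffix="_2000.htm"):
    """Return the sorted paths of files in `folder` whose names end with `suffix`.

    Hypothetical helper: the folder name and suffix come from the README,
    everything else is an assumption made for illustration.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"expected a folder named {folder!r}")
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.name.endswith(suffix)
    )


if __name__ == "__main__":
    try:
        files = find_stock_files()
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"found {len(files)} stock file(s)")
        for path in files:
            print(" ", path.name)
```

If the folder is missing or the list is empty, save the htm files first, as described above.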
- plsa_without_prior.py: initial run of PLSA without any priors
- plsa_with_prior.py: subsequent PLSA runs with priors determined from the Granger and Pearson tests
- word_retriever.py: retrieves required information, such as word frequency per day
- analysis.py: contains code for running the Granger and Pearson coefficient tests
- main.py:
  - Retrieve and normalize stock data
  - Initially run PLSA without a prior, then run the analysis (Granger and Pearson coefficient tests) to retrieve priors
  - Using the priors retrieved above, iterate PLSA with prior until the desired convergence is achieved
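To illustrate the normalization and Pearson parts of the analysis listed above, here is a minimal standard-library sketch. It is not the code in analysis.py. The function names `normalize` and `pearson` are hypothetical, min-max scaling is only an assumed form of "normalize", and the sample numbers are made up. The Granger test is left out because it needs a time-series library.

```python
from math import sqrt


def normalize(prices):
    """Min-max scale prices into [0, 1] (assumed normalization, see lead-in)."""
    low, high = min(prices), max(prices)
    if high == low:
        return [0.0 for _ in prices]  # constant series carries no signal
    return [(p - low) / (high - low) for p in prices]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for illustration only, not data from this project.
    word_freq_per_day = [3, 5, 4, 8, 9]
    prices = [0.48, 0.50, 0.49, 0.55, 0.56]
    print(pearson(word_freq_per_day, normalize(prices)))
```

A coefficient near +1 or -1 suggests the word's daily frequency moves with or against the price series; how that sign and strength are turned into priors is up to the code in this repository.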
Once the data files are saved in the structure described above, you should be able to run
python3 main.py
which iterates until the desired convergence has been reached.
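This README does not define "convergence". One common choice for PLSA is to stop the EM iterations once the log-likelihood improves by less than a small tolerance. The sketch below shows that idea for plain PLSA without priors, in pure Python. It is not the implementation in plsa_without_prior.py: the function name `plsa`, its parameters, the stopping rule, and the toy counts are all assumptions made for illustration.

```python
import math
import random


def _normalized(row):
    """Scale non-negative numbers so they sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [value / total for value in row]


def plsa(counts, n_topics, tol=1e-6, max_iter=200, seed=0):
    """Fit PLSA by EM on a document-by-word count matrix (list of lists).

    Returns (theta, phi, history): theta[d][k] = P(topic k | doc d),
    phi[k][w] = P(word w | topic k), history = log-likelihood per iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    theta = [_normalized([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    phi = [_normalized([rng.random() + 0.1 for _ in range(n_words)])
           for _ in range(n_topics)]
    history = []
    for _ in range(max_iter):
        new_theta = [[0.0] * n_topics for _ in range(n_docs)]
        new_phi = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over topics for this (doc, word) pair.
                joint = [theta[d][k] * phi[k][w] for k in range(n_topics)]
                total = sum(joint)
                if total <= 0:
                    continue
                # M-step accumulation: expected counts per topic.
                for k in range(n_topics):
                    expected = counts[d][w] * joint[k] / total
                    new_theta[d][k] += expected
                    new_phi[k][w] += expected
        theta = [_normalized(row) for row in new_theta]
        phi = [_normalized(row) for row in new_phi]
        # Log-likelihood of the data under the updated parameters.
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w]:
                    p = sum(theta[d][k] * phi[k][w] for k in range(n_topics))
                    ll += counts[d][w] * math.log(max(p, 1e-300))
        converged = bool(history) and abs(ll - history[-1]) < tol
        history.append(ll)
        if converged:
            break
    return theta, phi, history


if __name__ == "__main__":
    # Toy counts (made up): 4 documents, 4 words.
    toy = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
    theta, phi, history = plsa(toy, n_topics=2)
    print(f"stopped after {len(history)} iteration(s)")
```

A looser `tol` stops earlier at the cost of a less settled model. `max_iter` is a safety cap in case the tolerance is never met.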