CourseProject

Please fork this repository and paste the GitHub link of your fork on Microsoft CMT. Detailed instructions are on Coursera under Week 1: Course Project Overview/Week 9 Activities.

NYT Corpus folder structure

To run the PLSA algorithm, the NYT corpus must follow the folder structure '../CourseProject/data//'.

Stock data folder structure

The stock prices from May 2000 to October 2000 are saved as .htm files, named in the format '_2000.htm', in a folder called 'stock_data'. The .htm files can be saved from https://iemweb.biz.uiowa.edu/pricehistory/pricehistory_SelectContract.cfm?market_ID=29
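As an illustration of how a saved .htm price page could be read, here is a minimal sketch that pulls table rows out of HTML using only the Python standard library. It is not the code used in this repository. The class name TableRowParser, the function parse_price_table, and the sample columns (Date, Contract, LastPrice) are assumptions made for the example; the real pages may be laid out differently.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_price_table(html_text):
    """Return all table rows found in html_text as lists of strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Made-up sample input (hypothetical columns and values).
sample = """
<table>
  <tr><th>Date</th><th>Contract</th><th>LastPrice</th></tr>
  <tr><td>05/01/00</td><td>A</td><td>0.500</td></tr>
  <tr><td>05/02/00</td><td>A</td><td>0.510</td></tr>
</table>
"""
rows = parse_price_table(sample)
header, data = rows[0], rows[1:]
print(header)
print(data)
```

For a real file, the text passed to parse_price_table would come from opening one of the .htm files in 'stock_data' and reading its contents.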

Code content

plsa_without_prior.py: initial run of PLSA without any priors
plsa_with_prior.py: subsequent PLSA runs with priors determined from Granger and Pearson tests
word_retriever.py: retrieves the required information, such as word frequency per day
analysis.py: contains code for running the Granger and Pearson coefficient tests
main.py: runs the full pipeline:

1. Retrieve and normalize the stock data.
2. Run PLSA without a prior, then run the analysis (Granger and Pearson coefficient tests) to retrieve priors.
3. Using the priors retrieved above, rerun PLSA with prior iteratively until the desired convergence is reached.
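To make the Pearson part of the analysis step concrete, here is a minimal sketch that correlates a word's daily frequency with a price series, optionally shifted by a lag. It is not the code in analysis.py. The function names (pearson, lagged_pearson, select_prior_words), the threshold, and the toy numbers are all assumptions for the example. The Granger test is not shown; one option for it would be grangercausalitytests from statsmodels.tsa.stattools.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; use 0
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(word_counts, prices, lag=1):
    """Correlate word counts on day t with prices on day t + lag."""
    if len(word_counts) != len(prices):
        raise ValueError("series must cover the same days")
    if lag < 0 or lag > len(prices) - 2:
        raise ValueError("lag leaves fewer than two overlapping days")
    return pearson(word_counts[:len(word_counts) - lag], prices[lag:])


def select_prior_words(freq_by_word, prices, threshold=0.8, lag=1):
    """Keep words whose lagged correlation magnitude reaches the threshold.

    Returns {word: correlation}; the sign tells the direction of the link.
    The threshold is an arbitrary value chosen for this example.
    """
    selected = {}
    for word, counts in freq_by_word.items():
        r = lagged_pearson(counts, prices, lag)
        if abs(r) >= threshold:
            selected[word] = r
    return selected


# Toy data (made up): daily counts for three words and a daily price.
prices = [0.50, 0.52, 0.54, 0.56, 0.58]
freq_by_word = {
    "up": [1, 2, 3, 4, 5],      # rises one day ahead of the price
    "down": [5, 4, 3, 2, 1],    # falls while the price rises
    "flat": [2, 2, 2, 2, 2],    # constant, so no correlation
}
priors = select_prior_words(freq_by_word, prices)
print(priors)
```

Words that pass the threshold could then be fed back as the prior for the next PLSA run, with the sign of the correlation separating positively and negatively related words.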

How to run code

Once the data files are saved in the structure defined above, run python3 main.py. The script iterates until the desired convergence is reached.
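The stopping rule can be pictured with the following sketch of an iterate-until-converged loop. It is only an illustration under stated assumptions, not the logic in main.py: run_round is a hypothetical callable standing in for one round of PLSA plus analysis, and the score, tolerance, and max_rounds values are made up for the example.

```python
def iterate_until_converged(run_round, tolerance=1e-3, max_rounds=20):
    """Call run_round(prior) repeatedly until its score stops changing.

    run_round is a hypothetical callable that takes the current prior
    (None on the first round) and returns (score, next_prior).
    """
    prior = None
    previous_score = None
    history = []
    for _ in range(max_rounds):
        score, prior = run_round(prior)
        history.append(score)
        # Stop once the score moves by less than the tolerance.
        if previous_score is not None and abs(score - previous_score) < tolerance:
            break
        previous_score = score
    return prior, history


def toy_round(prior):
    """Toy stand-in for one round: the score halves its distance to 1.0."""
    current = 0.0 if prior is None else prior
    improved = current + (1.0 - current) / 2
    return improved, improved


final_prior, scores = iterate_until_converged(toy_round, tolerance=0.01)
print(len(scores), scores[-1])
```

The max_rounds cap guards against a run that never settles within the tolerance.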
