LDA with Collapsed Gibbs Sampling and Stochastic Variational Inference
- Language: Python3
- Prerequisite libraries: SciPy, NumPy, Jupyter Notebook, Cython
- Fetch git repo:
git clone https://github.com/haofuml/sta663_project_lda.git
cd sta663_project_lda
- Install packages:
pip install --index-url https://test.pypi.org/simple/ sta663_project_lda
- Generate toy dataset:
python -m sta663_project_lda.preprocessing.gen_toydata
- Prepare NYT dataset:
python -m sta663_project_lda.preprocessing.gen_nytdata
- Toy dataset results:
python -m sta663_project_lda.algorithms.lda_gibbs
python -m sta663_project_lda.algorithms.lda_svi
Alternatively:
Execute lda_test.ipynb in Jupyter Notebook
- Computational efficiency comparison:
Execute lda_time.ipynb in Jupyter Notebook
- New York Times dataset results:
Execute lda_nytime.ipynb in Jupyter Notebook
These are the top ten words in each topic on the New York Times dataset.
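
For readers who want to produce a similar top-words listing from their own fitted model, the following is a minimal sketch. It is not this package's API: the function name `top_words`, the `topic_word` matrix (one row per topic, one column per vocabulary word), and the four-word vocabulary are all hypothetical and chosen only for illustration.

```python
import numpy as np

def top_words(topic_word, vocab, n_top=10):
    """Return the n_top highest-weight words for each topic (row).

    topic_word: array of shape (n_topics, n_vocab); hypothetical input,
                e.g. a normalized topic-word count or probability matrix.
    vocab:      list of n_vocab word strings, aligned with the columns.
    """
    topic_word = np.asarray(topic_word)
    # Sort each row ascending, reverse to descending, keep the first n_top.
    order = np.argsort(topic_word, axis=1)[:, ::-1][:, :n_top]
    return [[vocab[j] for j in row] for row in order]

# Made-up 2-topic, 4-word example (not results from the NYT dataset).
vocab = ["game", "team", "market", "stock"]
topic_word = np.array([[0.50, 0.30, 0.15, 0.05],
                       [0.05, 0.10, 0.35, 0.50]])

result = top_words(topic_word, vocab, n_top=2)
for k, words in enumerate(result):
    print(f"Topic {k}: {' '.join(words)}")
```

With a real model, `topic_word` would be replaced by whatever topic-word estimate the sampler or SVI routine returns, and `n_top` left at its default of 10.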