CanaanShen/topicModels

 
 

  1. Mallet Extension

The Mallet package contains only two topic models, LDA and Hierarchical LDA, so I implemented some additional topic modeling methods for it.

Model:

  • Hierarchical Dirichlet Process with Gibbs Sampling. (in HDP folder)
  • Inference part for hLDA. (in hLDA folder)

Usage:

  1. This is an extension for Mallet, so you need Mallet's source code first.
  2. Put HDP.java, HDPInferencer.java, and HierarchicalLDAInferencer.java in the src/cc/mallet/topics folder.
  3. If you are going to run HDP, make sure the knowceans package is included in your project.
  4. Run HDPTest.java or hLDATest.java for a demo on a small dataset in the data folder.

References:

  2. Scikit-learn Extension

Scikit-learn doesn't have any topic models yet, so I modified Matthew D. Hoffman's onlineldavb into scikit-learn format.

Model:

  • online LDA with variational EM. (In LDA folder)

Usage:

  1. Make sure numpy, scipy, and scikit-learn are installed.
  2. Run python test in the lda folder for the unit tests.
  3. The onlineLDA model is in lda.py.
  4. For a quick example, run python lda_example.py online to fit a 10-topic model on the 20 Newsgroups dataset. online means the model is fit with online updates (the partial_fit method). Changing online to batch fits the model with batch updates (the fit method).
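
The difference between the online and batch modes in step 4 can be sketched in plain Python. This is not the code in lda.py. The function names learning_rate and blend, the parameter names tau0 and kappa (borrowed from the notation of the online LDA paper), and all numeric values are assumptions made for illustration. The sketch shows only the step that blends the current topic-word parameters with a mini-batch estimate, not the full variational E-step.

```python
# Illustrative sketch of the online variational update used by online LDA.
# Names and values are hypothetical and are not taken from lda.py.


def learning_rate(t, tau0, kappa):
    """Step size rho_t = (tau0 + t) ** -kappa for the t-th mini-batch."""
    return (tau0 + t) ** -kappa


def blend(lam, lam_hat, rho):
    """Move the topic-word parameters lam toward the estimate lam_hat."""
    return [
        [(1.0 - rho) * old + rho * new for old, new in zip(row_old, row_new)]
        for row_old, row_new in zip(lam, lam_hat)
    ]


# Two topics over a three-word vocabulary (invented numbers).
lam = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
lam_hat = [[4.0, 1.0, 1.0], [1.0, 1.0, 4.0]]

# Online mode: one partial update per mini-batch, with a shrinking step size.
for t in range(3):
    lam = blend(lam, lam_hat, learning_rate(t, tau0=2.0, kappa=1.0))

# Batch mode can be viewed as rho = 1 with the estimate computed from the
# whole corpus: the estimate replaces the old parameters outright.
batch_lam = blend([[1.0] * 3, [1.0] * 3], lam_hat, 1.0)
```

In the online case each call plays the role of one partial_fit step, so the model can be updated as new documents arrive. In the batch case every update needs a pass over the full dataset.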

Note:
I updated the current code based on the current scikit-learn development branch. Older scikit-learn versions will throw errors, but I am pretty sure anyone can fix this with a few small changes. (2014/09/15)

Reference:

About

Topic model extensions for Mallet & scikit-learn
