Topic Modeling Toolkit - Python Library

This library aims to automate Topic Modeling research-related activities.

  • Data preprocessing and dataset computing
  • Model training (with parameter grid-search), evaluating and comparing
  • Graph building
  • Computing KL-divergence between p(c|t) distributions
  • Datasets/models/kl-distances reporting
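
As a rough illustration of the KL-divergence item above, the sketch below computes KL(p || q) between two discrete distributions over the same set of classes. It is not the toolkit's own implementation: the function name `kl_divergence` and the example p(c|t) values are hypothetical, and the handling of zero probabilities is an assumption made for this example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same classes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a term with p_i = 0 contributes nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical p(c|t) rows for two topics over three classes (made-up values)
p_c_given_t1 = [0.7, 0.2, 0.1]
p_c_given_t2 = [0.1, 0.3, 0.6]
print(kl_divergence(p_c_given_t1, p_c_given_t2))
```

KL-divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.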

Overview

This library serves as a higher-level API around the BigARTM library (the artm python interface) and exposes it conveniently through the command line.

Key features of the Library:

  • Flexible preprocessing pipelines
  • Optimization of classification scheme with an evolutionary algorithm
  • Fast model inference with parallel/multicore execution
  • Persisting of models and experimental results
  • Visualization

Installation

The Topic Modeling Toolkit depends on the BigARTM C++ library, so you should first build and install it,
either by following the BigARTM installation instructions or by using
the provided 'build_artm.sh' script. For example, for python3 you can use the following:
$ git clone https://github.com/boromir674/topic-modeling-toolkit.git
$ chmod +x topic-modeling-toolkit/build_artm.sh
$ # build and install BigARTM library in /usr/local and create python3 wheel
$ topic-modeling-toolkit/build_artm.sh
$ ls bigartm/build/python/bigartm*.whl
Now you should have the 'bigartm' executable in your PATH and a built python wheel in 'bigartm/build/python/'.
Install the wheel in your environment, for example with the command below (replace path-python-wheel with the name of the wheel file built above):
$ python -m pip install bigartm/build/python/path-python-wheel
Once the package is hosted on PyPI, it should be installable from there. Until then, install it from the cloned repository with the following commands:
$ cd topic-modeling-toolkit
$ pip install .

If the above fails, try again, this time installing the dependencies manually first:

$ cd topic-modeling-toolkit
$ pip install -r requirements.txt
$ pip install .
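
After installation, a quick sanity check is to confirm that the BigARTM python interface can be found by the interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of the toolkit, and 'artm' is assumed to be the import name of the BigARTM python interface mentioned in the Overview.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'artm' is assumed to be the import name of the BigARTM python interface
missing = missing_modules(["artm"])
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If 'artm' is reported as not importable, the wheel was most likely installed into a different environment than the one running the check.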

Usage

A sample workflow is shown below.

$ current_dir=$PWD
$ export COLLECTIONS_DIR="$current_dir/datasets-dir"
$ mkdir -p "$COLLECTIONS_DIR"

$ transform posts pipeline.cfg my-dataset
$ train my-dataset train.cfg plsa-model --save
$ make-graphs --model-labels "plsa-model" --allmetrics --no-legend
$ xdg-open $COLLECTIONS_DIR/plsa-model/graphs/plsa*prpl*

Citation

  1. Vorontsov, K. and Potapenko, A. (2015). Additive regularization of topic models. Machine Learning, 101(1):303–323.
