Topik

A Topic Modeling toolbox.

Introduction

Topik aims to provide a full suite of tools and a high-level interface for anyone interested in applying topic modeling. To that end, it includes many utilities beyond the statistical modeling algorithms themselves and wraps all of its features into an easily callable function and a command line interface.

Topik is built on top of existing topic modeling libraries and simply provides a wrapper around them, for quick and easy exploratory analysis of your text data sets. The motivation for writing topik, and its starting point, was gensim's tutorials.

Installation

conda install -c chdoig topik

Use

from topik.run import run_model

run_model('data.json', field='abstract', model='lda_online', r_ldavis=True, output_file=True)

Results

  • Output file:

Appends the fields tokens, lda_probabilities and topic_group.

  • tokens: tokens extracted from the document
  • lda_probabilities: list of [topic number, LDA probability] pairs assigned to the document
  • topic_group: the topic with the highest LDA probability
{"tokens": ["deposited", "hfo", "surfaces", "chemical_vapor_deposition_cvd", "geh", "gehx", "deposited", "thermally", "cracking", "geh", "hot",
"tungsten", "filament", "oxidation", "bonding", "studied_ray_photoelectron_spectroscopy_xps", "geh", "geo", "geo", "desorption",
"measured_temperature_programmed_desorption", "tpd", "initially", "reacts", "dielectric", "forming", "oxide_layer", "followed", "deposition",
"formation", "nanocrystals", "cvd", "gehx", "deposited", "cracking", "rapidly", "forms", "contacting", "oxide_layer", "hfo", "stable", "fully", "removed",
"hfo", "surface", "annealing_results", "help", "explain", "stability", "nanocrystals", "contact", "hfo"],
"lda_probabilities": [[2, 0.048728168830183806], [3, 0.081054332141033983], [5, 0.10363835330016971], [7, 0.32014757577039443], [8, 0.35553044832357661], [9, 0.083351716411561097]],
"topic_group": 8}
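As a minimal sketch of how topic_group relates to lda_probabilities, the snippet below picks the topic with the highest probability from a record shaped like the sample above. The helper name dominant_topic is hypothetical and not part of topik, and the token list is shortened for brevity.

```python
import json

# A record shaped like the sample output above (tokens shortened for brevity).
record = json.loads("""
{"tokens": ["deposited", "hfo", "surfaces"],
 "lda_probabilities": [[2, 0.048728168830183806], [3, 0.081054332141033983],
                       [5, 0.10363835330016971], [7, 0.32014757577039443],
                       [8, 0.35553044832357661], [9, 0.083351716411561097]],
 "topic_group": 8}
""")


def dominant_topic(lda_probabilities):
    """Return the topic number with the highest probability.

    Hypothetical helper for illustration only; it is not part of topik.
    """
    topic, _probability = max(lda_probabilities, key=lambda pair: pair[1])
    return topic


# The recomputed topic matches the topic_group stored in the record.
print(dominant_topic(record["lda_probabilities"]))  # 8
```

Topic 8 has the largest probability in the sample (about 0.356), which agrees with the stored topic_group value.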
  • Visualization

Opens an LDAvis visualization of your model in your browser.

LICENSE

New BSD. See License File.
