A Topic Modeling toolbox.
The aim of topik is to provide a full suite and high-level interface for anyone interested in applying topic modeling.
For that purpose, topik includes many utilities beyond statistical modeling algorithms and wraps all of its features into an easy-to-call function and a command line interface.
Topik is built on top of existing topic modeling libraries and provides a wrapper around them for quick and easy exploratory analysis of your text data sets. The starting point for writing topik was gensim's tutorials.
Install topik with conda:
conda install -c chdoig topik
Basic usage from Python:
from topik.run import run_model
run_model('data.json', field='abstract', model='lda_online', r_ldavis=True, output_file=True)
- Output file:
Appends the fields tokens, lda_probabilities and topic_group.
- tokens: tokens extracted from the document
- lda_probabilities: list of [topic number, LDA probability] pairs
- topic_group: the topic with the highest LDA probability
Example output record:
{"tokens": ["deposited", "hfo", "surfaces", "chemical_vapor_deposition_cvd", "geh", "gehx", "deposited", "thermally", "cracking", "geh", "hot",
"tungsten", "filament", "oxidation", "bonding", "studied_ray_photoelectron_spectroscopy_xps", "geh", "geo", "geo", "desorption",
"measured_temperature_programmed_desorption", "tpd", "initially", "reacts", "dielectric", "forming", "oxide_layer", "followed", "deposition",
"formation", "nanocrystals", "cvd", "gehx", "deposited", "cracking", "rapidly", "forms", "contacting", "oxide_layer", "hfo", "stable", "fully", "removed",
"hfo", "surface", "annealing_results", "help", "explain", "stability", "nanocrystals", "contact", "hfo"],
"lda_probabilities": [[2, 0.048728168830183806], [3, 0.081054332141033983], [5, 0.10363835330016971], [7, 0.32014757577039443], [8, 0.35553044832357661], [9, 0.083351716411561097]],
"topic_group": 8}
- Visualization:
Outputs an LDAvis visualization of your model to your browser.
License: New BSD. See the license file.