###############################################################################
#
# Short description (interactive annotations only):
#
# * ``+++ make, world, well`` - words from the intersection of the two topics, i.e. present in both topics;
#
# * ``--- money, day, still`` - words from the symmetric difference of the two topics, i.e. present in one topic but not the other.
#


mdiff, annotation = lda_fst.diff(lda_fst, distance='jaccard', num_words=50)
plot_difference(mdiff, title="Topic difference (one model) [jaccard distance]", annotation=annotation)
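
###############################################################################
#
# The annotation returned by ``diff`` can also be inspected programmatically.
# Below is a minimal sketch, assuming (as in gensim's ``LdaModel.diff``) that
# ``annotation[i][j]`` holds the pair ``[intersection_words, difference_words]``
# for topics ``i`` and ``j``; the topic indices chosen here are arbitrary.
#

topic_i, topic_j = 0, 1  # hypothetical pair of topics to inspect
intersection_words, difference_words = annotation[topic_i][topic_j]
print("+++", intersection_words)  # words shared by both topics
print("---", difference_words)    # words present in one topic but not the other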

###############################################################################
#
# If you compare a model with itself, you want to see as many red elements as possible (except on the diagonal). In this plot, you can look at the cells that are not very red to see which topics in the model are very similar and why (hover your pointer over a cell to read its annotation).
#
# Jaccard is a stable and robust distance function, but it is not sensitive enough for some purposes. Let's try the Hellinger distance now.
#


mdiff, annotation = lda_fst.diff(lda_fst, distance='hellinger', num_words=50)
plot_difference(mdiff, title="Topic difference (one model) [hellinger distance]", annotation=annotation)
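
###############################################################################
#
# For intuition, the Hellinger distance between two discrete probability
# distributions ``p`` and ``q`` is ``sqrt(0.5 * sum((sqrt(p) - sqrt(q))**2))``.
# Here is a minimal NumPy sketch on toy distributions (not taken from the model
# above), just to illustrate the formula:
#

import numpy as np

p = np.array([0.1, 0.2, 0.7])  # toy topic-word distribution
q = np.array([0.2, 0.2, 0.6])  # another toy distribution
hellinger = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
print(hellinger)  # 0 for identical distributions, up to 1 for disjoint support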