cswangyuhui/Evaluating-Document-Transformations-for-Clustering-Text

Evaluating Document Transformations for Clustering Text

This is the source code accompanying the blog article "Clustering Text with Transformed Document Vectors".

Dependencies

numpy
elasticsearch
nltk
gensim
scikit-learn
wordcloud
image
matplotlib
pyyaml

Usage

1. Word Clouds

cd wordclouds

python ./plotWords.py twenty-news

to generate images like:

rec.sport.hockey articles

or

python ./plotWords.py acl-imdb

to generate images like:

Negative Reviews
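The README does not show what plotWords.py does internally. As a rough sketch of the usual first step, the snippet below counts word frequencies across one category's documents. The tokenizer regex, the tiny stopword list, and the `term_frequencies` helper are hypothetical and may differ from the repo's code. The resulting `Counter` is the kind of word-to-weight mapping that the wordcloud package's `WordCloud.generate_from_frequencies` accepts.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; plotWords.py may use a different one (e.g. from nltk).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "this", "was"}


def term_frequencies(documents, stopwords=STOPWORDS, min_len=3):
    """Count lowercase alphabetic tokens across documents,
    skipping stopwords and tokens shorter than min_len."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if len(token) >= min_len and token not in stopwords:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    docs = [
        "The goalie made a great save in the third period.",
        "The Leafs scored twice in the third period of the game.",
    ]
    freqs = term_frequencies(docs)
    print(freqs.most_common(3))
    # With the wordcloud package installed, the counts could be rendered with:
    # from wordcloud import WordCloud
    # WordCloud(width=800, height=400).generate_from_frequencies(freqs).to_file("hockey.png")
```

Words shared across a category's documents ("third", "period" here) get the largest weights and therefore the largest type in the cloud.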

2. Intra- & Inter-Cluster Distance Transformations

cd analysis
python ./analyze.py twenty-news
python ./processAnalysis.py twenty-news

python ./analyze.py acl-imdb
python ./processAnalysis.py acl-imdb

to generate the box-whisker plot:

Transformation of distances

and the plot of the intercluster/intracluster distance ratio:

B/A Ratio
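A minimal sketch of the quantities plotted above, under stated assumptions: A is taken to be the mean distance between documents in the same cluster and B the mean distance between documents in different clusters, both measured as Euclidean distance between unit-normalized vectors. The `transform` helper, which maps a bag-of-words vector through a word-embedding matrix, is a hypothetical stand-in for the transformations studied in the article; the four-word vocabulary and 2-d embeddings are made up for illustration.

```python
import math
from itertools import combinations
from statistics import mean


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def transform(doc, W):
    """Hypothetical transformation: project a bag-of-words vector through a
    word-embedding matrix W (one row per vocabulary word), i.e. doc . W."""
    return [sum(doc[t] * W[t][j] for t in range(len(doc))) for j in range(len(W[0]))]


def split_distances(vectors, labels):
    """Euclidean distances between unit-normalized vectors, split into
    intracluster pairs (same label) and intercluster pairs (different labels)."""
    vecs = [normalize(v) for v in vectors]
    intra, inter = [], []
    for i, j in combinations(range(len(vecs)), 2):
        d = math.dist(vecs[i], vecs[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return intra, inter


def ba_ratio(vectors, labels):
    """B/A: mean intercluster distance divided by mean intracluster distance."""
    intra, inter = split_distances(vectors, labels)
    return mean(inter) / mean(intra)


if __name__ == "__main__":
    # Toy vocabulary: puck, goal, plot, actor (one word per document, for clarity).
    docs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    labels = ["hockey", "hockey", "movies", "movies"]
    # Made-up 2-d "embeddings": hockey words point one way, movie words the other.
    W = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print("B/A before:", round(ba_ratio(docs, labels), 2))
    print("B/A after: ", round(ba_ratio([transform(d, W) for d in docs], labels), 2))
```

Before the transformation the one-word documents are mutually orthogonal, so intracluster and intercluster pairs are equally far apart and B/A is 1.0; after it, documents using related words move closer together and B/A rises well above 1. The `intra` and `inter` lists returned by `split_distances` are what a box-whisker plot such as `matplotlib.pyplot.boxplot([intra, inter])` would summarize.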

3. Clustering

cd clusters
mkdir logs
./run.sh twenty-news
./run.sh acl-imdb

Processing the results yields images like:

20-news Results

Movie Reviews Results
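The README does not say which algorithm or settings run.sh uses. For orientation only, here is plain k-means (Lloyd's algorithm) in standard-library Python; given the listed dependencies, scikit-learn's `KMeans(n_clusters=k).fit_predict(X)` would be the usual library choice. The `kmeans` function, its random initialization from `seed`, and the toy points are illustrative assumptions, not the repo's code.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy unit-length document vectors forming two obvious groups.
    pts = [[1.0, 0.0], [0.99, 0.11], [0.0, 1.0], [0.11, 0.99]]
    labels, centroids = kmeans(pts, k=2)
    print(labels)
```

Clustering the original and the transformed vectors with the same k, then comparing the labels with the known newsgroups or review sentiment, is one way to check whether a transformation helps.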

Languages

  • Python 97.1%
  • Shell 2.9%