m1key/text-analysis

Text Analysis

A sandbox Python project for computing text statistics. It is intended to be applied to my gallery YAML files.

Setup

The project uses Python 2. Only mining.py requires Anaconda.

The commands below use pip. easy_install may be an alternative, since it does not require sudo.

pip install textstat

pip install nltk

>>> import nltk

>>> nltk.download('punkt')

>>> nltk.download('english')

pip install textblob

pip install pyaml

Setup for Anaconda

If Anaconda is installed, the following commands must be run:

conda install -c conda-forge textblob

conda install -c mlgill textstat

Simple run

This will analyse a local file.

python analyse.py
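
As a rough illustration of the kind of statistics such a run might produce, here is a minimal sketch using only the standard library. It is not the code in analyse.py; the function name `basic_stats`, the regular expressions, and the sample text are all assumptions made for this example, and libraries such as textstat and nltk compute far richer measures.

```python
import re


def basic_stats(text):
    """Return rough word and sentence counts for a piece of text (hypothetical helper)."""
    # Words: runs of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Sentences: split on ., ! or ? followed by whitespace or the end of the text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    avg_word_length = sum(len(w) for w in words) / float(len(words)) if words else 0.0
    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_word_length": avg_word_length,
    }


# Made-up sample text standing in for the contents of a local file.
sample = "This is a gallery. It has two sentences!"
stats = basic_stats(sample)
```

To try it on a real file, read the file into a string first and pass that string to `basic_stats`.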

Analyse All Galleries

This will analyse all galleries, assuming they are under the default directory. It will write statistics for each gallery.

python analyse_all.py

Add -v for verbose output, which shows any problems:

python analyse_all.py -v

You can store the output as CSV:

python analyse_all.py -o /tmp/galleries.csv

Use --help to list all options:

python analyse_all.py --help
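
For readers curious how flags like these are commonly wired up, the following is a sketch with argparse. Only `-v`, `-o` and `--help` come from this README; the long option names `--verbose` and `--output`, the help strings, and the function name `build_parser` are assumptions, and the real script may be organised differently.

```python
import argparse


def build_parser():
    """Build a parser resembling the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Analyse all galleries.")
    # -v and -o appear in the README; the long names are assumed for illustration.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="report problems while analysing")
    parser.add_argument("-o", "--output", metavar="CSV_PATH", default=None,
                        help="write the statistics to this CSV file")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["-v", "-o", "/tmp/galleries.csv"])
```

argparse adds `--help` automatically, so it needs no explicit definition in the sketch.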

Mining

This will do some Pandas data mining on the CSV file generated by analyse_all.py.
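
Before reaching for Pandas, a CSV of per-gallery statistics can be inspected with the standard library alone. The sketch below is a plain-Python stand-in, not the contents of mining.py; the column names (`gallery`, `words`, `sentences`) and the sample rows are invented for illustration, and the real file may use different columns.

```python
import csv

# Hypothetical sample of the CSV; the real column names and values may differ.
sample_lines = [
    "gallery,words,sentences",
    "lisbon,120,10",
    "tokyo,300,20",
    "oslo,90,9",
]

# DictReader accepts any iterable of lines and uses the first one as the header.
rows = list(csv.DictReader(sample_lines))

# Average word count across galleries.
word_counts = [int(row["words"]) for row in rows]
mean_words = sum(word_counts) / float(len(word_counts))

# Name of the gallery with the most words.
longest = max(rows, key=lambda row: int(row["words"]))["gallery"]
```

With a real file, pass an open file object to `csv.DictReader` in place of `sample_lines`.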

Analyse All As One

This will analyse all galleries as one text corpus.

python analyse_all.py
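
The idea of treating every gallery as one corpus can be sketched as follows. This is an illustration only, with made-up gallery names and texts and a simple regular-expression tokenizer standing in for whatever the script actually uses.

```python
import re
from collections import Counter

# Hypothetical gallery texts; the real ones would come from the gallery YAML files.
gallery_texts = {
    "lisbon": "Trams climb the hills. The hills are steep.",
    "tokyo": "Neon lights and quiet shrines. The lights never sleep.",
}

# Join all texts into a single corpus, in a stable order.
corpus = " ".join(gallery_texts[name] for name in sorted(gallery_texts))

# Lowercase and tokenize, then count word frequencies over the whole corpus.
tokens = re.findall(r"[a-z']+", corpus.lower())
frequencies = Counter(tokens)
```

Statistics computed this way describe the collection as a whole, so a word that is rare in every single gallery can still rank highly across the corpus.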

Difficulty Analyser

Unlike other algorithms here, this will make an API call to analyse the difficulty of each word separately.
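
Because each word is looked up separately, repeated words could trigger repeated calls. The sketch below shows one way to look up each distinct word only once. It is an assumption about how this could be done, not a description of the analyser: `fake_difficulty_api` is a stand-in that makes no network call, and its scoring rule (word length) is invented.

```python
import re


def fake_difficulty_api(word):
    """Stand-in for the remote call: scores a word by its length (invented rule)."""
    return len(word)


def word_difficulties(text, lookup):
    """Look up each distinct word once and return a {word: score} mapping."""
    scores = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in scores:  # skip words that were already looked up
            scores[word] = lookup(word)
    return scores


# Record every lookup so the number of calls can be inspected afterwards.
calls = []


def counting_lookup(word):
    calls.append(word)
    return fake_difficulty_api(word)


scores = word_difficulties("The cat saw the extraordinary cat", counting_lookup)
```

Here six words in the text lead to four lookups, because "the" and "cat" each occur twice.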

Tests

This will run basic methods against a static text.

python tests.py
