Sandbox Python project for computing text statistics. It will be applied to my gallery YAML files.
Uses Python 2. Only mining.py requires Anaconda.
Actually, maybe use easy_install? No sudo required.
pip install textstat
pip install nltk
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('english')
pip install textblob
pip install pyaml
If Anaconda is installed, the following must be run:
conda install -c conda-forge textblob
conda install -c mlgill textstat
This will analyse a local file.
python analyse.py
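As a rough illustration of the kind of statistics involved, here is a minimal standard-library sketch. It is not the code in analyse.py: the helper name `basic_stats` and the regex-based splitting are assumptions made for this example, and the real script presumably relies on textstat, nltk and textblob instead.

```python
from __future__ import division, print_function
import re


def basic_stats(text):
    """Return naive word and sentence counts (hypothetical helper)."""
    # Words: runs of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Sentences: non-empty chunks separated by '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "words": len(words),
        "sentences": len(sentences),
        # Guard against empty input to avoid dividing by zero.
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }


if __name__ == "__main__":
    print(basic_stats("A short gallery text. It has two sentences."))
```

The regex split is much cruder than the punkt tokenizer downloaded above, so counts on real gallery texts may differ from what the project reports.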
This will analyse all galleries, assuming they are under the default directory. It will write statistics for each gallery.
python analyse_all.py
Make it verbose to see problems:
python analyse_all.py -v
You can store the output as CSV:
python analyse_all.py -o /tmp/galleries.csv
Use --help for all options.
python analyse_all.py --help
This will do some Pandas data mining on the file generated by the previous script.
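To give an idea of what such mining might look like, the sketch below averages every numeric column of a CSV file such as /tmp/galleries.csv. It deliberately uses only the standard library rather than Pandas, the helper name `column_means` is hypothetical, and nothing is assumed about the actual column names in the generated file. With Pandas, something like `pandas.read_csv(path).describe()` would give a richer summary.

```python
from __future__ import division, print_function
import csv


def column_means(path):
    """Average every column whose values all parse as numbers (hypothetical helper)."""
    with open(path) as handle:
        rows = list(csv.DictReader(handle))
    means = {}
    if not rows:
        return means
    for name in rows[0]:
        try:
            values = [float(row[name]) for row in rows]
        except (TypeError, ValueError):
            # Skip non-numeric columns, for example gallery names.
            continue
        means[name] = sum(values) / len(values)
    return means


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(column_means(sys.argv[1]))
```

Saved under a file name of your choice, it takes the CSV path as its only argument.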
This will analyse all galleries as one text corpus.
python analyse_all.py
Unlike the other algorithms here, this makes an API call to analyse the difficulty of each word separately.
This will run basic methods against a static text.
python tests.py