WordSeer is a tool for natural language analysis of digital corpora.
There are two parts to this repository:

- A rewrite of the original PHP implementation of WordSeer in Python. This is located in `app/wordseer/`. It is the server-side and web interface code for the WordSeer application, written in Python using the Flask framework and several web framework libraries.
- A reimplementation of wordseerbackend in more maintainable Python. This is located in `app/preprocessor/`. It is the preprocessing pipeline for uploaded data sets.
The following packages must be installed before performing any setup:
Run `install.py` like so:

```shell
./install.py -i
```

This launches the interactive installer, which will guide you through the installation process. If you know what you want, run `install.py -h` to view the available console flags.
We also recommend installing the Python dependencies (discussed below) in a virtual environment:

```shell
pip install virtualenv
virtualenv venv
source venv/bin/activate
```
- Run:

  ```shell
  pip install -r requirements_win.txt
  ```

  to install the necessary packages.
- Run:

  ```shell
  python database.py create
  ```

  to create the database, and

  ```shell
  python database.py migrate
  ```

  to migrate the model schema into the database.
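The create-then-migrate pattern above can be sketched with Python's stdlib `sqlite3` module. This is an illustration of the idea only, not what WordSeer's `database.py` actually contains:

```python
# Illustrative sketch: WordSeer's database.py wraps its schema behind Flask,
# but create/migrate boils down to steps like these.
import sqlite3

conn = sqlite3.connect(":memory:")  # the real app persists to a file on disk

# "create": build the initial schema
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT)")

# "migrate": evolve the schema in place without losing existing rows
conn.execute("ALTER TABLE document ADD COLUMN source TEXT")

columns = [row[1] for row in conn.execute("PRAGMA table_info(document)")]
print(columns)  # ['id', 'title', 'source']
```

Running `migrate` after changing the models is what keeps an existing database in sync with the code.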
- `corenlp` must be installed manually. Clone the repository and install it:

  ```shell
  git clone https://github.com/silverasm/stanford-corenlp-python.git
  cd stanford-corenlp-python
  python setup.py install
  ```

  This should install `corenlp` to your system.
- To complete the setup, version 3.2.0 of Stanford's CoreNLP library must be in a directory accessible to the backend. Download the archive, move it to the root of the repository, extract it, and rename the folder from `stanford-corenlp-full-2013-06-20` to `stanford-corenlp`.
- If you followed the above directions, you shouldn't need to worry about any configuration. If you installed Stanford's CoreNLP elsewhere, edit `lib/wordseerbackend/wordseerbackend/config.py` for your setup. In particular, make sure `CORE_NLP_DIR` points to the Stanford NLP library.
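As an illustration, the relevant setting in `config.py` might look like the sketch below. Only the `CORE_NLP_DIR` name comes from the instructions above; the surrounding layout is an assumption:

```python
# Hypothetical excerpt of lib/wordseerbackend/wordseerbackend/config.py.
# Only CORE_NLP_DIR is named in the setup instructions; the use of os.path
# here is an assumption for illustration.
import os

# Point this at wherever the extracted Stanford CoreNLP folder lives.
CORE_NLP_DIR = os.path.abspath("stanford-corenlp")
```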
- Run the following command in the console:

  ```shell
  python -m nltk.downloader punkt
  ```

  You should then be ready to parse files. Example XML and JSON files are included in `tests/data`.
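Before running the pipeline on your own data, it can help to peek at a document's structure with the stdlib XML parser. The element names below are invented for illustration; consult the sample files in `tests/data` for the formats WordSeer actually accepts:

```python
# Hypothetical example document; the real schema is defined by the sample
# files in tests/data, not by this sketch.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<document>"
    "<sentence>WordSeer analyzes digital corpora.</sentence>"
    "<sentence>Each sentence is processed by the pipeline.</sentence>"
    "</document>"
)
sentences = [s.text for s in doc.findall("sentence")]
print(len(sentences))  # 2
```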
Documentation is available on readthedocs. You can also build it yourself:

```shell
cd docs/
make html
```

Or, on Windows, simply run `make.bat` in the same directory.
Simply run `runtests.py`:

```shell
python runtests.py
```