The project is written in Python and requires Python 2.7. It uses a PostgreSQL backend, so PostgreSQL 9.4.2 or higher must be installed.
This project has the following dependencies:
$ pip install flask pyparsing sqlalchemy pycurl slate sklearn nltk pandas numpy scipy psycopg2
For the scraper:
$ pip install beautifulsoup
Rename src/config.py.sample to src/config.py and update the variables inside accordingly. PAPER_DIR is the directory the exam papers are downloaded to.
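As a rough illustration of the PAPER_DIR setting, the sketch below shows what that part of src/config.py might look like. Only the name PAPER_DIR comes from this README; the placeholder path and the ensure_paper_dir helper are assumptions added for illustration, and the real sample file contains other variables that are not reproduced here.

```python
# Illustrative sketch only: PAPER_DIR is named in this README, everything
# else here (the path and the helper) is a hypothetical example.
import os

# Directory the exam papers are downloaded to. Replace with your own path.
PAPER_DIR = os.path.join(os.path.expanduser("~"), "exam_papers")


def ensure_paper_dir(path=PAPER_DIR):
    """Create the download directory if it is missing and return its absolute path."""
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

Creating the directory up front avoids a failed download later if the configured path does not exist yet.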
Import the database dump in data/dumps/exam_papers.sql using the psql tool:
$ psql -f data/dumps/exam_papers-29112015.sql
- pycurl requires libcurl to be installed.
  - On Debian: sudo apt-get install libcurl4-openssl-dev
- pandas requires python-dev.
- nltk requires running nltk.download() to download its files. Type d and download the stopwords dataset.
- slate is broken with the latest version of its dependency, PDFMiner. Fix it by running sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515.
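Because several of these packages need extra system libraries or pinned versions, it can help to check which ones actually import before starting the server. The following is a minimal sketch, not part of the project: the function name missing_modules and the REQUIRED list are assumptions, with the list taken from the pip command above.

```python
import importlib

# Import names taken from the pip install line in this README.
REQUIRED = [
    "flask", "pyparsing", "sqlalchemy", "pycurl", "slate",
    "sklearn", "nltk", "pandas", "numpy", "scipy", "psycopg2",
]


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # A broken install (for example slate with an incompatible
            # PDFMiner) may raise something other than ImportError, so any
            # failure on import is treated as "missing".
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED) returns an empty list when every dependency imports cleanly, and otherwise the names that still need attention.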
Once you have all the dependencies installed and the database running, it's simply a matter of starting the server and visiting http://localhost:5000/.
$ pwd
/downloads/ct422-project
$ cd ..
$ python -m project.src.web.api
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Note: To start the server, you must run the python command from the parent directory of the repository.
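To confirm the server came up without opening a browser, a small check like the one below can be used. This is a sketch under assumptions: the helper name server_is_up is hypothetical, and it only tests that the URL answers without an HTTP error, not that the application behaves correctly.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def server_is_up(url="http://localhost:5000/", timeout=5):
    """Return True if the URL can be fetched without a connection or HTTP error."""
    try:
        response = urlopen(url, timeout=timeout)
    except IOError:
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
    response.close()
    return True
```

Running server_is_up() while the server from the session above is running should return True; it returns False if nothing is listening on port 5000.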