adriancooney/ct422-project

Exam Paper similarity analyzer

The project is written in Python and requires Python 2.7. It uses a PostgreSQL backend, so PostgreSQL 9.4.2 or higher must be installed.

Dependencies

This project has the following dependencies:

$ pip install flask pyparsing sqlalchemy pycurl slate sklearn nltk pandas numpy scipy psycopg2

For the scraper:

$ pip install beautifulsoup
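
To confirm that the packages above installed correctly, a small check along the following lines may help. It is only a sketch: the helper name missing_modules and the list REQUIRED_MODULES are illustrative and not part of the project, and the list assumes each package is imported under the same name it is installed with.

```python
import importlib

# Import names for the main dependencies listed above (assumed to match
# the names used with pip).
REQUIRED_MODULES = [
    "flask", "pyparsing", "sqlalchemy", "pycurl", "slate",
    "sklearn", "nltk", "pandas", "numpy", "scipy", "psycopg2",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Not installed, or installed but failing to import.
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED_MODULES) returns an empty list when everything imports cleanly; any names it returns are the ones to reinstall or look up under Troubleshooting.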

Configuration

Rename src/config.py.sample to src/config.py and update the variables inside to match your setup. PAPER_DIR is the directory the exam papers are downloaded to.
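
As a rough illustration of the PAPER_DIR setting, the sketch below picks a download directory and makes sure it exists before anything is written to it. PAPER_DIR is the only name taken from this README; the path value and the helper ensure_paper_dir are assumptions made for the example, and the real config.py.sample may contain other variables.

```python
import os

# PAPER_DIR is named in the README; this particular path is an assumption.
PAPER_DIR = os.path.join(os.path.expanduser("~"), "exam-papers")


def ensure_paper_dir(path):
    """Create the download directory if needed and return its absolute path.

    Hypothetical helper, not part of the project.
    """
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

Using an absolute path avoids surprises, since the server is started from the parent directory of the repository rather than from the repository itself.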

Database

Import the database dump in data/dumps/exam_papers-29112015.sql using the psql tool:

$ psql -f data/dumps/exam_papers-29112015.sql
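
If the import needs to be scripted, the same command can be assembled and run from Python. This is a minimal sketch: build_psql_command and import_dump are hypothetical helpers, and the optional database name is an assumption, since the command above relies on psql's default database.

```python
import subprocess


def build_psql_command(dump_path, dbname=None):
    """Build the psql argument list for importing a SQL dump."""
    command = ["psql"]
    if dbname is not None:
        # -d selects the target database; omitted to use psql's default.
        command += ["-d", dbname]
    # -f tells psql to read commands from the given file.
    command += ["-f", dump_path]
    return command


def import_dump(dump_path, dbname=None):
    """Run psql on the dump and return its exit code (0 on success)."""
    return subprocess.call(build_psql_command(dump_path, dbname))
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if the dump path contains spaces.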

Troubleshooting

  • pycurl requires libcurl to be installed.
    • On debian: sudo apt-get install libcurl4-openssl-dev
  • pandas requires python-dev.
  • nltk requires running nltk.download() to download its files. Type d and download the stopwords dataset.
  • slate is broken with the latest version of its dependency, PDFMiner. Fix it by running sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515.
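
The stopwords download mentioned above can also be done without the interactive prompt, since nltk.download accepts a dataset name. The sketch below wraps that call; ensure_stopwords and its downloader parameter are illustrative, added only so the function can be exercised without network access.

```python
def ensure_stopwords(downloader=None):
    """Download the NLTK stopwords dataset non-interactively.

    `downloader` defaults to nltk.download; it is a parameter only so the
    function can be tried out with a stand-in.
    """
    if downloader is None:
        import nltk  # imported lazily so this file loads without nltk
        downloader = nltk.download
    # Returns whatever the downloader returns.
    return downloader("stopwords")
```

Running ensure_stopwords() once after installing nltk should be enough; the dataset is stored in NLTK's data directory and reused afterwards.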

Running

Once you have all the dependencies installed and the database running, it's simply a matter of starting the server and visiting http://localhost:5000/.

$ pwd
/downloads/ct422-project
$ cd ..
$ python -m project.src.web.api
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Note: To start the server, you must run the python command from the parent directory of the repository.
