
Trajectory

Join the chat at https://gitter.im/jrouly/trajectory

Trajectory is a software platform for automatically extracting topics from university course descriptions. It includes modules for data ingestion, learning, and visualization.

Requirements

The basic requirements are the Java JDK (version 7 or higher), Python 3.0 or higher, virtualenv, and Maven. The database layer requires a system installation of MySQL, PostgreSQL, SQLite, or similar software. The visualization layer requires a proxy web server (e.g., Apache or nginx).
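A quick way to sanity-check the toolchain (these are the standard version switches for each tool):

$ java -version         # expect 1.7 or higher
$ python3 --version     # expect 3.0 or higher
$ virtualenv --version
$ mvn --version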

Setup

Note that this project contains an unholy combination of Bash scripts, Python tools, and Java code. Proceed with setup carefully.

Begin by cloning the repository and exporting the $TRJ_HOME path variable.

$ git clone http://github.com/jrouly/trajectory
$ cd trajectory
$ export TRJ_HOME=$(pwd)

Install Python dependencies by calling the bin/util/pysetup script. Java code will be compiled on demand.
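For example, from $TRJ_HOME (assuming the script needs no arguments):

$ bin/util/pysetup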

To specify or change the database URI and scheme, modify the config.py file. Specifically, look for DATABASE_URI. It defaults to a SQLite file named data.db.
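A sketch of the relevant setting in config.py (the variable name and default file are from above; the commented alternatives are illustrative URI formats, not values taken from this repository):

DATABASE_URI = "sqlite:///data.db"

# Illustrative alternatives for other backends:
# DATABASE_URI = "mysql://user:password@localhost/trajectory"
# DATABASE_URI = "postgresql://user:password@localhost/trajectory"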

Visualization server

Sample nginx configuration

server {
    listen 80;
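    # NOTE: substitute the expanded value of $TRJ_HOME for /TRJ_HOME below.
    # "alias" (rather than "root") maps /static/ requests directly onto
    # this directory; with "root", nginx would append /static/ a second time.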
    location ^~ /static/ {
        alias /TRJ_HOME/src/main/resources/web/static/;
    }

    location / {
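        # The upstream address is an assumption; point proxy_pass at the
        # host:port that gunicorn binds to (see gunicorn.py). A bare
        # http://localhost would loop back to this server on port 80.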
        proxy_pass         http://localhost:8000;
        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }
}

Use

Download data from a prebuilt target

$ bin/scrape download [-h] {targets}

Export downloaded data to disk

$ bin/scrape export [-h] [--data-directory <directory>]
              [--departments <departments>] [--cs]

This exports data in a format that the Learn module can read. The data directory defaults to data/. You can selectively filter which departments are exported using the --departments flag, as shown below.
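For example (the department code here is illustrative; use the codes defined by the scraped targets):

$ bin/scrape export --data-directory data/ --departments CS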

Run Topic Modeling

$ bin/learn -in <path> -out <path> [-iterations <n>] [-debug]
              [-threads <n>] [-topics <n>] [-words <n>]
              [-alpha <alpha>] [-beta <beta>]

The -in parameter must be an export location from the Scrape module. Results will be stored within a timestamped subdirectory of the -out directory. All other parameters are optional.
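For example (all parameter values here are illustrative):

$ bin/learn -in data/ -out results/ -topics 40 -words 10 -iterations 1000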

Import topic modeling to database

$ bin/scrape import-results [-h] --topic-file <file> --course-file <file>
              [--alpha <alpha>] [--beta <beta>] [--iterations <iterations>]

Read the results of the Learn module (inferred topics) back into the database and pair them with existing course data. Repeated imports simply add new ResultSets to the existing database.
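For example (the file names are hypothetical; use the files the Learn run actually produced in its timestamped output directory):

$ bin/scrape import-results --topic-file results/<timestamp>/topics.dat \
              --course-file results/<timestamp>/courses.dat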

Run visualizations server

$ bin/web

Start the visualization server. See gunicorn.py for configuration settings. Note that the PID and log files are stored under $TRJ_HOME.
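As a sketch of the kind of settings gunicorn.py holds (the bind address and file names here are assumptions; consult the actual gunicorn.py in the repository):

import os

TRJ_HOME = os.environ["TRJ_HOME"]

bind = "localhost:8000"  # assumed port; must match the nginx proxy_pass
pidfile = os.path.join(TRJ_HOME, "web.pid")       # hypothetical file name
accesslog = os.path.join(TRJ_HOME, "access.log")  # hypothetical file name
errorlog = os.path.join(TRJ_HOME, "error.log")    # hypothetical file name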

ToDo

  1. Create common engine frameworks for catalog installs to be more DRY.
  2. Refactor configuration objects as a module.
