Skip to content

lnielsen/yamz

 
 

Repository files navigation

This is the README for the YAMZ metadictionary and includes instructions for 
deploying on a local machine for testing and on Heroku (heroku.com) for a 
scalable production version. These assume a Ubuntu GNU/Linux environment, but 
should be easily adaptable to any system; YAMZ is written in Python and uses 
only cross-platform packages. 
  
  Authored by Chris Patton. Last updated 28 Sep 2014. 

YAMZ is formerly known as SeaIce; for this reason, the database tables 
and API are named for SeaIce. 



Contents 
========
 
 0. Prerequites 
     0.1 Repository structure
 
 1. Configuring a local instance
     1.1 Postgres authentication
     1.2 Create the database
     1.3 Create a role for standard queries
     1.4 Oauth credentials and app key
     1.5 N2T persistent identifier credentials
     1.6 Test the instance
 
 2. Deploying to Heroku
     2.1 Heroku-Postgres
     2.2 Mailgun
     2.3 Heroku-Scheduler
     2.4 Making changes
     2.5 Exporting the dictionary

 3. URL forwarding

 4. Building the docs 


0. Prerequisites 
================

The contents of this directory are as follows: 

  sea.py . . . . . . . . . . Consolue utility for scoring and classifying
                             terms and other things. 

  ice.py . . . . . . . . . . Web server front end.

  digest.py  . . . . . . . . Console email notification utility.

  requirements.txt . . . . . Heroku package dependencies.

  Procfile . . . . . . . . . Heroku configuration.

  seaice/  . . . . . . . . . The SeaIce Python module. 

  html/  . . . . . . . . . . HTML templates, static Javascript and CSS, 
                             including bootstrap.js. 
 
  doc/ . . . . . . . . . . . API documentation and tools for building it. 

  .seaice/.seaic_auth  . . . DB credentials, API keys, app key, etc. Note 
                             that these files are just templates and don't
                             contain actual keys. 

Before you get started, you need to set up a database and some software 
packages. On Ubuntu, grab the follwoing:
  
  python-flask . . . . . . . Simple HTTP server.

  postgresql . . . . . . . . We're using PostgreSQL for databse managment. 

  python-psycopg2  . . . . . Python API for PostgreSQL.

  python-pip . . . . . . . . Package manager for additional Python 
                             programs. 

We need to download a package from pip that handles configuration files
nicely. Do: 

  $ sudo pip install configparser flask-login flask-Oauth 
    python-dateutil urlparse


0.1 Repository structure
========================

The 'master' branch contains all the code to deploy locally or on heroku. 
This directory. To deploy, create a local branch called "deploy_keys" and 
edit .seaice .seaice_auth with actual API and app keys. Then push to 
heroku with `git push heroku deploY_keys:master`. See section 2 for more 
on heroku. NEVER PUSH THIS BRANCH TO GITHUB. 



1. Configuring a local instance 
===============================

To start, we'll set up a database in postgres. First, we need to do some 
configuration. Postgres requires an administrative user called 'postgres'. 
It may be a good idea to create a SeaIce user (called "role" in postgres 
jargin) with read/ write access granted on the tables. First, set postgres' 
password: 

  $ sudo -u postgres psql template1
  template1=# alter uesr postgres with encrypted password 'PASS';  
  template1=# \q [quit] 


1.1 Posgres authentication
==========================

Now configure the authentication method for postgres and all other users 
connecting locally. In /etc/postgresql/9.1/main/pg_hba.conf, change "peer" 
in line

 local   all         postgres                          peer

to "md5" for the administrative account and local unix domain socket 
connections. Next, we want to only be able to connect to the database from 
the local machine. In /etc/postgresql/9.1/main/postgresql.conf, uncomment the 
line

 listen_addresses = 'localhost'

After you've done this, you need to restart the postgres server:

  $ sudo service postgresql restart


1.2 Create the database
=======================

Finally, log back into postgres to create the database:

  $ sudo -u postgres psql
  postgres=# create database seaice with owner postgres;
  
(Using unique, completely random passwords is a good idea here.) Next, 
create a configuration file for the database and user account you set up. 
Create a file called `.seaice` like: 

  [default]
  dbname = seaice
  user = postgres
  password = PASS

IMPORTANT NOTE: A template of this file is provided in the github
repository. This file should remain secret and must not be published. 
Set the correct file permissions with: 

  $ chmod 600 `.seaice`

This file is used by the SeaIce DB connector to grant access to the database.
To initialize the DB schema and tables, type:
  
  $ ./sea.py --init-db --config=.seaice


1.3 Create a role for standard queries
======================================

At this point, it's suggested that you set up a user standard read/write 
permssions on the table (no DROP, CREATE, GRANT, etc.) for most of the 
database queries. Note that this isn't applicable in Heroku; the postgres
interface there doesn't allow you to control user views. 
  
  postgres=# create user contributor with encrypted password 'PASS';
  postgres=# \c seaice;
  postgres=# grant select, insert, update, delete on all tables in 
             schema SI, SI_Notify to contributor;

Add the configuuration to `.seaice`: 

  [contributor]
  dbname = seaice
  user = contributor
  password = PASS

The web user interface creates a database connection pool with the 
same role. You can specify this on the command line: 

  $ ./ice.py --role=contributor --config=.seaice

'--role' defaults to 'default'. 


1.4 Oauth credentials and app key
=================================

YAMZ uses Google for third party authentication (OAuth-2.0) management of 
logins. Visit https://console.developers.google.com to set this service up 
for your instance. Navigate to "APIs & auth" -> "Credentials" and click 
"Create new client ID". For local configuration: 
 
 Application type . . . . . . . . . . . . Web application
 Authorized javascript origins  . . . . . http://localhost:5000 
 Authorized redirect URI  . . . . . . . . http://localhost:5000/authorized 

You'll create a new client ID for your heroku instance. (See section 2.) 
Next, create a configuration file called `.seaice_auth` with the appropriate 
client ID's and secret keys. For instance, you may have credentials for 
'http://localhost:5000', as well as a dpeloyment on heroku: 

  [dev]
  google_client_id = 000-fella.apps.googleusercontent.com
  google_client_secret = SECRET1
  app_secret = SECRET2

  [heroku]
  google_client_id = 000-guy.apps.googleusercontent.com
  google_client_secret = SECRET3
  app_secret = SECRET4

IMPORTANT NOTE: A template of this file is provided in the github
repository. This file should remain secret and must not be published. 
We provide the template, since heroku requires a commited file. 

For conveniance, this file will also keep the Flask app's secret key. For 
this key, enter a long, random string of characters. Finally, set the correct 
file permissions with: 

  $ chmod 600 `.seaice_auth`


1.5 N2T persistent identifier credentials
========================================= 

A third-party URI resolver (maintianed by John Kunze et al.) is now an 
integrated part of YAMZ. It is therefore necessary to provide a minter 
password for API access to this web service. Include a line in .seaice_auth 
for every view:
 
 minter_password = PASS


1.6 Test the instance
=====================

First, create the database schema: 

  $ ./sea --config=.seaice --init-db

Start the local server with: 
 
  $ ./ice.py --config=.seaice --deploy=dev

If all goes well, you should be able to navigate to your server by typing 
'http://localhost:5000' in the address bar. To verify that you've set up 
Google Oauth-2.0 correctly, try logging in. This will create an account.
Try adding a new term, modifying and deleting a term, and commenting on 
termss. To classify a term, do: 

  $ ./sea.py --config=.seaice --classify-terms



2. Deploying to Heroku
======================

The YAMZ prototype is currently hosted at http://yamz.herokuapp.com. 
Heroku is a cloud-computing service which allows users to host web-based
software projects. Heroku is scalable for a price; however, we can 
still achieve quite a bit without spending money. We have access to a 
small Postgres database, can shedule jobs, use a variety of packages 
(all we need are available), and deploy easily with Git. It is however
impossible to set up DB roles. Also, we can't assume persistent 
access to the filesystem. 

To begin, you need to setup an account with Heroku and download their software. 
(It's nothing major, just some tools for running commands, interacting with 
the database, etc.) Visit http://www.heroku.com. 

Heroku requires a couple additional configuration files and some small
code changes. The additional files are:

  Procfile . . . . . . . specifies the commands that start web server, as 
                         well as periodic jobs. 

  requirements.txt . . . a list of packages required by our software that 
                         Heroku needs to make available. These are 
                         available via pip.

I used the following tutorial: https://devcenter.heroku.com/articles/python
to set these up. Note that these steps have already been done; once you've
set up your heroku account, you're ready to deploy. 

The recommended best practice for managing your heroku instance is to set up
a local branch called 'deploy_keys' based on 'master'. In this branch, edit 
.seaice_auth to contain actual API and app keys. NOTE: IT IS CRITICAL THAT 
YOU DON'T PUSH THIS BRANCH TO GITHUB. Publishing these secrets comprimises 
the security of the entire app.

Login via the heroku website and create a new app. (Suppose we've named it
"fella".) Navigate to the directory containing the cloned repository. Create
and checkout the branch 'deploy_keys'. 

  $ heroku login
  $ heroku git:remote -a fella
  $ git push heroku deploy_keys:master

This creates a "slug" containing our code and its dependencies. To get the web 
app running, we'll now need to set up a database and a couple heroku backend 
services. 


2.1 Heroku-Postgres
===================

Heroku-Postgres is a scalable DB interface for heroku apps. Follow the 
following tutorial to set this up: 
https://devcenter.heroku.com/articles/heroku-postgresql#connection-in-python

The 'master' branch is set up to use either a local postgres database server
or Heroku-Postgres. The location of the DB in the "cloud" is specified by the 
environment variable "DATABASE_URL". Using 'sea' or 'ice' with '--config=heroku'
will force SeaIce to use this variable to connect to the DB. (Note this is the 
default.) Heroku-Postgres doesn't allow you to create roles, so '--role' will 
be ignored and the default will be used. To create the database schema: 

  $ heroku run python sea.py --init-db


2.2 Mailgun
===========

YAMZ provides an email notification service for users who opt in. A utility 
called 'digest' collects for each user all notifications that haven't previously 
been emailed into a single digest. The code uses a heroku backend app called 
Mailgun for SMTP service. To set this up, simply type 

  $ heroku addons:add mailgun

The code uses environment variables "MAILGUN_SMTP_LOGIN" and 
"MAILGUN_SMTP_PASSWORD" to connect to Mailgun. To send out notifications, 
type: 

  $ heroku run python digest.py 


2.3 Heroku-Scheduler
====================

There are two periodic jobs that need to be scheduled in YAMZ: the term 
classifier and the email digest. To set this up, do: 
  
  $ heroku addons:add scheduler
  $ heroku addons:open scheduler

The second command will take you to the web interface for the scheduler. Add
the following two jobs: 

  "python sea.py --classify-terms" . . . . . every 10 minutes
  "python digest.py" . . . . . . . . . . . . once per day


2.4 Making changes
==================

Deploying changes to heroku is made very easy with Git. Suppose we have changes
to 'master' that we want to push to heroku. First checkout the already created
local 'deploy_keys' branch:
  
 $ git checkout deploy_keys
 $ git merge master
 $ git push heroku deploy_keys:master


2.5 Exporting the dictionary
============================

The SeaIce API includes queries for importing and exporting database tables 
in JSON formatted objects. This could be used to backup the entire database.
Note however that imports must be done in the proper order in order to satisfy
foreign key constratins. To back up the dictionary, do: 

  $ heroku config | grep DATABASE_URL
  DATABASE_URL: <whatever>
  $ export DATABASE_URL=<whatever> 
  $ ./sea.py --config=heroku --export=Terms >terms.json



3. URL forwarding
=================

The current stable implementation of YAMZ is redirected from http://yamz.net. 
Setting this up takes a bit of doing. The following instructions are synthsized
from http://lifesforlearning.com/heroku-with-godaddy/ for redirecting a domain
name managed by GoDaddy to a Heroku app.

Launch the "Domains" app on GoDaddy. Under "Forward Domain" for the appropriate
domain (let's call it "fella.org"), add the following settings:
 
 Forward to . . . . . . . . . . . . . . . . . . . . http://www.fella.org
 Redirect type  . . . . . . . . . . . . . . . . . . 301 (Permanent)
 Forward settings . . . . . . . . . . . . . . . . . Forward only 
 Update nameservers and DNS settings 
           to support this change . . . . . . . . . yes

Next, under "Manage DNS", remove all entries except for 'A (Host)' and 'NS 
(Nameserver)', and add the following under 'CName (Alias)': 

 Record type  . . . . . . . . . . . . . . . . . CNAME (Alias)
 Host . . . . . . . . . . . . . . . . . . . . . www
 Points to  . . . . . . . . . . . . . . . . . . http://fella.herokuapp.com
 TTL  . . . . . . . . . . . . . . . . . . . . . 1 Hour

Next, change the IP address for entry '@' under 'A (Host)' to 50.63.202.31. 

That's it for DNS configuration. The last thing we need to do is modify the 
redirect URLS in the Google Oauth API. Edit the authorized javascript origins 
and redirect URI by replacing "fella.herokuapp.com" with "fella.org" and 
save. 

It can take a couple hours to a day for your DNS settings to propogate. Once 
it's done, you can navigate to YAMZ by typing "fella.org" into your browser.
Try logging in to verify that the Oauth settings are also correct. 



4. Building the docs 
====================

The seaice package is autodoc'ed using python-sphinx. To install on Ubuntu:

  $ sudo apt-get install python-sphinx

The directory doc/sphinx includes a Makefile for exporting the docs to 
various media. For example, 

  make html
  make latex

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • HTML 71.3%
  • Python 16.3%
  • JavaScript 10.6%
  • CSS 1.8%