newslynx

This is still a work in progress; we expect to officially open-source the codebase in late June or July 2015. For now, please read the report on our prototype that we published for the Tow Center.

(Re)Setting up the dev environment

  • Install newslynx, preferably in a virtual environment.
git clone https://github.com/newslynx/newslynx.git
cd newslynx
python setup.py install
  • NOTE: If you're on a Mac, you should use Postgres.app.

  • (re)create a postgresql database

dropdb newslynx 
createdb newslynx
newslynx init
  • populate with sample data
newslynx gen_random_data
  • start the server in debug mode
newslynx runserver -d
  • start a production server via gunicorn
./run
  • IGNORE THIS ERROR:

This is a result of our extensive use of gevent. We haven't yet figured out how to properly suppress this error. See more details here.

Exception KeyError: KeyError(4332017936,) in <module 'threading' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.pyc'> ignored

BUILD PROCESS MAC:

brew reinstall postgresql --build-from-source --with-python

TODO

  • Migrate common utilities from other repos into a single repo.

  • Create Database Schema / Models

  • Create Blueprint-based app workflow

  • Re-implement OAuth endpoints

  • Implement Facebook OAuth

  • Re-implement User / Login API

  • Implement Org API

  • Re-implement Settings API

  • Re-implement Events API

    • Implement Postgres-based search
    • Make multiple search vectors
  • Re-implement Things API (aka Articles)

    • Implement Postgres-based search
    • Make multiple search vectors
  • Re-implement Tags API

  • Write out SousChefs JSONSchema

  • Write out initial schemas:

    • article
    • twitter-list
    • twitter-user
    • facebook-page
  • Write out default recipes + tags:

    • article
    • twitter-list
    • twitter-user
    • facebook-page
    • promotion impact tag
  • Update create org endpoint to generate default recipes + tags.

  • Implement SousChefs API

  • Implement Recipes API

  • Implement Thing Creation API

  • Implement SQL Query API

  • Implement Extraction API

  • Implement Event Creation API

  • Create thumbnails for images.

    • Add thumbnail worker redis cache.
  • Implement Metrics API:

    • Create metrics table which contains information on each metric (name, timeseries agg method, summary agg method, cumulative, metric category, level, etc)
    • Faceted metrics only need to declare their name, not all their potential facet values.
    • Sous Chefs that create metrics must declare which metrics they create.
    • When a recipe is created for a sous chef that creates metrics, these metrics should be created for the associated organization.
    • Timeseries metrics for things will only be collected for 30 days after publication. After this period an article moves into an "archived" state.
    • Each Organization should have the following views/APIs with these respective functionalities:
      • [x] Timeseries Aggregations
        • [x] Thing level
          • [x] By hour + day + month
        • [ ] Subject Tag Level (subsequent aggregations of things)
          • [ ] By day
        • [x] Impact Tag Level (aggregations of events => non-customizable)
        • [ ] Org Level (this should include summaries of thing-level statistics, tag-level statistics, and event-level statistics)
          • [ ] By day, month
        • [x] Optionally return cumulative sums when appropriate
      • [ ] Summary Stats
        • [ ] Impact Tag Level
        • [ ] Subject Tag Level
        • [ ] Impact Tag Level
        • [ ] Organization Level
        • [ ] These should be archived every day, and percent changes should be computed over time periods.
  • Implement Reports API (Are these just metrics?)

  • Implement Redis Task Queue For Recipe Running

    • Create gevent worker class to avoid reliance on os.fork
    • Figure out how to rate limit requests.
  • Implement Modular SousChefs Class

  • Figure out how best to use OAuth tokens in SousChefs. Ideally these should not be exposed to users.

  • Implement API client

  • Re-implement SousChefs

    • RSS Feeds => Thing
    • Google Analytics => Metric
    • Google Alerts => Event
    • Social Shares => Metric
    • Homepage Promotions => Metric
    • Twitter Promotions => Metric
    • Facebook Promotions => Metric
    • Twitter List => Event
    • Twitter User => Event
    • Facebook Page => Event
    • Reddit => Event
    • HackerNews => Event
  • Implement New SousChefs

    • IFTTT integrations
      • Wordpress Publish => Thing
      • TK
    • Regex Thing URL => Tag
    • Search Things => Tag
    • Meltwater Emails => Event
    • Newsletter Email Promotions => Metric
    • Calculated Metric? SQL API.
  • Implement Recipe scheduler

  • Implement Admin Panel

  • Migrate Core Prototype Users.

  • Automate Deployment

  • App Integration

  • Document, Document, Document
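
To make the Metrics API items in the TODO list a bit more concrete, here is a minimal sketch of a metric definition record, the 30-day archival rule, and hour/day/month aggregation with optional cumulative sums. Every name in it (`MetricDefinition`, `is_archived`, `aggregate`, the field names, and the sample metric) is hypothetical and chosen for illustration; none of it is taken from the newslynx codebase, and the real implementation would presumably live in Postgres rather than in plain Python.

```python
from collections import OrderedDict, namedtuple
from datetime import datetime, timedelta

# Hypothetical record mirroring the fields listed in the TODO item
# (name, timeseries agg method, summary agg method, cumulative, category, level).
MetricDefinition = namedtuple(
    "MetricDefinition",
    ["name", "timeseries_agg", "summary_agg", "cumulative", "category", "level"],
)

# Assumption: "archived" means more than 30 days have passed since publication.
ARCHIVE_AFTER = timedelta(days=30)

# strftime patterns used to truncate a timestamp to a bucket label.
TRUNCATE = {"hour": "%Y-%m-%d %H:00", "day": "%Y-%m-%d", "month": "%Y-%m"}


def is_archived(published, now):
    """Return True once a thing is past its timeseries collection window."""
    return now - published > ARCHIVE_AFTER


def aggregate(points, unit="day", cumulative=False):
    """Sum (timestamp, value) pairs into hour, day, or month buckets.

    Buckets are returned in chronological order. With cumulative=True each
    bucket holds the running total up to and including that bucket.
    """
    buckets = OrderedDict()
    for timestamp, value in sorted(points):
        key = timestamp.strftime(TRUNCATE[unit])
        buckets[key] = buckets.get(key, 0) + value
    if cumulative:
        total = 0
        for key in buckets:
            total += buckets[key]
            buckets[key] = total
    return buckets


# Example usage with made-up data.
pageviews = MetricDefinition("pageviews", "sum", "sum", True, "traffic", "thing")
points = [
    (datetime(2015, 6, 1, 9, 5), 3),
    (datetime(2015, 6, 1, 9, 40), 2),
    (datetime(2015, 6, 2, 8, 0), 4),
]
by_day = aggregate(points, unit="day", cumulative=pageviews.cumulative)
```

Only the "sum" aggregation is sketched here; other aggregation methods named in a metric definition would need their own handling.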

References

API Design

Crosstab in Postgres

filling in zeros for a timeseries

fetching column names from table

timeseries tips

Getting bigger with flask (+ dynamic subdomains):

Nonblocking with flask, gevent, + psycopg2

Rate Limiting in Flask.

Postgres Search Configuration
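
As an illustration of the "filling in zeros for a timeseries" idea listed above: the linked article is not reproduced here, and in Postgres this is commonly done by left-joining the data against `generate_series`. The sketch below shows the same idea in plain Python, with a hypothetical `fill_zeros` helper and made-up counts.

```python
from datetime import datetime, timedelta


def fill_zeros(counts, start, end, step=timedelta(hours=1)):
    """Return (timestamp, count) pairs for every step from start to end.

    `counts` maps timestamps to observed values; any step with no
    observation is reported as 0 so the series has no gaps.
    """
    filled = []
    current = start
    while current <= end:
        filled.append((current, counts.get(current, 0)))
        current += step
    return filled


# Example usage: two observed hours out of a four-hour window.
observed = {datetime(2015, 6, 1, 0): 4, datetime(2015, 6, 1, 3): 1}
series = fill_zeros(observed, datetime(2015, 6, 1, 0), datetime(2015, 6, 1, 3))
```

The helper assumes the keys of `counts` are already truncated to the same step size as `step`.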

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

About

The API and Data Collection Tasks That Power NewsLynx
