Skip to content

r-mbruckner/stream_engine

 
 

Repository files navigation

Stream Engine

About

Stream Engine is a RESTful HTTP application for running derived products. It was written for OOI (Ocean Observatories Initiative) by Raytheon in Python.

Running It

Running the stream engine is broken down into several steps, each of which is discussed in detail below.

1. Installation
2. Configuration
3. Running

Installation

Stream Engine is using the cassandra-driver with the libev C extension for better performance. Before installing the cassandra-driver dependency (shown below), install libev and libev-devel with yum. Some other packages are also required.

    sudo yum install libev libev-devel lapack blas

You need to make sure your ssl is upgraded to the latest in the repo.

    sudo yum upgrade openssl

You are going to need your python environment setup correctly for this. Add the following line to your ~/.bash_profile or ~/.bashrc file or execute it in a terminal window before running commands.

    export STREAM_ENGINE_PYTHON=<path_to_python27>

Be sure you use the specific python27 that comes prepackaged with the python modules and ion-funcitons library required to run the stream engine. An alternative would be to modify the path in the manage-streamng directory.

Clone the stream engine repository. Git will need to be yum installed if it is not present on your system.

    git clone https://github.com/oceanobservatories/stream_engine.git

Configuration

Change into the stream_engine directory. You may need to make changes to the following files:

config.py - The variable most likely to be changed is CASSANDRA_CONTACT_POINTS. This should be a comma seperated list of IP addresses for the cassandra cluster.

gunicorn_config.py - The varible most likely to be changed is the workers. Each worker can consume 2-4 GB of RAM, so the default of 8 workers presumes the presence of a very beefy box. WARNING: Setting this value too high could result in a nonresponsive system.

Running It

There is a script that allows for multi-threaded instances of stream engine. As stated above, the number of threads is controlled by the number of workers (i.e. a value of 1 would indicate single threading).

    ./manage-streamng start
    ./manage-streamng stop

Logs are written in the present working directory with the names stream_engine.access.log and stream_engine.error.log.

Gotchas

The default limit to the number of particles returned in a query is 50. The limit can be adjusted by the limit=n parameter. A number less than zero indicates that there is no limit (all data should be returned). That may be dangerous, so use with caution.

For optimization reasons, a query with a limit of n will not always return exactly n results, even if there are n results in the data store. The reason is that the query is sliced into individual buckets to reduce the demand on the server. The number of results returned should be reasonable close to the number specified in the limit.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.7%
  • Other 1.3%