Skip to content

divyakalra/pipe-o-matic

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The Pipe-o-matic Pipeline Framework

This repository is a work in progress for a system for authoring and running data pipelines. For now the core functionality is only partly implemented. Move along...

A pipeline is more than just a script. It carefully tracks which steps have been executed and provides facilities for recovering from errors.

Example pipeline: my-pipeline-1.yaml

- file_type: explicit-sequence-1
- executable-versions:
  foo: "1.0"
  bar: "1.0"
- command: mkdir
  dir: sub_dir
- executable: foo
  stdin: input_file  # may be parameterized
  stdout: sub_dir/intermediate_file
- command: md5
  stdin: sub_dir/intermediate_file
  stdout: checksum.md5
- executable: bar
  arguments:
    - sub_dir  # may be parameterized
  stderr: bar.log

Assuming that things are configured correctly, you could run that pipeline like this:

pmatic my-pipeline-1 $BASE_DIR run

In order for that to work, you would need some way of locating those executables. You would do so with a deployments file:

deployments.yaml

file_type: deployments-1
foo:
    "1.0": /usr/local/foo-1.0/bin/foo  # path to the executable
bar:
    "1.0": /usr/local/bar-1.0/bin/bar

About

a framework for combining third party executables to construct data pipelines

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 72.6%
  • Shell 27.4%