Goal

Develop a compact, composable, extendable API for creating and evaluating a workflow graph.

(Possibly) relevant work

cloudmesh_task
celery
futures: backport of Python 3’s concurrent.futures module to Python 2
Work Queue
lazypy: a python promises framework Provides:
- lazy / delay
- spawn / future
- fork / forked
NetworkX: library for working with graphs.
traits: by Enthought Includes a simple way of defining a function to call when a class property changes
astor: AST observe/rewrite
lib2to3: round tripping src -> AST -> src
rope: python refactoring library
unparse.py: “unparseing” AST
RedBaron: self-modifying python code

Preamble

Initialize directory structure.

mkdir -p {code,images}/workflow

Design

Given a set of python functions

def A(): return 40

def B(): return 2

def C(): return A() + B()

we want to

(A | B) ; C

which means

Evaluate A and B in parallel, wait until both complete, then evaluate C

Approaches

Fundamentally there are two phases to this problem:

creation of the call graph
evaluation of the call graph

The naive solution

The call-graph is essentially the Abstract Syntax Tree of the provided program. In the case of the example above, such a tree could be:

digraph {
        C -> A
        C -> B
}

One approach to buid this in python could be the following:

@task
def A(): return 40

@task
def B(): return 2

@task
def C(): pass
# inspect call graph (an implicit global value) to call and retrieve
# the values of =A()= and =B()=.

evaluate((A || B) && C)

A few problems will arise with this approach:

Expicitly creating the dependency graph will be error prone for anything more than a simple workflow
How will function parameters, which may also be the tips of a dependency tree, be incorporated?

Implicitly building the graph

Rather than explictly building the call-graph, build the graph implictly. Something like this would be ideal as boundary is inferred from the calls to A and B withing C.:

def A(): return 40

def B(): return 2

def C():
    a = A()
    b = B()
    return a + b

print C()

Since this is plain and simple Python code, the question then becomes:

How to build a call-graph of a simple Python expression?

import sys
import subprocess
import networkx as nx
import ast
from textwrap import dedent
from pprint import pprint

code = dedent("""\
from functools import wraps
class task(object):
    def __call__(self, fn):
        @wraps(fn)
        def wrapper(*args, **kws):
            return fn(*args, **kws)
        return wrapper

@task()
def A(): return 40

@task()
def B(): return 2

def C(): print 'Not a task'

@task()
def D():
    a = A()
    b = B()
    C()
    return a + b

print D()
""")


class Visitor(ast.NodeVisitor):
    def __init__(self, amount=2):
        self._indent = 0
        self._amount = amount
        self.G = nx.DiGraph()
        self.task_functions = ['START']
        self.G.add_nodes_from(self.task_functions)
    
    def _inc(self):
        self._indent += self._amount
    
    def _dec(self):
        self._indent -= self._amount
    
    def _print(self, node, extra=''):
        # extra = extra or '(' + ','.join([k for k, _ in ast.iter_fields(node)]) + ')'
        # print '|' + self._indent * '--', node.__class__.__name__ + extra
        pass
    
    def _recurse(self, node):
        self._inc()
        for child in ast.iter_child_nodes(node):
            self.visit(child)
        self._dec()
    
    def nest(self, name):
        # print 'PUSH', self.task_functions, name
        self.task_functions.append(name)
    
    def pop(self):
        # print 'POP', self.task_functions
        if len(self.task_functions) > 1:
            self.task_functions.pop()
    
    def _is_task(self, node):
        assert isinstance(node, ast.FunctionDef), type(node)
        for dec in node.decorator_list:
            return dec.func.id == 'task'
    
    def visit_FunctionDef(self, node):
        if self._is_task(node):
            self._print(node, extra='(name=%s)' % node.name)
            self.G.add_node(node.name)
            self.nest(node.name)
        self._recurse(node)
        self.pop()
    
    def visit_Call(self, node):
        child = node.func.id
        self._print(node, extra='(f=%s)' % child)
        parent = self.task_functions[-1]
        assert parent in self.G, (parent, self.G.nodes())
    
        if child in self.G.nodes():
            self.G.add_edge(parent, child)
            self.nest(child)
        self._recurse(node)
    
        if child in self.G.nodes():
            self.pop()
    
    def generic_visit(self, node):
        # self._print(node)
        self._recurse(node)

tree = ast.parse(code)
v = Visitor()
v.visit(tree)

dotfile = 'code/workflow/prototype_callgraph.dot'
nx.write_dot(v.G, dotfile)
svg = subprocess.check_output(['dot', '-Tsvg', dotfile])
with open('images/workflow/prototype_callgraph.svg', 'w') as fd:
    fd.write(svg)

Dynamic vs Strict call-graph

Operator Overloading

Consider the simplified problem of processing the expression:

(A | B) ; C

Using similar syntax to build the DAG can be done in Python by overloading the bitwise AND and OR operators:

class Node(object):

def __and__(self, other):
    return self.compose(other, AndNode)

def __or__(self, other):
    return self.compose(other, OrNode)

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
images/workflow		images/workflow
workflow		workflow
.dir-locals.el		.dir-locals.el
README.org		README.org
README.rst		README.rst
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

images/workflow

images/workflow

workflow

workflow

.dir-locals.el

.dir-locals.el

README.org

README.org

README.rst

README.rst

requirements.txt

requirements.txt

Repository files navigation

Goal

(Possibly) relevant work

Preamble

Design

Approaches

The naive solution

Implicitly building the graph

Dynamic vs Strict call-graph

Operator Overloading

About

Releases

Packages

Contributors 2

Languages

futuresystems/python-workflow

Folders and files

Latest commit

History

Repository files navigation

Goal

(Possibly) relevant work

Preamble

Design

Approaches

The naive solution

Implicitly building the graph

Dynamic vs Strict call-graph

Operator Overloading

About

Resources

Stars

Watchers

Forks

Languages