futuresystems/python-workflow

Goal

Develop a compact, composable, extendable API for creating and evaluating a workflow graph.

(Possibly) relevant work

  • cloudmesh_task
  • celery
  • futures: backport of Python 3’s concurrent.futures module to Python 2
  • Work Queue
  • lazypy: a Python promises framework. Provides:
    • lazy / delay
    • spawn / future
    • fork / forked
  • NetworkX: library for working with graphs.
  • traits: by Enthought. Includes a simple way of defining a function to call when a class property changes.
  • astor: AST observe/rewrite
  • lib2to3: round tripping src -> AST -> src
  • rope: python refactoring library
  • unparse.py: “unparsing” an AST back to source
  • RedBaron: self-modifying python code

Preamble

Initialize directory structure.

mkdir -p {code,images}/workflow

Design

Given a set of Python functions

def A(): return 40

def B(): return 2

def C(): return A() + B()

we want to

(A | B) ; C

which means

Evaluate A and B in parallel, wait until both complete, then evaluate C
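
A minimal sketch of that evaluation order, assuming Python 3's concurrent.futures (the module whose backport is listed above) and threads as the unit of parallelism. Here C takes the two results as parameters instead of calling A and B itself, a simplification made only for this illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def A(): return 40

def B(): return 2

def C(a, b): return a + b

def run():
    # (A | B): submit both so they may run concurrently
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(A)
        fb = pool.submit(B)
        # ; C: result() blocks until each future has completed
        return C(fa.result(), fb.result())

result = run()
print(result)
```

The ordering is hard-coded here; the rest of this document is about deriving it from a graph.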

Approaches

Fundamentally there are two phases to this problem:

  1. creation of the call graph
  2. evaluation of the call graph

The naive solution

The call-graph is essentially the Abstract Syntax Tree of the provided program. In the case of the example above, such a tree could be:

digraph {
        C -> A
        C -> B
}
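
The two phases can be kept separate even in a small sketch. The one below assumes Python 3.9+ for the standard-library graphlib module; the funcs and deps mappings and the evaluate helper are hypothetical names for this illustration. Edges point from a task to the tasks it depends on, as in the digraph above:

```python
from graphlib import TopologicalSorter

def A(): return 40

def B(): return 2

def C(a, b): return a + b

# Phase 1: creation. Each task name maps to the tasks it depends on.
funcs = {'A': A, 'B': B, 'C': C}
deps = {'C': ['A', 'B'], 'A': [], 'B': []}

def evaluate(funcs, deps):
    # Phase 2: evaluation. static_order() yields dependencies
    # before the tasks that need them.
    results = {}
    for name in TopologicalSorter(deps).static_order():
        args = [results[d] for d in deps[name]]
        results[name] = funcs[name](*args)
    return results

results = evaluate(funcs, deps)
print(results['C'])
```

This version evaluates sequentially. It only demonstrates that once the graph exists, a valid evaluation order follows from it.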

One approach to build this in Python could be the following:

@task
def A(): return 40

@task
def B(): return 2

@task
def C(): pass
# inspect call graph (an implicit global value) to call and retrieve
# the values of =A()= and =B()=.

evaluate((A | B) & C)

A few problems will arise with this approach:

  1. Explicitly creating the dependency graph will be error-prone for anything more than a simple workflow.
  2. How will function parameters, which may also be the tips of a dependency tree, be incorporated?

Implicitly building the graph

Rather than explicitly building the call graph, build it implicitly. Something like this would be ideal, as the dependency boundary is inferred from the calls to A and B within C:

def A(): return 40

def B(): return 2

def C():
    a = A()
    b = B()
    return a + b

print(C())

Since this is plain and simple Python code, the question then becomes:

How to build a call-graph of a simple Python expression?

import sys
import subprocess
import networkx as nx
import ast
from textwrap import dedent
from pprint import pprint

code = dedent("""\
from functools import wraps
class task(object):
    def __call__(self, fn):
        @wraps(fn)
        def wrapper(*args, **kws):
            return fn(*args, **kws)
        return wrapper

@task()
def A(): return 40

@task()
def B(): return 2

def C(): print('Not a task')

@task()
def D():
    a = A()
    b = B()
    C()
    return a + b

print(D())
""")


class Visitor(ast.NodeVisitor):
    def __init__(self, amount=2):
        self._indent = 0
        self._amount = amount
        self.G = nx.DiGraph()
        self.task_functions = ['START']
        self.G.add_nodes_from(self.task_functions)
    
    def _inc(self):
        self._indent += self._amount
    
    def _dec(self):
        self._indent -= self._amount
    
    def _print(self, node, extra=''):
        # extra = extra or '(' + ','.join([k for k, _ in ast.iter_fields(node)]) + ')'
        # print '|' + self._indent * '--', node.__class__.__name__ + extra
        pass
    
    def _recurse(self, node):
        self._inc()
        for child in ast.iter_child_nodes(node):
            self.visit(child)
        self._dec()
    
    def nest(self, name):
        # print 'PUSH', self.task_functions, name
        self.task_functions.append(name)
    
    def pop(self):
        # print 'POP', self.task_functions
        if len(self.task_functions) > 1:
            self.task_functions.pop()
    
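    def _is_task(self, node):
        assert isinstance(node, ast.FunctionDef), type(node)
        # Accept both @task and @task(...), and check every decorator
        for dec in node.decorator_list:
            target = dec.func if isinstance(dec, ast.Call) else dec
            if getattr(target, 'id', None) == 'task':
                return True
        return False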
    
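    def visit_FunctionDef(self, node):
        is_task = self._is_task(node)
        if is_task:
            self._print(node, extra='(name=%s)' % node.name)
            self.G.add_node(node.name)
            self.nest(node.name)
        self._recurse(node)
        # Only pop what this function pushed; a non-task function
        # must not remove its enclosing task from the stack
        if is_task:
            self.pop()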
    
    def visit_Call(self, node):
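        # Only plain calls like f() have an ast.Name with .id;
        # attribute calls such as obj.f() yield None and are skipped
        child = getattr(node.func, 'id', None)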
        self._print(node, extra='(f=%s)' % child)
        parent = self.task_functions[-1]
        assert parent in self.G, (parent, self.G.nodes())
    
        if child in self.G.nodes():
            self.G.add_edge(parent, child)
            self.nest(child)
        self._recurse(node)
    
        if child in self.G.nodes():
            self.pop()
    
    def generic_visit(self, node):
        # self._print(node)
        self._recurse(node)

tree = ast.parse(code)
v = Visitor()
v.visit(tree)

dotfile = 'code/workflow/prototype_callgraph.dot'
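# Recent NetworkX releases no longer expose write_dot at the top level;
# the pydot-backed writer below requires the pydot package.
nx.drawing.nx_pydot.write_dot(v.G, dotfile)
svg = subprocess.check_output(['dot', '-Tsvg', dotfile])
# check_output returns bytes on Python 3, so write in binary mode
with open('images/workflow/prototype_callgraph.svg', 'wb') as fd:
    fd.write(svg)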

images/workflow/prototype_callgraph.svg
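
For comparison, a condensed sketch of the same idea that needs only the standard library and runs on Python 3. It is not the prototype above: it drops the START node, records edges as a set of (caller, callee) tuples instead of a NetworkX graph, and uses hypothetical helper names (decorator_name, call_edges). Calls inside nested functions are attributed to the outer task, which is a known simplification:

```python
import ast
from textwrap import dedent

SOURCE = dedent("""\
    def task(fn): return fn

    @task
    def A(): return 40

    @task
    def B(): return 2

    def C(): print('Not a task')

    @task
    def D():
        a = A()
        b = B()
        C()
        return a + b
""")

def decorator_name(dec):
    # Accept both @task and @task(...)
    target = dec.func if isinstance(dec, ast.Call) else dec
    return target.id if isinstance(target, ast.Name) else None

def call_edges(source, marker='task'):
    tree = ast.parse(source)
    defs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    # A function is a task if any of its decorators is named `marker`
    tasks = {d.name for d in defs
             if any(decorator_name(x) == marker for x in d.decorator_list)}
    edges = set()
    for d in defs:
        if d.name not in tasks:
            continue
        # Walk only the body so the decorator itself is not counted
        for stmt in d.body:
            for node in ast.walk(stmt):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Name)
                        and node.func.id in tasks):
                    edges.add((d.name, node.func.id))
    return edges

edges = call_edges(SOURCE)
print(sorted(edges))
```

C is called from D but is not decorated, so it does not appear in the resulting edges.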

Dynamic vs Strict call-graph

Operator Overloading

Consider the simplified problem of processing the expression:

(A | B) ; C

A similar syntax for building the DAG is possible in Python by overloading the bitwise OR (|) and AND (&) operators. Python has no overloadable ; operator, so & stands in for sequencing:

class Node(object):

    def __and__(self, other):
        return self.compose(other, AndNode)

    def __or__(self, other):
        return self.compose(other, OrNode)
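
The fragment above leaves compose, AndNode, and OrNode undefined. A minimal sketch of how they might fit together follows; the Task and BinaryNode classes and the repr format are hypothetical, introduced only to make the example self-contained. It builds the expression tree and does not evaluate it:

```python
class Node(object):
    def compose(self, other, cls):
        # Combine two nodes into a new composite node of type cls
        return cls(self, other)

    def __and__(self, other):
        return self.compose(other, AndNode)

    def __or__(self, other):
        return self.compose(other, OrNode)

class Task(Node):
    # Leaf node wrapping a plain function (hypothetical helper)
    def __init__(self, fn):
        self.fn = fn

    def __repr__(self):
        return self.fn.__name__

class BinaryNode(Node):
    symbol = '?'

    def __init__(self, left, right):
        self.left, self.right = left, right

    def __repr__(self):
        return '(%r %s %r)' % (self.left, self.symbol, self.right)

class AndNode(BinaryNode):
    symbol = '&'  # sequencing: left, then right

class OrNode(BinaryNode):
    symbol = '|'  # parallel: left alongside right

def A(): return 40

def B(): return 2

def C(): return A() + B()

graph = (Task(A) | Task(B)) & Task(C)
print(repr(graph))
```

The parentheses around the | term are required: in Python & binds more tightly than |, so A | B & C would parse as A | (B & C).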
