
adehecq/make_workflow


make_workflow

Python and bash utilities to generate a workflow of file processing based on GNU make. Many processing workflows involve reading and writing files with a set of commands that must run in a given order and respect some dependencies. This can be done easily within Python or Bash, but what if the processing stops, if you update some intermediate files and only want to restart from there, or if you want to run some steps in parallel? These tools provide a solution thanks to GNU Make.

Example

Let's consider the following workflow where 3 files are created:

echo foo > hello1
sed s/foo/faa/ hello1 > hello2
echo bar > hello3

We can summarize it as:

-> hello1 -> hello2
-> hello3

There are two independent chains: no file is required to create hello1 or hello3, but hello1 is required to create hello2.
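To make the "independent chains" idea concrete, the sketch below uses only the Python standard library (graphlib, Python 3.9+), not make_workflow. It stores each file's prerequisites and asks graphlib.TopologicalSorter which files could be produced together at each stage. make does roughly this kind of scheduling internally when it is allowed to run several jobs.

```python
from graphlib import TopologicalSorter

# Each output file mapped to the files it needs (its prerequisites).
deps = {
    "hello1": set(),
    "hello2": {"hello1"},
    "hello3": set(),
}

ts = TopologicalSorter(deps)
ts.prepare()
batches = []
while ts.is_active():
    # Files whose prerequisites are all done; these could run in parallel.
    ready = sorted(ts.get_ready())
    batches.append(ready)
    ts.done(*ready)

print(batches)  # [['hello1', 'hello3'], ['hello2']]
```

hello1 and hello3 share the first batch because neither needs another file; hello2 has to wait for hello1.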

Python implementation

The Python implementation with make_workflow is simple and saved in the repo under test.py:

import make_workflow as mw

# Initialize workflow. By default a temporary file is generated, but a path can be set here.
wf = mw.Workflow()

# Create some text file
# The append function takes as arguments a string/list of the commands, inputs and outputs.
wf.append("echo foo > hello1", "", "hello1")

# Use first file to create 2nd file
wf.append("sed 's/foo/faa/' hello1 > hello2", "hello1", "hello2")

# Create another file that does not require hello1 or 2
wf.append("echo bar > hello3", "", "hello3")

# Finally, run the workflow
wf.run(njobs=1)
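The Makefile that wf.append() builds is not shown here (wf.display() prints it). As a rough illustration of the idea only, the hypothetical MiniWorkflow class below, which is not part of make_workflow, turns the same three calls into Makefile text: each output becomes a target, the inputs become its prerequisites, and the command becomes its recipe. The layout make_workflow actually generates may differ.

```python
from typing import List, Tuple, Union

Files = Union[str, List[str]]


def as_list(files: Files) -> List[str]:
    """Accept either a space-separated string or a list of file names."""
    return files.split() if isinstance(files, str) else list(files)


class MiniWorkflow:
    """Hypothetical stand-in for mw.Workflow that only renders a Makefile."""

    def __init__(self) -> None:
        self.rules: List[Tuple[List[str], List[str], List[str]]] = []

    def append(self, commands: Files, inputs: Files, outputs: Files) -> None:
        # A single command string is kept whole; it is not split on spaces.
        cmds = [commands] if isinstance(commands, str) else list(commands)
        self.rules.append((cmds, as_list(inputs), as_list(outputs)))

    def to_makefile(self) -> str:
        # A phony "all" target lists every output, so a plain `make` builds them all.
        outputs = [out for _, _, outs in self.rules for out in outs]
        lines = [".PHONY: all", "all: " + " ".join(outputs), ""]
        for cmds, ins, outs in self.rules:
            lines.append(f"{' '.join(outs)}: {' '.join(ins)}".rstrip())
            # make requires every recipe line to start with a tab character.
            lines.extend("\t" + cmd for cmd in cmds)
            lines.append("")
        return "\n".join(lines)


wf = MiniWorkflow()
wf.append("echo foo > hello1", "", "hello1")
wf.append("sed 's/foo/faa/' hello1 > hello2", "hello1", "hello2")
wf.append("echo bar > hello3", "", "hello3")
text = wf.to_makefile()
print(text)
```

One caveat of this simple layout: with several outputs on one rule line, make treats each output as a separate target sharing the recipe, so the recipe may run once per output. GNU make 4.3 and later offer grouped targets (`&:`) for that case.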

Let's save all this in test.py and see what happens:

python test.py

+echo foo > hello1
+sed s/foo/faa/ hello1 > hello2
+echo bar > hello3

Since no file exists yet, all commands are run and the files are created. If we run the script again, nothing happens, because all the files already exist and are up to date.

Let's then remove some files to see what happens.

rm hello3
python test.py

+echo bar > hello3

This time hello3 is missing, hence the last command is re-run automatically. The result would be similar with hello2 removed.

rm hello1
python test.py

+echo foo > hello1
+sed s/foo/faa/ hello1 > hello2

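If hello1 is removed, the first two commands are re-run automatically: hello1 is recreated, which makes it newer than hello2. If hello1 is only updated, e.g. with touch hello1, just the second command is re-run, since hello2 is then older than its input.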
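The rule behind this behaviour fits in a few lines. The hypothetical needs_rebuild() below is a simplified version of make's check: a target is rebuilt if it is missing or older than one of its prerequisites. Real make also applies the check recursively through the prerequisites. Timestamps are set explicitly with os.utime so the example is deterministic.

```python
import os
import tempfile
from typing import List


def needs_rebuild(target: str, prerequisites: List[str]) -> bool:
    """Simplified make rule: rebuild if target is missing or older than an input."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as d:
    hello1, hello2 = os.path.join(d, "hello1"), os.path.join(d, "hello2")
    for path in (hello1, hello2):
        open(path, "w").close()
    os.utime(hello1, (100, 100))  # hello1 written first...
    os.utime(hello2, (200, 200))  # ...then hello2 built from it
    up_to_date = not needs_rebuild(hello2, [hello1])
    os.utime(hello1, (300, 300))  # like `touch hello1`: hello1 is now newer
    hello2_stale = needs_rebuild(hello2, [hello1])
    hello1_stale = needs_rebuild(hello1, [])  # exists and has no inputs
    hello3_missing = needs_rebuild(os.path.join(d, "hello3"), [])

print(up_to_date, hello2_stale, hello1_stale, hello3_missing)  # True True False True
```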

Finally, this can be used to run independent processes in parallel. Here, for example, hello2 and hello3 can be processed at the same time by replacing the last line with:

wf.run(njobs=2)
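Presumably njobs is forwarded to make's -j option, which sets how many recipes may run simultaneously; this is an assumption, as the mapping is not stated here. The hypothetical helpers below show how such a call could be assembled and run from Python with subprocess.

```python
import shutil
import subprocess
from typing import List


def build_make_command(makefile: str, njobs: int = 1, dry_run: bool = False) -> List[str]:
    """Assemble a GNU make call; -j sets how many recipes may run at once."""
    if njobs < 1:
        raise ValueError("njobs must be at least 1")
    cmd = ["make", "-f", makefile, "-j", str(njobs)]
    if dry_run:
        cmd.append("-n")  # print the commands without executing them
    return cmd


def run_makefile(makefile: str, njobs: int = 1) -> int:
    """Run make and return its exit status (0 means every target was built)."""
    if shutil.which("make") is None:
        raise RuntimeError("GNU make was not found on PATH")
    return subprocess.run(build_make_command(makefile, njobs)).returncode


print(build_make_command("Makefile", njobs=2))  # ['make', '-f', 'Makefile', '-j', '2']
```

Keeping command assembly separate from execution makes it easy to inspect the exact make call, or to pass dry_run=True and preview the commands first.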

By default, the makefile is saved in a temporary file and deleted at the end. It can be printed with the command wf.display(), or it can be saved at a given location at creation: wf = mw.Workflow('Makefile').
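One way to handle a temporary makefile is sketched below with Python's tempfile module; this is an assumption, not necessarily how make_workflow does it, and with_makefile is a hypothetical helper. The file is created with delete=False and closed before it is used, because the Python documentation notes that an open NamedTemporaryFile cannot be reopened by name on Windows, which may relate to the Windows caveat under Troubleshooting.

```python
import os
import tempfile
from typing import Callable, List, Optional, Tuple


def with_makefile(text: str, action: Callable[[str], None],
                  path: Optional[str] = None) -> None:
    """Write `text` to a makefile, pass its path to `action`, clean up temp files."""
    if path is not None:
        # A user-chosen path (like mw.Workflow('Makefile')) is kept afterwards.
        with open(path, "w") as fh:
            fh.write(text)
        action(path)
        return
    # delete=False plus an explicit close() lets another process open the file
    # by name, which an open NamedTemporaryFile does not allow on Windows.
    tmp = tempfile.NamedTemporaryFile("w", suffix=".mk", delete=False)
    try:
        tmp.write(text)
        tmp.close()
        action(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.remove(tmp.name)  # the temporary makefile is deleted at the end


seen: List[Tuple[str, str]] = []


def read_back(p: str) -> None:
    # Stand-in for running `make -f p`: just record the path and contents.
    with open(p) as fh:
        seen.append((p, fh.read()))


with_makefile("all:\n", read_back)
```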

Bash implementation

Similar but less advanced tools exist for Bash. An example is shown in test.sh.

#!/usr/bin/env bash

source make_workflow.sh

# Initialize workflow
makefile_init Makefile "hello2 hello3" "Test flow"

# Create some text file
# The makefile_append function takes as arguments the Makefile, a label for the step, the outputs, the inputs and the commands.
makefile_append Makefile "Hello1" hello1 "" "echo foo > hello1"

# Use first file to create 2nd file
makefile_append Makefile "Hello2" hello2 hello1 "sed 's/foo/faa/' hello1 > hello2"

# Create another file that does not require hello1 or 2
makefile_append Makefile "Hello3" hello3 "" "echo bar > hello3"

# Finally, run the workflow
make -f Makefile

Troubleshooting

Warning: these tools depend on make, so some issues could arise with different versions of make. Please report any issue.

Windows users: I haven't tested this, but creating the temporary file on Windows might cause issues. In that case, simply pass a path for the makefile, like this: wf = mw.Workflow('Makefile').
