vista-pegasus-wrapper

A higher-level API for ISI Pegasus, adapted to the quirks of the ISI Vista group

Documentation

To generate documentation:

cd docs
make html

The docs will be under docs/_build/html

Project Setup

  1. Create a Python 3.6 Anaconda environment (or your favorite other means of creating a virtual environment): conda create --name pegasus-wrapper python=3.6 followed by conda activate pegasus-wrapper.
  2. pip install -r requirements.txt

Usage

A workflow is an organized logical path that takes input data and runs predefined processes over it to produce structured output. Pegasus is an ISI-developed workflow management system used to manage the submission and execution of these workflows. Pegasus operates on a directed acyclic graph structure (a DAX) to manage job dependencies, file transfer between computational resources, and execution environments. This library simplifies the process of writing a workflow description which can be converted into a DAX for submission to the Pegasus controller.
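To make the DAX idea concrete, here is a toy job-dependency graph and a topological ordering over it. This is a generic illustration of the dependency-graph concept, not the Pegasus or wrapper API; the job names are made up.

```python
def topo_order(deps):
    """Return jobs in an order where every job runs after its dependencies.

    `deps` maps each job name to the list of jobs it depends on.
    """
    seen, order = set(), []

    def visit(job):
        if job in seen:
            return
        seen.add(job)
        for dep in deps[job]:
            visit(dep)  # schedule dependencies before the job itself
        order.append(job)

    for job in deps:
        visit(job)
    return order


# Toy DAG: train depends on preprocess; evaluate depends on train.
jobs = {
    "preprocess": [],
    "train": ["preprocess"],
    "evaluate": ["train"],
}
print(topo_order(jobs))  # → ['preprocess', 'train', 'evaluate']
```

Pegasus does this dependency bookkeeping for you (plus file transfer and execution-environment management); the wrapper's job is to make producing the DAX from Python easy.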

Using WorkflowBuilder from workflow.py, develop a function to generate a Workflow.dax. See example_workflow for an extremely simple workflow, which we will use to demonstrate the process. To run the example workflow, add a root.params file to the parameters directory with the following content. Note that the directory should be in your $HOME and not on an NFS mount like /nas/gaia/, as submission from NFS will fail:

example_root_dir: "path/to/output/dir/"

Then run python -m pegasus_wrapper.example_workflow_builder parameters/root.params from this project's root folder.

The log output will tell you where Test.dax was written. Assuming you are logged into a submit node with an active Pegasus install:

cd "path/to/output/dir"
pegasus-plan --conf pegasus.conf --dax Test.dax --dir "path/to/output/dir" --relative-dir exampleRun-001
pegasus-run "path/to/output/dir/"exampleRun-001

The example workflow submits ONLY to the scavenge partition. In an actual workflow we recommend parameterizing the partition choice.

Our current system places ckpt files to indicate that a job has finished, in case the DAX needs to be regenerated to fix a bug found after submission. This system is not comprehensive and currently requires manual control. When submitting a new job that reuses previous handles, use a new relative dir in the plan and run steps.

A Nuke Checkpoints script is provided for easily removing checkpoint files. To use it, pass a directory location as the launch parameter; the script removes checkpoint files from that directory and all of its sub-directories.
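The script's exact file-matching rule isn't documented above, so the sketch below assumes checkpoint files can be found by a glob pattern (the `*ckpt*` default is an assumption, not the script's real rule); it illustrates the recursive-removal behavior described.

```python
from pathlib import Path


def nuke_checkpoints(root: Path, pattern: str = "*ckpt*") -> int:
    """Delete checkpoint files under *root* (recursively); return the count.

    The glob pattern is an illustrative assumption -- check the real
    Nuke Checkpoints script for the exact file name it targets.
    """
    removed = 0
    for ckpt in root.rglob(pattern):
        if ckpt.is_file():  # skip directories that happen to match
            ckpt.unlink()
            removed += 1
    return removed
```

Deleting by glob is destructive, so when experimenting it is worth printing the matched paths before unlinking them.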

FAQ

What are valid root directories for the workflow?

Currently the root directory should be in your home directory and not on an NFS mount like /nas/gaia/, as the submission will fail for NFS-related reasons. The experiment directory can be (and ought to be) on such a drive, though.

Common Errors

Mismatching partition selection and max walltime

Each partition has a max walltime associated with it; see the saga cluster wiki. If you specify a partition together with a job_time_in_minutes greater than that partition's max walltime, you will see an error.
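One way to avoid this error is to fail fast by validating the request before planning. The limits below are placeholder numbers, not the real saga partition caps; consult the cluster wiki for the actual values.

```python
# Hypothetical partition walltime caps in minutes -- illustrative only,
# NOT the real saga cluster limits (see the cluster wiki for those).
PARTITION_MAX_WALLTIME = {
    "scavenge": 60,
    "ephemeral": 720,
}


def check_walltime(partition: str, job_time_in_minutes: int) -> None:
    """Raise before planning if the requested walltime exceeds the cap."""
    limit = PARTITION_MAX_WALLTIME.get(partition)
    if limit is not None and job_time_in_minutes > limit:
        raise ValueError(
            f"job_time_in_minutes={job_time_in_minutes} exceeds "
            f"{partition}'s max walltime of {limit} minutes"
        )
```

Checking at DAX-generation time surfaces the mistake immediately instead of after the workflow has been planned and submitted.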

Error parsing classad or job not found

An annoying error that says very little about what is actually going on. In the past it has been associated with the workflow requesting too little memory, resulting in an OUT_OF_ME+ state (SLURM's truncated display of OUT_OF_MEMORY). See Debugging Tricks/Tips for ways to confirm this.

Contributing

Run make precommit before committing.

If you are using PyCharm, please set your docstring format to "Google" and your unit test runner to "PyTest" in Preferences | Tools | Python Integrated Tools.

Debugging Tricks/Tips

  • Investigate why a job is in a certain state (e.g. held for far too long): condor_q -long [job_id]. You can get the job id with pegasus-status -v [wfdir].
