Skip to content

fubarwrangler/group-quota

Repository files navigation

Accounging-Group Quota management for HTCondor

An HTCondor Account Group manager, used by the US ATLAS Teir-1 at BNL.

Background

HTCondor Hierarchical Group Quotas are used to partition the pool logically into queues, with each sub group containing a single species of job. The groups are organized into a tree, with leaf-nodes having jobs submitted into them and the higher levels serving to classify these jobs.

For example, the ATLAS tree is partitioned into production and analysis, with each of these containing a few different queues of each type.

This repo contains code for a website to manage the groups, some scripts to put the groups in the configuration of the 'condor_negotiator', and software to auto-balance the groups based on which have demand in them and what size jobs each group has.

Installation

The software needs some packages built via the setup scripts in this repo, a webserver running Apache (nginx would work too), and a MySQL database somewhere (relatively low traffic).

All parts rely on the 'gq' RPM package, built by setup-gq.py. This contains code used by the website, the balancing software, and the other scripts.

Example installation steps

  1. Set up database and web servers

  2. Build & install the gq and gqweb packages via the setup scripts

  3. Run the init_db.py script to define the tables from the website models

  4. Configure the webpage settings, esp. the SECRET_KEY, and figure out what kind of authentication you want to set up. It is loaded in the site code in gqweb/views/pre_initalize.py which calls 'load_user_header()' from util/userload.py. By default is uses the REMOTE_USER environment variable so some kind of BasicAuth should be easy to set up

Balancing & Idle Job Data

The balancing scripts require knowledge about how much workload is in each queue. To provide this data you need to set up software like that in gq/jobquery/, where there are modules contain a get_jobs() method that returns a mapping of group_name → # idle at the time it is called. See 'pandajobs.py' for an example that looks to PANDA for activated jobs.

The balancing scripts themselves are bin/get_idle_jobs.py and bin/balance_load.py, both of which should be called via Cron or some other regular mechanism.

Configuration Generation

The script update_quotas.py safely write a config file that contains the current groups and their parameters, and triggers a reconfig of condor if things changed.

The script is bin/update_quotas.py, and it too requires the 'gq' package. It should be run via Cron wherever the 'condor_negotiator' for your pool runs.

About

HTCondor Accounting Group Software

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published