LGI Pilot job manager
wvengen/lgipilot
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
LGI Pilot job manager (lgipilot) -------------------------------- To use the Leiden Grid Infrastructure (LGI) on a grid (like gLite) it can be useful to make use of pilot jobs. Most importantly, it eliminates latency inherent in grids. Secondly, end-users can use the grid without the need for grid certificates (though this requires some LGI tweaking at the moment). Requirements: - Python (tested with 2.4) - gLite user-interface - LGI project server version 1.29 or later License: GPL version 2 or later * Quickstart overview 1. Install an LGI project server Please see http://fwnc7003.leidenuniv.nl/LGI/ for more information. This typically includes installing a LAMP stack, the LGI application, setting up users and groups, and configuring a resource daemon. 2. Configure a resource daemon for your application. This is explained by the LGI documentation as well. By running this resource daemon on a Linux host you should be able to use your application from within LGI. 3. Adapt the resource daemon configuration for the grid job. To run the resource daemon on the grid, it needs to be self-contained and a little adapted. This is explained below. 4. Run the LGI pilot job manager (lgipilot) using the resource daemon tarball just created. See below. * Resource daemon configuration As mentioned in the LGI documentation, a resource daemon needs to be configured for use with your specific application. This includes editing LGI.cfg, pointing it to its key and certificate, and creating or editing various scripts for things like starting and stopping the application. To run the application on the grid, the resource daemon, its configuration and the application must be bundled, including any dependencies. This is described in pilotdist/README.pilot. This results in a tarball which should be placed in pilotdist/pilotjob.tar.gz. * Run LGI pilot job manager With the pilot job tarball in place, the LGI pilot job manager (lgipilot) can be started on a grid user-interface (UI, which means that it has all the tools and configuration to submit and interact with grid jobs). Make sure that you have a valid proxy certificate (*). You may want to change the pilot job manager configuration in lgipilot.ini, section 'LGI'. SchedMin/SchedMax depend on your expected load and the amount of jobs slots (fair share) you have available, the other timing-related options are dependent on the typical length of your jobs and grid latency. More guidelines on how to configure this and monitoring tools may become available in the future. The pilot job manager includes parts of Ganga, which has many configuration options as well in the same file. Please see comments and Ganga documentation. (*) For production deployments one can use a robot certificate that updates the proxy certificate like once a day. * Update pilot job tarball It may happen that the pilotjob tarball is modified at some point, for example to update or add an application. New pilot jobs will run using the updated tarball, but old pilot jobs may keep running. And you never know which of the running pilot jobs will be executing a new LGI job. Currently there is only a crude way to make sure that all pilotjobs run the new version: killing the old ones. Running LGI jobs will be rescheduled by the LGI project server upon timeout, so no jobs will be lost. This is not ideal, but currently the only option. One can kill all pilotjobs invoked by lgipilot with the command-line option --pilot-cancel. The LGI pilot job manager will automatically submit new pilotjobs to compensate (running the new version).