Workflow framework for DAMPE remote computing & accounting
External contributor: Stephane Poss (@sposs on GitHub), who provided lots of help with packaging and with handling requests, as well as the initial suggestion to use flask/mongodb.
It is advisable to use virtualenv to manage separate Python installations; install it with either of:
easy_install virtualenv OR
pip install virtualenv
Again, a virtual environment is strongly recommended; the commands below (mkvirtualenv, cdsitepackages) come from virtualenvwrapper.
-
create a virtualenv named DAMPE:
mkvirtualenv DAMPE
-
get tarball:
wget --no-check-certificate https://dampevm3.unige.ch/dmpworkflow/releases/DmpWorkflow-0.0.1.dev247.tar.gz
-
install tarball:
pip install DmpWorkflow-0.0.1.dev247.tar.gz
-
set configuration file:
cdsitepackages
nano DmpWorkflow/config/defaults.cfg
-
enjoy!
Jobs are defined in XML markup:
<Jobs>
  <Job>
    <InputFiles>
      <File source="" target="" file_type="" />
    </InputFiles>
    <OutputFiles>
      <File source="" target="" file_type="" />
    </OutputFiles>
    <JobWrapper executable="/bin/bash"><![CDATA[
#!/bin/bash
hostname
]]>
    </JobWrapper>
    <MetaData>
      <Var name="" value="" var_type="string"/>
    </MetaData>
  </Job>
</Jobs>
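As a concrete illustration, a minimal filled-in job might look like the following. All file paths, file types, and variable values here are placeholders invented for the example, not taken from a real production task:

```xml
<Jobs>
  <Job>
    <InputFiles>
      <File source="/lustre/dampe/data/input.root" target="input.root" file_type="root" />
    </InputFiles>
    <OutputFiles>
      <File source="output.log" target="/lustre/dampe/workflow/logs/output.log" file_type="log" />
    </OutputFiles>
    <JobWrapper executable="/bin/bash"><![CDATA[
#!/bin/bash
hostname > output.log
]]>
    </JobWrapper>
    <MetaData>
      <Var name="BATCH_OVERRIDE_QUEUE" value="short" var_type="string"/>
    </MetaData>
  </Job>
</Jobs>
```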
Note that there are a few reserved metadata variables:
- BATCH_OVERRIDE_REQUIREMENTS - overrides whatever BATCH_REQUIREMENTS are defined in settings.cfg
- BATCH_OVERRIDE_EXTRAS - complements the requirements
- BATCH_OVERRIDE_QUEUE - the queue to be used; overrides BATCH_QUEUE
- BATCH_OVERRIDE_SYSTEM - should not normally be used
These variables control the submission behavior of each batch job. Variables defined at the JobInstance level take precedence, overriding anything defined at the Job level; if neither the instance nor the job defines a variable, top-level variables are inherited from the dependent parent task (if one is defined).
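The precedence rules above can be sketched in Python. The dict-based lookup and the function name are illustrations only, not the actual DmpWorkflow implementation:

```python
def resolve_var(name, instance_vars, job_vars, parent_task_vars=None):
    """Resolve a metadata variable following the documented precedence:
    JobInstance overrides Job; if neither defines it, fall back to the
    parent task's top-level variables (when a parent task exists)."""
    for scope in (instance_vars, job_vars, parent_task_vars or {}):
        if name in scope:
            return scope[name]
    return None

# JobInstance wins over Job:
resolve_var("BATCH_OVERRIDE_QUEUE",
            {"BATCH_OVERRIDE_QUEUE": "short"},
            {"BATCH_OVERRIDE_QUEUE": "long"})   # -> "short"

# Neither instance nor job defines it: inherited from the parent task.
resolve_var("BATCH_OVERRIDE_QUEUE", {}, {},
            {"BATCH_OVERRIDE_QUEUE": "long"})   # -> "long"
```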
For details, send email to zimmer_at_cern.ch.
To extend the existing system, remote access to SITE_A (our remote site) is necessary. Test the ability to connect to the DB by submitting heart-beat requests. Provided the site can serve HTTP requests to either the PROD or DEVEL instance, create a new, empty config file which will become your site configuration for SITE_A:
[global]
installation = client
randomSeed = true
traceback = true
# or set seed here.
[server]
# use DEVEL server for now
url = http://url_to_flask_server
# here we have model definitions specific to the collections
[JobDB]
task_types = Generation,Digitization,Reconstruction,User,Other,Data,SimuDigi,Reco
task_major_statii = New,Running,Failed,Terminated,Done,Submitted,Suspended
task_final_statii = Terminated,Failed,Done
batch_sites = CNAF,local,UNIGE,BARI
[site]
name = SITE_A
DAMPE_SW_DIR = /lustrehome/exp_soft/dampe_local/dampe
EXEC_DIR_ROOT = /tmp/condor/
ExternalsScript = ${DAMPE_SW_DIR}/externals/setup.sh
workdir = /lustre/dampe/workflow/workdir
#workdir = /storage/gpfs_ams/dampe/users/dampe_prod/test
HPCsystem = condor # or lsf, sge / pbs
HPCmemory = 4000
HPCcputime = 01:00
# use HPCextra to specify the universe for condor
HPCname = HPC_Site_A
HPCextra = site_condor_name
[watchdog]
ratio_mem = 0.95
ratio_cpu = 0.98
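The watchdog ratios act as kill thresholds relative to the requested resources (e.g. 95% of HPCmemory, 98% of HPCcputime). A minimal sketch of such a check, with made-up function and parameter names (the real watchdog logic lives inside DmpWorkflow):

```python
def watchdog_exceeded(used_mem_mb, used_cpu_min, req_mem_mb=4000,
                      req_cpu_min=60, ratio_mem=0.95, ratio_cpu=0.98):
    """Return True when usage crosses the configured fraction of the
    request, mirroring the [watchdog] ratios in the site config."""
    return (used_mem_mb > ratio_mem * req_mem_mb or
            used_cpu_min > ratio_cpu * req_cpu_min)

watchdog_exceeded(3900, 10)   # -> True  (3900 MB > 0.95 * 4000 = 3800 MB)
watchdog_exceeded(1000, 30)   # -> False (both well under threshold)
```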
Save your changes and proceed to download the client. Once done, load the relevant configuration using dampe-cli-configure -f <file/to/config>:
dampe-cli-configure -f site_A.cfg
The only remaining step is to start the fetcher as a daemon, e.g. through an infinite loop (perhaps inside a screen session):
while true; do dampe-cli-fetch-new-jobs -c 20 -m 800; sleep 20; done
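As an alternative to a screen session, the same loop can be supervised by systemd so it restarts on failure and survives logouts. This unit is a sketch: the file path, unit name, and user are placeholders to adapt to your site:

```ini
# /etc/systemd/system/dampe-fetcher.service  (hypothetical name/path)
[Unit]
Description=DAMPE workflow job fetcher
After=network.target

[Service]
User=dampe_prod
ExecStart=/bin/bash -c 'while true; do dampe-cli-fetch-new-jobs -c 20 -m 800; sleep 20; done'
Restart=always

[Install]
WantedBy=multi-user.target
```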
Also, make sure to add SITE_A to the configuration file for your servers (vm4/vm6).