Measuring scatter of task executions across diverse distributions of resources.
Related paper at: https://bitbucket.org/shantenujha/aimes
- Prerequisites: Python 2.7; pip; git; radical-pilot
- Clone this repository:
git clone https://github.com/radical-experiments/AIMES-Experience.git
- Install RADICAL Cybertools:
virtualenv ~/ve/aimes-experience
. ~/ve/aimes-experience/bin/activate
git clone git@github.com:radical-cybertools/radical.pilot.git
cd radical.pilot; git checkout experiment/aimes; git pull; pip install --upgrade . ; cd ..
git clone git@github.com:radical-cybertools/radical.utils.git
cd radical.utils; git checkout experiment/aimes; git pull; pip install --upgrade . ; cd ..
git clone git@github.com:radical-cybertools/saga-python.git
cd saga-python; git checkout experiment/aimes; git pull; pip install --upgrade . ; cd ..
- Move into the AIMES-Experience directory.
- Edit the file experiment.py, setting the following global variables to their appropriate values:
N_UNITS = 2048
U_CORES = 1
U_TIME = 15
RESOURCE = 'xsede.comet'
N_PILOTS = 4
P_CORES = 512
P_WALLTIME = 75
PROJECT = 'TG-XXXXXXXXX'
Note: SSH key-based, passwordless access to the chosen resource(s) is required.
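Before submitting, it can help to check that the values are mutually consistent. The following is a minimal sketch, not part of experiment.py: the helper `check_config` is hypothetical, and it assumes that N_UNITS is the number of tasks, U_CORES and U_TIME are the cores and duration of each task, P_CORES and P_WALLTIME are the cores and walltime of each pilot, and that both times are in minutes.

```python
# Hypothetical sanity check; the values mirror the example settings above.
N_UNITS = 2048     # assumed: number of tasks (units)
U_CORES = 1        # assumed: cores per task
U_TIME = 15        # assumed: task duration, in minutes
N_PILOTS = 4       # assumed: number of pilots
P_CORES = 512      # assumed: cores per pilot
P_WALLTIME = 75    # assumed: pilot walltime, in minutes


def check_config(n_units, u_cores, u_time, n_pilots, p_cores, p_walltime):
    """Return a list of human-readable warnings (empty if none apply)."""
    warnings = []
    if u_cores > p_cores:
        warnings.append("a single task needs more cores than one pilot provides")
    if u_time > p_walltime:
        warnings.append("a single task runs longer than the pilot walltime")
    requested = n_units * u_cores
    available = n_pilots * p_cores
    if requested > available:
        warnings.append(
            "tasks request {0} cores but pilots provide {1}; "
            "not all tasks can run concurrently".format(requested, available))
    return warnings


if __name__ == "__main__":
    for message in check_config(N_UNITS, U_CORES, U_TIME,
                                N_PILOTS, P_CORES, P_WALLTIME):
        print("WARNING: " + message)
```

With the example values, the tasks request 2048 cores and the four pilots provide 2048 cores, so no warning is produced.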
- Set up your execution environment:
. setup.sh
- Run the experiment:
python experiment.py
- Download the session for the experiment:
radicalpilot-close-session -m export -d mongodb://54.221.194.147:24242/aimes-experience -s rp.session.xxxx.xxxx.xxxx.xxxx.xxxx
- Upon success of the previous command, create a directory runn, where n uniquely and incrementally indicates the number of the experiment.
- Create a file inside runn called metadata.json with the following information:
{
"n_tasks": <int>,
"n_cores": <int>,
"pilots": [
[<int n_cores>, <walltime>],
...
],
"resources": [
"resource.tag",
...
],
"cores": [
[<int tasks>, <int n_cores>],
...
],
"durations": [
[<int tasks>, <int duration>],
...
]
}
Example:
{
"n_tasks": 2048,
"n_cores": 2048,
"pilots": [
[512, 75],
[512, 75],
[512, 75],
[512, 75]
],
"resources": [
"xsede.comet"
],
"cores": [
[2048, 1]
],
"durations": [
[2048, 15]
]
}
Note:
- Durations are in minutes.
- "cores" and "durations" are used to describe partitions of the set of tasks. At the moment, we use just 1 core and a 15-minute duration for each task, but we will have to use more complex distributions of cores and durations.
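For a homogeneous run like the example, metadata.json could be generated from the same values set in experiment.py instead of being written by hand. This is only a sketch under assumptions: `build_metadata` and `write_metadata` are hypothetical helpers, all pilots are assumed identical, and "n_cores" is taken to be the total number of pilot cores (the example does not disambiguate this, since tasks and pilots both total 2048 cores).

```python
import json


def build_metadata(n_units, u_cores, u_time, resource,
                   n_pilots, p_cores, p_walltime):
    """Build the metadata dictionary for a run with identical pilots
    and a single partition of identical tasks."""
    return {
        "n_tasks": n_units,
        # Assumption: total cores across all pilots.
        "n_cores": n_pilots * p_cores,
        # One [cores, walltime] entry per pilot.
        "pilots": [[p_cores, p_walltime] for _ in range(n_pilots)],
        "resources": [resource],
        # A single partition: every task uses the same cores and duration.
        "cores": [[n_units, u_cores]],
        "durations": [[n_units, u_time]],
    }


def write_metadata(metadata, path):
    """Write the metadata dictionary to path as indented JSON."""
    with open(path, "w") as handle:
        json.dump(metadata, handle, indent=4, sort_keys=True)
        handle.write("\n")


if __name__ == "__main__":
    example = build_metadata(2048, 1, 15, "xsede.comet", 4, 512, 75)
    print(json.dumps(example, indent=4, sort_keys=True))
```

Runs with several partitions of tasks would need one entry per partition in "cores" and "durations", which this sketch does not cover.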
- Copy the .prof, .json, and log file into the runn directory:
cp rp.session.xxxx.xxxx.xxxx.xxxx.xxxx.prof rp.session.xxxx.xxxx.xxxx.xxxx.xxxx.json logs/radical_debug.log runn/
- Pull and push the repository.
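The run-directory bookkeeping described above (create the next runn directory, then copy the profile, JSON, and log files into it) can be sketched in Python. The helpers `next_run_dir` and `archive_session` are hypothetical and not part of this repository; the sketch assumes that run directories are named run followed by an integer, that numbering starts at 1, and that the session files sit in the current directory.

```python
import os
import re
import shutil


def next_run_dir(base="."):
    """Return the path of the next unused run directory under base."""
    numbers = []
    for name in os.listdir(base):
        match = re.match(r"^run(\d+)$", name)
        if match and os.path.isdir(os.path.join(base, name)):
            numbers.append(int(match.group(1)))
    # Assumption: numbering starts at 1 and continues after the highest run.
    number = max(numbers) + 1 if numbers else 1
    return os.path.join(base, "run{0}".format(number))


def archive_session(session_id, base=".", log="logs/radical_debug.log"):
    """Create the next run directory and copy the session files into it."""
    run_dir = next_run_dir(base)
    os.mkdir(run_dir)
    for source in (session_id + ".prof", session_id + ".json", log):
        shutil.copy(os.path.join(base, source), run_dir)
    return run_dir
```

For example, `archive_session("rp.session.xxxx.xxxx.xxxx.xxxx.xxxx")` would copy the three files into the newly created directory and return its path; metadata.json still has to be added to that directory separately.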