
Evaluation of scheduling algorithms

Table of Contents

  1. General Info
  2. Prerequisites
  3. Installation
  4. Usage
  5. Collaboration

General Info



Evaluation of scheduling algorithms.

The goal of this project is to evaluate and compare scheduling algorithms for the P||Cmax problem.
First, job cost instances are either randomly generated according to statistical distributions such as the uniform, gamma and beta distributions, or retrieved from real job logs from the Parallel Workload Archive. These instances are then run through the LPT, SLACK, LDM and COMBINE algorithms.
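
For reference, LPT (Longest Processing Time) is the simplest of these heuristics: it sorts the jobs by decreasing processing time and always assigns the next job to the currently least loaded machine. A minimal stand-alone sketch of the rule, independent of the project's own implementation:

import heapq

def lpt(jobs, m):
    """Schedule `jobs` (list of processing times) on m machines with the
    LPT rule and return the resulting makespan (Cmax)."""
    # machine loads kept in a min-heap so the least loaded machine is popped first
    loads = [0] * m
    heapq.heapify(loads)
    for p in sorted(jobs, reverse=True):   # longest jobs first
        least = heapq.heappop(loads)       # least loaded machine
        heapq.heappush(loads, least + p)   # assign the job to it
    return max(loads)

# example: 7 jobs on 3 machines
print(lpt([7, 6, 5, 5, 4, 3, 2], 3))       # Cmax found by LPT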

Project status

  • Documents implementation
    • itself : done
    • user's manual : not started
    • report : in progress
  • Algorithm implementation
    • LPT : functional
    • SLACK : functional
    • LDM : functional
    • COMBINE : functional
  • Instance generation implementation with seed management
    • uniform : functional
    • non-uniform : functional
    • lambda : functional
    • beta : functional
    • exponential : functional
    • Parallel Workload Archive : not functional
  • Module implementation
    • result CSV file export : functional
    • results statistics GUI : in progress

Prerequisites


The project uses a number of libraries, plug-ins, editors and other technologies that must be installed beforehand. The programs are tested and work with the versions announced below (e.g. Python 3.4).
The installation procedure is given for a Linux environment using the APT package manager (more specifically Debian 8.11). You will have to adapt the installation commands to your target environment.

List of technologies used within the project:

  • first and foremost :
$ sudo apt-get update
$ sudo apt-get install python3.4
$ sudo apt-get install python3.4-minimal
$ sudo apt-get install idle-python3.4 
$ sudo apt-get install python3-pandas 
$ sudo apt-get install python3-pandas-lib
$ sudo apt-get install r-base 
$ sudo apt-get install r-base-dev
  • RStudio: version 1.1.463

    • follow this link
    • right-click on the .deb file
    • install it with the apper installation program 1
  • ggplot2 / readr / dplyr (R version must be 3.5 or higher)

    • in the RStudio editor, go to the menu :
      • Tools --> Install packages
    • in the "Packages" field, enter the following package names (separated by a space)
      • ggplot2 readr dplyr
    • press the "Install" button.
  • git, to retrieve the project from GitHub

$ sudo apt-get install git

Installation


(in progress)
The project is composed of Python scripts. You just have to retrieve them from GitHub, together with the modules they use (see Prerequisites).
As mentioned above, this project works in a Linux environment; you will have to adapt the OS commands to your target environment. In addition, the paths and directories used must be modified if the scripts are executed under Windows (replace the / characters with \). See below.

  • get the scripts from github

in a Linux console, from the local home directory (/home/xxxx)

$ git clone https://github.com/fcolasCTU/appCmax.git
$ cd appCmax
  • adapt directory management to Windows
    • edit the setup.py script with IDLE
      • find the following lines
	#=========================================
	# OS Name
	# Values LINUX
	#        WINDOWS
	#=========================================
	OS_Name = "LINUX"
	- and replace them with
	#=========================================
	# OS Name
	# Values LINUX
	#        WINDOWS
	#=========================================
	OS_Name = "WINDOWS"

Usage


Once the programs have been retrieved from GitHub,
go to the appCmax directory
and open the script exeParam.py.

exeParam.py

Description

This script is divided into two distinct parts:

Part #1. PARAMETERS TO BE MODIFIED

This part proposes a series of parameter assignments that control the generation of the instances, as well as the name of the directory that will receive the final results.

Information about the test campaign :

  • campaignName : Name of the campaign

  • campaignUser : Name of the user

    the directory of the final result ==>
    ./Results/[campaignName]_[campaignUser]_[ddmmyyyy]

  • seedForce : None (see below)

Information about the size n of the Pi sets (number of task sizes) :
given either with a start number and an end number, or as an explicit list.

  • N_NumberBegin : starting number of tasks
  • N_NumberEnd : ending number of tasks. E.g. if N_NumberBegin = 10 and N_NumberEnd = 15, exeParam will create sets of 10 tasks, then 11 tasks, 12 tasks, ... and finally 15 tasks. These parameters are only used if N_List = [] (empty list); if N_List is filled, exeParam uses it and ignores N_NumberBegin and N_NumberEnd.
  • N_List : list of task numbers. E.g. if N_List = [10, 50, 100, 1000], exeParam will generate 4 lists of tasks: one of 10 tasks, one of 50 tasks, one of 100 tasks and one of 1000 tasks. (A short example follows this list.)
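
For example (arbitrary values, shown only to illustrate the rule above):

# either a range of task counts ...
N_NumberBegin = 10
N_NumberEnd = 15        # sets of 10, 11, 12, 13, 14 and 15 tasks
N_List = []             # must stay empty for the range to be used

# ... or an explicit list, which takes priority over the range
# N_List = [10, 50, 100, 1000]   # 4 sets: 10, 50, 100 and 1000 tasks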

Information on the number of machines m (or processors) :
works in the same way as the information on the size of the Pi sets.

  • M_NumberBegin : starting number of machines
  • M_NumberEnd : ending number of machines, or
  • M_List = [m1, ..., mj]
    Note
    For each parameterized number of tasks and number of machines, exeParam will create two sets of tasks: a "native" one with the requested number of tasks, and a completed one with m-1 tasks, which makes it possible to control the optimal solution. In the first case, m is used to calculate the average load per machine; in the second case, to build an instance whose optimal solution is known.

Instance generation information :
There are several ways to randomly generate lists of numbers, in particular by using statistical distributions (Uniform, Gamma, Beta, Exponential, ...). These parameters specify how many lists of tasks should be generated for each desired type of distribution (see the sketch after this list):

  • matUniformNumber: How many lists with a uniform distribution to generate
  • matNonUniformNumber : How many lists to generate with a non-uniform distribution
  • matGammaNumber : How many lists to generate with a Gamma distribution
  • matBetaNumber : How many lists to generate with a Beta distribution
  • matExponentialNumber : How many lists to generate with an Exponential distribution
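
The project's own generators are not reproduced here; purely as an illustration of the idea, and assuming numpy (the parameter values are arbitrary and not those of exeParam.py), n task sizes could be drawn as follows:

import numpy as np

n = 100                              # number of tasks in the instance
rng = np.random.default_rng(12345)   # one seed per instance

uniform_jobs     = rng.uniform(1, 100, size=n)          # uniform on [1, 100]
gamma_jobs       = rng.gamma(2.0, 2.0, size=n)          # Gamma(shape, scale)
beta_jobs        = rng.beta(2.0, 5.0, size=n)           # Beta(alpha, beta), values in [0, 1]
exponential_jobs = rng.exponential(1.0 / 0.5, size=n)   # Exponential with rate lambda = 0.5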

Use of real job logs (Parallel Workload Archive) :
You can also use real logs downloaded from the Parallel Workload Archive site (see below for downloading these files). In this case, the number of tasks is not controlled.

  • matRealFiles = pwa.pwaFileChoice(X): How to work with the already downloaded files.
    • if X = None, exeParam asks, file by file, which one to use
    • if X = 0 : no file will be used
    • if X = 1: All files (present / already downloaded) will be used

Distribution parameters :

  • nAb and nBb are used to generate lists using the uniform and non-uniform distributions.
    For uniform instances, the tasks are uniformly distributed numbers in the range [nAb, nBb].
    For non-uniform instances, 98% of the tasks are uniformly distributed numbers in the range [0.9(nBb - nAb), nBb] and
    the rest are uniformly distributed in the range [nAb, 0.2(nBb - nAb)] (see the sketch after this list).
  • nAlpah : is used as a parameter of the Gamma and Beta distributions
  • nBeta : is used as a parameter of the Gamma distribution (yes, Gamma)
  • nLambda : is used as a parameter of the Exponential distribution.
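
To make the non-uniform construction concrete, here is a small sketch based on the description above (an interpretation, not the project's actual code; it assumes numpy):

import numpy as np

def non_uniform_instance(n, nAb, nBb, seed=None):
    """Draw n task sizes: ~98% in [0.9*(nBb - nAb), nBb], the rest in [nAb, 0.2*(nBb - nAb)]."""
    rng = np.random.default_rng(seed)
    n_high = int(round(0.98 * n))                           # ~98% "large" tasks
    high = rng.uniform(0.9 * (nBb - nAb), nBb, size=n_high)
    low  = rng.uniform(nAb, 0.2 * (nBb - nAb), size=n - n_high)
    jobs = np.concatenate([high, low])
    rng.shuffle(jobs)                                        # mix large and small tasks
    return jobs

print(non_uniform_instance(20, 1, 100, seed=42))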

Which algorithms to use :
if set to 0, the algorithm is not used on the generated instances;
if set to 1, the algorithm is used (see the example after this list).

  • useLPT : 1 or 0
  • useSLACK : 1 or 0
  • useLDM : 1 or 0
  • useCOMBINE : 1 or 0
  • useMULTIFIT : 1 or 0
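
For example, to run only LPT and SLACK on the generated instances, the flags would be set as follows (illustrative values):

useLPT      = 1
useSLACK    = 1
useLDM      = 0
useCOMBINE  = 0
useMULTIFIT = 0
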
Part #2. APPLICATION PART

This part is the application itself; it uses the parameters entered above.

Execution

It only remains to execute exeParam.py (F5 key if opened with IDLE).
exeParam will:

  • generate all the requested instances.
    Note (seedForce):
    Each instance is generated together with a seed, and each instance has its own seed. To regenerate the same instance, change the value of seedForce (= None) to the seed number indicated in the result file (see the sketch after this list).
  • complete all native instances at m-1 tasks.
  • calculate indicators of these instances (mean, variance, lower bound, ...)
  • run each chosen algorithm on each instance (native and completed), and record the result found (Cmax)
  • create the result directory
  • generate a JSON file of the chosen parameters
  • generate a task file for each instance
  • generate the result file result.csv
  • retrieve the R scripts (from the analysis directory) into this directory, execute them on the result.csv file, and store the output of the scripts (PDF graphs) in this same directory
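
The seedForce mechanism boils down to reusing a recorded seed. As a generic illustration (assuming a numpy-based generator, which may differ from the project's code), reusing the same seed reproduces the same instance:

import numpy as np

def generate_instance(n, seed):
    # the seed fully determines the generated instance
    rng = np.random.default_rng(seed)
    return rng.uniform(1, 100, size=n)

first  = generate_instance(10, seed=987654321)   # seed as recorded in the result file
second = generate_instance(10, seed=987654321)   # e.g. seedForce = 987654321
print(np.array_equal(first, second))             # True: the instance is reproduced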

pwaRetrieve.py

In the appCmax directory, the script pwaRetrieve.py downloads job log files from real jobs, available from the Parallel Workload Archive site.

At runtime, pwaRetrieve asks how many files should be downloaded. pwaRetrieve stores them in compressed form in the gz directory and decompresses them into the log directory.
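
As an illustration of the download-and-decompress step (the URL below is a made-up placeholder, and the real pwaRetrieve.py may proceed differently):

import gzip
import os
import shutil
import urllib.request

# hypothetical URL of one Parallel Workload Archive log (.swf.gz)
url = "https://www.cs.huji.ac.il/labs/parallel/workload/example/example.swf.gz"

os.makedirs("gz", exist_ok=True)
os.makedirs("log", exist_ok=True)

gz_path  = os.path.join("gz", "example.swf.gz")
log_path = os.path.join("log", "example.swf")

urllib.request.urlretrieve(url, gz_path)          # keep the compressed file in gz/
with gzip.open(gz_path, "rb") as src, open(log_path, "wb") as dst:
    shutil.copyfileobj(src, dst)                  # decompressed copy in log/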

Footnotes

  1. apper must also be installed, or use another installation program
