ComPerf

This repository contains code relevant to ComPerf: Comparative Performance Analysis via Empirical Modelling (publication forthcoming), which is related to our earlier work:

R. Neill, A. Drebes and A. Pop, "Automated Analysis of Task-Parallel Execution Behavior Via Artificial Neural Networks," 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, BC, 2018, pp. 647-656, doi: 10.1109/IPDPSW.2018.00105.

ComPerf implements a comparative technique for post-mortem parallel performance analysis, and aims to automate the identification of the dominating features that differ between parallel workloads, thereby characterising their observed performance variations. To do this, predictive artificial neural networks are trained to (empirically) capture the complex interactions within profiling data from parallel executions. Recent advances in artificial neural network interpretability (namely, DeepLIFT) are then applied to determine which variations in program and system features have the greatest impact on the observed performance variations.
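As a rough, hedged illustration of the idea (and not the ComPerf implementation itself, which may use a different framework), the sketch below trains a small regression network on synthetic profiling-style features and then computes DeepLIFT attributions using Captum's PyTorch implementation; every model, tensor, and feature name here is hypothetical.

# Hypothetical sketch: train a small regressor on profiling-style features,
# then attribute its predictions back to the inputs with DeepLIFT via Captum.
# This is not ComPerf's actual code; data and model shapes are placeholders.
import torch
import torch.nn as nn
from captum.attr import DeepLift

X = torch.rand(1024, 16)           # 1024 examples, 16 profiling features
y = X.sum(dim=1, keepdim=True)     # stand-in performance target

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(200):               # brief training loop, for illustration only
    optimiser.zero_grad()
    loss_fn(model(X), y).backward()
    optimiser.step()

# DeepLIFT attributions relative to a baseline (here, the mean feature vector):
# large-magnitude attributions mark the features that most strongly move the
# predicted performance away from the baseline prediction.
baseline = X.mean(dim=0, keepdim=True)   # broadcast across all examples
attributions = DeepLift(model).attribute(X, baselines=baseline, target=0)
print(attributions.mean(dim=0))    # average contribution per feature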

Dependencies

The repository requires Python 3 running:

While not a dependency, we used the Aftermath tracing infrastructure for fine-grained instrumentation and profiling of parallel OpenStream and OpenMP executions.

Usage

ComPerf operates on a working directory that contains a config.py file defining the experiment configuration. An example experiment folder and config.py are provided in the examples/ directory. A single run of ComPerf (via ComperfRunner.py) models and analyses the profiling data for a particular experiment repeat and k-fold index (in the sense of k-fold cross-validation). The experiment directory's results folder then holds the results for the requested repeat and data partitioning, and is appended to as the user runs additional repeats and k-folds.
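For orientation, the repeat index and k-fold index together select a deterministic data partition. A minimal sketch of one way this could work, assuming scikit-learn's KFold with a per-repeat shuffle seed (an assumption for illustration, not ComPerf's actual partitioning scheme):

# Hypothetical sketch of mapping (repeat index, k-fold index) to a data split;
# ComPerf's real partitioning may differ, and a validation set could further
# be carved out of the training indices.
import numpy as np
from sklearn.model_selection import KFold

def select_partition(n_examples, repeat_idx, k_fold_idx, n_folds=10):
    # Shuffle deterministically per repeat so each repeat uses different folds.
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=repeat_idx)
    splits = list(kf.split(np.arange(n_examples)))
    return splits[k_fold_idx]       # (train_indices, test_indices)

train_idx, test_idx = select_partition(n_examples=1000, repeat_idx=0, k_fold_idx=3)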

The profiling data must be supplied as a delimited flat file, with rows representing examples and a header identifying each feature/column. Multiple configurations can be targeted, supplied in config.py via the get_configurations() function. These configurations (returned as strings) identify the filenames to load for each configuration. As each configuration is executed multiple times to produce multiple datasets, the filename's integer suffix identifies the repeat. For the example config.py, the profiling dataset for the 5th repeat of the tile-size configuration (i,j,k) = (8,32,8) is:

/home/rneill/workspace/data/matmul_datasets/matmul_tiled_8_32_8_4.csv

Here, matmul_tiled_ is the end of the dataset prefix, the 8_32_8 string is the configuration identifier, and 4 is the zero-indexed identifier for the 5th repeat. Finally, .csv is the dataset suffix. These are all defined in config.py, and the dataset itself is included in the examples/ directory for reference.
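To make the naming scheme concrete, the snippet below shows one hypothetical way a config.py could enumerate configurations and rebuild the dataset filenames; get_configurations() is the function named above, but the constants and helper shown here are illustrative assumptions rather than the actual config format.

# Hypothetical illustration of how configurations and repeats map to filenames;
# only get_configurations() is named by ComPerf, the rest is for illustration.
DATASET_PREFIX = "/home/rneill/workspace/data/matmul_datasets/matmul_tiled_"
DATASET_SUFFIX = ".csv"

def get_configurations():
    # Configuration identifiers as they appear in the dataset filenames,
    # e.g. the tile sizes (i, j, k) joined by underscores.
    return ["8_32_8"]               # plus any other tile-size configurations

def dataset_filename(configuration, repeat_idx):
    # repeat_idx is zero-indexed, so repeat_idx=4 names the 5th repeat.
    return f"{DATASET_PREFIX}{configuration}_{repeat_idx}{DATASET_SUFFIX}"

# dataset_filename("8_32_8", 4) ->
#   /home/rneill/workspace/data/matmul_datasets/matmul_tiled_8_32_8_4.csv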

Passing -h to the runner provides the usage instructions:

usage: ComperfRunner.py [-h] -f EXPERIMENT_FOLDER -r EXPERIMENT_REPEAT_INDEX
                        -i K_FOLD_IDX [-d LOG_LEVEL] [--reload] [--tee]

required arguments:
	-f EXPERIMENT_FOLDER, --experiment_folder EXPERIMENT_FOLDER
			Experiment base folder (containing config.py).
	-r EXPERIMENT_REPEAT_INDEX, --experiment_repeat_index EXPERIMENT_REPEAT_INDEX
			Assuming repeating the modelling and analysis multiple
			times, provide the repeat index (to get the correct
			train/val/test partitions).
	-i K_FOLD_IDX, --k_fold_idx K_FOLD_IDX
			For the repeat index, what k-fold index are we using
			in this run?

optional arguments:
	-h, --help            
			show this help message and exit
	-d LOG_LEVEL, --log_level LOG_LEVEL
			Logging level. Options are: 1=INFO, 2=DEBUG, 3=TRACE.
	--reload            
			Reload the dataset, even if a serialised version
			already exists in the experiment folder.
	--tee                 
			Pipe logging messages to stdout as well as the log
			file.
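For example, a single run for the first repeat and first fold of an experiment folder might look like the following (the folder path is a placeholder, and invoking the script through python3 is an assumption):

python3 ComperfRunner.py -f path/to/experiment_folder -r 0 -i 0 --tee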
