Benchmarks for my thesis and optimized Riemann solvers.
Note: This manual and the scripts and configuration files of this repository described therein were written for use on the TACC Stampede Supercomputer using the Intel ifort Fortran compiler, version 15.0.2, and might have to be changed in order to run on your local machine.
Tests were run with the following configurations:
- Vanilla code: FFLAGS: -O2 -ipo [-pg|-DUSEPAPI], LFLAGS: -qopenmp-stubs [-pg|-lpapi]
- Optimized (SoA) code: FFLAGS: -O2 -ipo -align array32byte -qopenmp-simd -xavx [-pg|-DUSEPAPI], LFLAGS: -qopenmp-stubs [-pg|-lpapi]
Test results are available in my thesis. Unfortunately, the data is too big to upload here, but the tests can be re-run following the instructions below.
The bash script create_jobs.sh can be used to create various builds with different run settings and execute them as jobs on Stampede.
To do so, some variables have to be set; these are explained below. After setting up the file, it can be executed with ./create_jobs.sh [all|runs|make]
Usually all is the right choice. It attempts to build a binary with the given compilation and linker flags, overrides the grid size and the AMR levels in the setrun.py file, and submits a job that is executed within a directory named according to the following pattern:
run_${NAME}_${res}_${flagstring}
where "NAME" is the given name (it should include the scenario name, for example), "res" is the resolution set for the x/y dimensions (multiple values are possible), and "flagstring" is the (trimmed) string of compilation flags. For example, a gprof run for a SoA Chile scenario (with the respective compilation flags) and NAME=chile_soa for a 300x300 grid size would be stored in the directory run_chile_soa_300_O2_ipo_align_array32byte_qopenmp_simd_xavx_pg
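The naming scheme can be illustrated with a small bash sketch. The helper name run_dir_name and the sed-based trimming are assumptions chosen so that the result matches the example directory name above; the actual logic in create_jobs.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from create_jobs.sh): derive the run
# directory name from NAME, the grid resolution, and the compilation flags.
run_dir_name() {
    local name="$1" res="$2" flags="$3"
    local flagstring
    # Collapse every run of non-alphanumeric characters into one underscore.
    flagstring=$(printf '%s' "$flags" | sed -E 's/[^A-Za-z0-9]+/_/g')
    # Drop a leading or trailing underscore left over from the trimming.
    flagstring=${flagstring#_}
    flagstring=${flagstring%_}
    printf 'run_%s_%s_%s\n' "$name" "$res" "$flagstring"
}

# prints run_chile_soa_300_O2_ipo_align_array32byte_qopenmp_simd_xavx_pg
run_dir_name chile_soa 300 "-O2 -ipo -align array32byte -qopenmp-simd -xavx -pg"
```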
- Code, depending on what should be measured: rpn2_geoclaw.f90, rpt2_geoclaw.f90, flux2_fw.f90, amr2.f90 (under the src directory).
  - For example, if you want to measure the FLOPS etc. of the normal Riemann solver with PAPI, make sure that the subroutine flux2 surrounds the call to rpn2 with a papi_start()/papi_stop(mx) pair. Additionally, for output purposes you should ensure that the call to papi_summary at the end of the program routine amr2 has the correct name passed as parameter.
- job.sh - If running with create_jobs.sh, only the time and node settings are relevant. The node should be normal in this case, as development only allows one job at a time. Otherwise, configure the job file as explained here
- Makefile - For automated jobs, only changing LFLAGS is relevant. Add -lpapi if using PAPI, or -pg to obtain gprof output. Of course you can use both simultaneously, but for better results only one option was enabled at a time. If PAPI is not used, the module line $(CLAWUTILS)/src/papi_module.f90 \ should be commented out, as otherwise linker errors can occur.
- create_jobs.sh - This is the most important file, as the following parameters are set here:
  - NAME - This is the base name for the runs of this test. It is important to set this, as all runs will be stored in directories named run_$NAME_$RESOLUTION_$FLAGS.
  - FLAGS - This overrides the compiler flags in the Makefile. See above for the flags used for the vanilla/vector runs. Note that for gprof or PAPI runs, -pg or -DUSEPAPI must be used as compiler option, respectively.
Once everything is configured, start the builds and runs with ./create_jobs.sh all.
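The pairing of compiler and linker flags described above can be summarised in a short bash sketch. The function name lflags_for is hypothetical and not part of the repository; it only encodes the rule that a gprof run passes -pg to both compiler and linker, while a PAPI run compiles with -DUSEPAPI and links with -lpapi.

```shell
# Hypothetical helper (not part of the repository): print the linker flags
# that match a given set of compiler flags, following the rules above.
lflags_for() {
    local fflags="$1"
    local lflags="-qopenmp-stubs"
    case " $fflags " in
        *" -pg "*)       lflags="$lflags -pg" ;;     # gprof run
    esac
    case " $fflags " in
        *" -DUSEPAPI "*) lflags="$lflags -lpapi" ;;  # PAPI run
    esac
    printf '%s\n' "$lflags"
}

# Example: optimized (SoA) build instrumented with PAPI
FLAGS="-O2 -ipo -align array32byte -qopenmp-simd -xavx -DUSEPAPI"
echo "FFLAGS: $FLAGS"
echo "LFLAGS: $(lflags_for "$FLAGS")"
```

The printed LFLAGS value is what would then be entered in the Makefile by hand.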
If the directory for the build already exists (same compilation flags; code changes are not detected!), you are asked whether to overwrite the build.
The runs are stored in the $WORK filesystem (see this link for a description); however, a symlink to the directory is created under ./runs/
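The overwrite check and the symlink into ./runs/ might look roughly like the following bash sketch. The function name prepare_run_dir, the prompt wording, and the directory handling are assumptions for illustration; create_jobs.sh may implement this differently.

```shell
# Hypothetical sketch: create a run directory below a base directory
# (e.g. $WORK) and link it under ./runs/. Asks before replacing an
# existing build and returns 1 if the user declines.
prepare_run_dir() {
    local base="$1" dir="$2" answer
    if [ -d "$base/$dir" ]; then
        printf 'Build %s already exists. Overwrite? [y/N] ' "$dir"
        read -r answer
        case "$answer" in
            y|Y) rm -rf "${base:?}/$dir" ;;
            *)   echo "Keeping existing build."; return 1 ;;
        esac
    fi
    mkdir -p "$base/$dir" runs
    # -n replaces an existing link instead of following it into its target
    ln -sfn "$base/$dir" "runs/$dir"
}
```

A call such as prepare_run_dir "$WORK" run_chile_soa_300_O2_ipo_pg would then create the directory on $WORK and make it reachable as ./runs/run_chile_soa_300_O2_ipo_pg.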
The dry or wet scenarios can be found in sl_bowl_radial.
First of all, you must decide whether you want to run the Vanilla version or the optimized version. This is done by executing set_soa_vanilla.sh <vanilla|soa>, which sets symlinks for the Makefile and job file.
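As a rough illustration of what such a switch script does, the bash sketch below links one of two variants to the active file names. The variant file names (Makefile.vanilla, Makefile.soa, job.vanilla.sh, job.soa.sh) and the function name select_variant are assumptions, not the names used in the repository.

```shell
# Hypothetical sketch of a variant switch; all file names are assumptions.
select_variant() {
    local variant="$1"
    case "$variant" in
        vanilla|soa) ;;
        *) echo "usage: select_variant <vanilla|soa>" >&2; return 1 ;;
    esac
    # Point the active Makefile and job file at the chosen variant.
    ln -sf "Makefile.$variant" Makefile
    ln -sf "job.$variant.sh" job.sh
}
```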
In addition to the aforementioned settings, you must consider the following settings for these scenarios:
- maketopo.py - In this Python file, the initial water height is set. The function qinit(x,y) describes the initial distribution for the dry case or the radial water hump, respectively. Comment the corresponding lines in or out for the dry or wet scenario.
- amrlevels in create_jobs.sh - Additionally, the AMR level can be set. For the wet/dry scenario it was set to 1 (no AMR).
- The SoA version is in soa_step2
- The Vanilla version can be found in vanilla_papi (it's called "papi", but of course gprof runs can also be executed with this version)
- amrlevels in create_jobs.sh - Additionally, the AMR level can be set. For the Chile scenario it was set to 1 for testing on different uniform grid sizes (no AMR) and to 3 for the AMR run.
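Overriding the grid size and the AMR level in setrun.py, as create_jobs.sh does, can be sketched with sed. The attribute names clawdata.num_cells and amrdata.amr_levels_max follow the Clawpack 5 setrun.py layout and are an assumption here, as is the function name override_setrun; the setrun.py in this repository and the actual substitution in create_jobs.sh may look different.

```shell
# Hypothetical sketch: rewrite grid size and AMR level in a setrun.py.
# Assumes lines of the form
#   clawdata.num_cells[0] = 300
#   amrdata.amr_levels_max = 3
override_setrun() {
    local file="$1" res="$2" amrlevels="$3" tmp
    tmp=$(mktemp) || return 1
    # Replace everything after "=" on the matching lines; the same
    # resolution is used for the x and y dimension.
    if sed -E \
        -e "s/^([[:space:]]*clawdata\.num_cells\[[01]\][[:space:]]*=).*/\1 $res/" \
        -e "s/^([[:space:]]*amrdata\.amr_levels_max[[:space:]]*=).*/\1 $amrlevels/" \
        "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # write back in place, keeping file permissions
    fi
    rm -f "$tmp"
}
```

For a uniform 300x300 grid without AMR, the call would be override_setrun setrun.py 300 1.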