How to Reproduce Our Experimental Results?

All of our experiments can be automatically reproduced by calling a few pre-prepared scripts.

Prerequisite

cmake version newer than 3.10.0.
Profiling is supported only on Intel CPUs.
You may need to reset the PMU once on the first run; some PCM profiling results may be incorrect the first time.
Prepare the CPU mapping in cpu-mapping.txt.
Real-world datasets are moved to exp_dir/datasets automatically by the scripts.
The program must be run as root with two parameters, exp_dir and l3_cache_size, for example: sudo bash run_all.sh -d /data1/xtra -c 19922944.
To run only a subset of the experiment sections, modify exp_section in run_all.sh.
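
The value passed with -c in the example command is a byte count: 19922944 is 19 × 1024 × 1024. Below is a minimal sketch for deriving that number on your own machine, assuming a glibc-based Linux where `getconf LEVEL3_CACHE_SIZE` is available. The helper names `mib_to_bytes` and `detect_l3_bytes` are illustrative and are not part of the repository.

```shell
#!/usr/bin/env bash
# Illustrative helpers (not part of run_all.sh).

# Convert a size in MiB to bytes, e.g. 19 -> 19922944.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Ask glibc for the L3 cache size in bytes.
# Prints "unknown" when getconf is unavailable or reports nothing usable.
detect_l3_bytes() {
  local size
  size=$(getconf LEVEL3_CACHE_SIZE 2>/dev/null)
  if [ -n "$size" ] && [ "$size" -gt 0 ] 2>/dev/null; then
    echo "$size"
  else
    echo "unknown"
  fi
}

echo "19 MiB in bytes: $(mib_to_bytes 19)"
echo "Detected L3 cache size: $(detect_l3_bytes)"
```

The resulting byte count can then be passed as the -c argument, as in sudo bash run_all.sh -d /data1/xtra -c 19922944.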

Third-party Libraries (installed automatically by the scripts)

cmake install

sudo apt install -y cmake
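
Since the prerequisite is a cmake version newer than 3.10.0, it can help to check the installed version after installing. This is a small sketch, assuming GNU `sort` with the `-V` (version sort) option; the function name `version_gt` is illustrative.

```shell
#!/usr/bin/env bash
# Succeeds when version $1 is strictly newer than version $2.
# Relies on GNU sort's -V (version sort) option.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v cmake >/dev/null 2>&1; then
  # The first line of `cmake --version` looks like "cmake version 3.16.3".
  installed=$(cmake --version | head -n 1 | awk '{ print $3 }')
  if version_gt "$installed" "3.10.0"; then
    echo "cmake $installed is new enough"
  else
    echo "cmake $installed is too old; need > 3.10.0"
  fi
else
  echo "cmake is not installed"
fi
```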

TeX font rendering:

sudo apt install -y texlive-fonts-recommended texlive-fonts-extra
sudo apt install -y dvipng
sudo apt install -y font-manager
sudo apt install -y cm-super

python3:

sudo apt install -y python3
sudo apt install -y python3-pip
pip3 install numpy
pip3 install matplotlib

NUMA library

sudo apt install -y libnuma-dev

Zlib

sudo apt install -y zlib1g-dev

python-tk

sudo apt install -y python-tk

perf

sudo apt install -y linux-tools-common
sudo apt install -y linux-tools-$(uname -r) # uname -r prints the running kernel version, e.g. 4.15.0-91-generic
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid # "sudo echo -1 > file" fails because the redirect runs without root; sudo tee writes the file as root
sudo modprobe msr
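
To confirm that the perf and msr setup took effect, the kernel setting and the msr device node can be inspected. This is a hedged sketch; the function name `paranoid_is_open` is illustrative, and the check only reports what it finds without changing anything.

```shell
#!/usr/bin/env bash
# Succeeds when the given perf_event_paranoid value is -1 (or lower),
# which is the value the setup step is meant to write.
paranoid_is_open() {
  [ "$1" -le -1 ] 2>/dev/null
}

paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
  value=$(cat "$paranoid_file")
  if paranoid_is_open "$value"; then
    echo "perf_event_paranoid=$value: OK"
  else
    echo "perf_event_paranoid=$value: expected -1"
  fi
fi

# After `modprobe msr`, per-CPU device nodes such as /dev/cpu/0/msr should exist.
if [ -e /dev/cpu/0/msr ]; then
  echo "msr module: device node present"
else
  echo "msr module: /dev/cpu/0/msr not found"
fi
```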

Configurations

Default parameters:

Parameter	Default	Description
exp_dir	/data1/xtra	Directory where all results are saved and figures are generated
L3_CACHE_SIZE	19922944 (19 MiB)	Size of the L3 cache in bytes
Experiment Sections	All (e.g. APP_BENCH)	Experiment sections to run; by default, all experiments shown in our paper
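
The two command-line parameters correspond to the first two defaults in the table. As a rough sketch of how a script could read -d and -c with those defaults (this is not the code in run_all.sh; the function `parse_args` and the variable names are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical option parsing; the real run_all.sh may differ.
parse_args() {
  exp_dir="/data1/xtra"      # default experiment directory
  l3_cache_size=19922944     # default: 19 MiB in bytes
  local opt OPTIND=1
  while getopts "d:c:" opt; do
    case "$opt" in
      d) exp_dir="$OPTARG" ;;
      c) l3_cache_size="$OPTARG" ;;
      *) echo "usage: $0 [-d exp_dir] [-c l3_cache_size]" >&2; return 1 ;;
    esac
  done
}

parse_args -d /tmp/xtra -c 33554432
echo "exp_dir=$exp_dir l3_cache_size=$l3_cache_size"
```

Options that are omitted keep their defaults, so calling the sketch with only -c leaves exp_dir at /data1/xtra.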

Datasets

We provide 4 real-world datasets, compressed in datasets.tar.gz. Download the archive and run tar -zvxf datasets.tar.gz to extract them.

We extracted only the useful columns of those datasets: one is the join key and another is the timestamp.

DEBS:

comments_key32_partitioned.csv: user_id|comments_payload

posts_key32_partitioned.csv: user_id|posts_payload

YSB:

ad_events.txt: campaign_id|timestamp

campaigns_id.txt: campaign_id|campaign_payload

Rovio:

1000ms_1t.txt: combined_id|payload|price|timestamp

Stock:

cj_1000ms_1t.txt: stockid|timestamp

sb_1000ms_1t.txt: stockid|timestamp
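
To sanity-check an extracted file against the layouts listed above, a quick look at field counts and join keys can help. The sketch below assumes that the `|` shown in the listings is the literal field separator in the files, which is not confirmed here. The sample rows are synthetic and the helper names are illustrative.

```shell
#!/usr/bin/env bash
# Number of '|'-separated fields on the first line of a file.
count_fields() {
  head -n 1 "$1" | awk -F'|' '{ print NF }'
}

# Number of distinct values in the first column (the join key).
count_distinct_keys() {
  awk -F'|' '{ seen[$1] = 1 } END { n = 0; for (k in seen) n++; print n }' "$1"
}

# Synthetic sample in the stockid|timestamp layout (not real data).
sample=$(mktemp)
printf '%s\n' '600000|1000' '600000|2000' '600519|3000' > "$sample"

echo "fields: $(count_fields "$sample")"
echo "distinct keys: $(count_distinct_keys "$sample")"
rm -f "$sample"
```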

Results

All results are in exp_dir/results/.

All figures are in exp_dir/results/figures.

JamesNolan17/AllianceDB_Inter_Window_Join
