The data files are on cambridge.csail.mit.edu at /home/amirf/amir_superurop/machine-learning-flow-sizes.
The cachenet_experiments directory contains tcpdump captures and iteration measurements from distributed training of ResNet50, VGG19, and GPT-2 using KungFu.
The cerberus_experiments directory contains tcpdump captures and iteration measurements from distributed training of MobileNet, DenseNet121, InceptionV3, ResNet50, and VGG19 using Horovod.
The paper uses data from the Horovod training experiments in cerberus_experiments.
This repository does not directly contain the processed data files used for plotting. We instead chose to open-source the data collection and processing scripts so that others can replicate the experiments themselves. Each experiment and model has a run_steps.sh bash script which describes the precise experiment to be run on Google Cloud GPU servers. All details about these machines and their arrangement can be found in the paper.
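As a rough illustration, a wrapper like the following could drive the per-model run_steps.sh scripts. This is a sketch, not part of the repository: the directory layout (cerberus_experiments/<model>/run_steps.sh) and the lowercase model directory names are assumptions, so adjust the paths to match the actual checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: invoke each model's run_steps.sh experiment script.
# The base directory and per-model layout below are assumptions for
# illustration only; check the repository for the real structure.

run_all() {
  local base="${1:-cerberus_experiments}"   # assumed top-level directory
  local model script
  for model in mobilenet densenet121 inceptionv3 resnet50 vgg19; do
    script="${base}/${model}/run_steps.sh"
    if [[ -f "$script" ]]; then
      echo "running ${script}"
      bash "$script"                         # launches the actual experiment
    else
      echo "skipping ${model}: ${script} not found"
    fi
  done
}

run_all "$@"
```

Running each script on the Google Cloud GPU setup described in the paper should reproduce the corresponding tcpdump captures and iteration measurements.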
For any questions, please feel free to reach out to amirf at mit.edu :)