
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning

Introduction

This repository contains the code of the MG-WFBP (Merged-Gradient Wait-Free BackPropagation) paper submitted to IEEE TPDS. This version works with PyTorch; a preliminary version, presented at IEEE INFOCOM 2019, was originally implemented on B-Caffe: https://github.com/hclhkbu/B-Caffe. As PyTorch has become much more popular than Caffe, we recommend using this repository for the MG-WFBP algorithm.
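The core idea is that sending many small per-layer gradient messages wastes communication startup latency, so MG-WFBP merges gradients wisely into fewer, larger allreduce operations that overlap with backpropagation. Below is a minimal, illustrative Python sketch of the merging step only, assuming a fixed size threshold (the paper instead derives merge decisions from measured communication and computation costs); the names MERGE_THRESHOLD_BYTES, merged_allreduce, and _allreduce_merged are hypothetical and not from this repository.

import torch
import torch.distributed as dist

MERGE_THRESHOLD_BYTES = 4 * 1024 * 1024  # hypothetical merge threshold (4 MiB)

def merged_allreduce(grads):
    # Coalesce consecutive gradients into one buffer, then reduce each
    # buffer with a single allreduce to amortize the startup latency.
    buffer, size = [], 0
    for g in grads:
        buffer.append(g)
        size += g.numel() * g.element_size()
        if size >= MERGE_THRESHOLD_BYTES:
            _allreduce_merged(buffer)
            buffer, size = [], 0
    if buffer:
        _allreduce_merged(buffer)

def _allreduce_merged(buffer):
    # Flatten the merged gradients, average them across workers, and
    # copy the results back into the original tensors.
    flat = torch.cat([g.reshape(-1) for g in buffer])
    dist.all_reduce(flat, op=dist.ReduceOp.SUM)
    flat /= dist.get_world_size()
    offset = 0
    for g in buffer:
        g.copy_(flat[offset:offset + g.numel()].reshape(g.shape))
        offset += g.numel()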

Installation

Prerequisites

The Python dependencies are listed in requirements.txt (installed in the Quick Start below); running dist_mpi.sh additionally requires a working MPI installation.

Quick Start

git clone https://github.com/HKBU-HPML/MG-WFBP.git
cd MG-WFBP
pip install -r requirements.txt
dnn=resnet20 nworkers=4 ./dist_mpi.sh

Assuming you have 4 GPUs on a single node and everything works well, you will see 4 workers running on the node, training the ResNet-20 model on the CIFAR-10 dataset with the MG-WFBP algorithm. The dnn and nworkers environment variables select the model and the number of workers.
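The script name dist_mpi.sh suggests the workers are launched with MPI. As a rough, hypothetical sketch of what each worker process might do at startup (not the repository's actual entry point), assuming PyTorch was built with MPI support:

import torch
import torch.distributed as dist

# Ranks and world size are supplied by the MPI launcher (e.g., mpirun -np 4).
dist.init_process_group(backend='mpi')
rank = dist.get_rank()
# Pin each worker to one of the node's GPUs.
torch.cuda.set_device(rank % torch.cuda.device_count())
# From here, each worker builds the model and data loader; after each
# loss.backward(), gradients are allreduced (merged by MG-WFBP) before
# the optimizer step.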

Papers

  • S. Shi, X.-W. Chu, and B. Li, “MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning,” Under review (Extension of the following conference version).
  • S. Shi, X.-W. Chu, and B. Li, “MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms,” IEEE INFOCOM 2019, Paris, France, May 2019. PDF

Referred Models
