efficient_densenet_pytorch

A PyTorch implementation of DenseNets, optimized to save GPU memory.

Motivation

While DenseNets are fairly easy to implement in deep learning frameworks, most implmementations (such as the original) tend to be memory-hungry. In particular, the number of intermediate feature maps generated by batch normalization and concatenation operations grows quadratically with network depth. It is worth emphasizing that this is not a property inherent to DenseNets, but rather to the implementation.

This implementation uses a new strategy to reduce the memory consumption of DenseNets. We assign all intermediate feature maps to two shared memory allocations, which are utilized by every Batch Norm and concatenation operation. Because the data in these allocations are temporary, we re-populate the outputs during back-propagation. This adds 15-20% of time overhead for training, but reduces feature map consumption from quadratic to linear.

For more details, please see the technical report.

Usage

In your existing project: There are two files in the models folder.

models/densenet.py is a "naive" implementation, based off the torchvision and project killer implementations.
models/densenet_efficient.py is the new efficient implementation. (Code is still a little ugly. We're working on cleaning it up!) Copy either one of those files into your project!
models/densenet_efficient.py is the new efficient implementation with multi-GPU support. They work as stand-alone files.

Running the demo:

single GPU:

CUDA_VISIBLE_DEVICES=0 python2 demo.py --efficient True --data <path_to_data_dir> --save <path_to_save_dir>

multi GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 python2 demo.py --multi-gpu True --data <path_to_data_dir> --save <path_to_save_dir>

Options:

--depth (int) - depth of the network (number of convolution layers) (default 40)
--growth_rate (int) - number of features added per DenseNet layer (default 12)
--n_epochs (int) - number of epochs for training (default 300)
--batch_size (int) - size of minibatch (default 256)
--seed (int) - manually set the random seed (default None)

Performance

A comparison of the two implementations (each is a DenseNet-BC with 100 layers, batch size 64, tested on a NVIDIA Pascal Titan-X):

Implementation	Memory cosumption (GB/GPU)	Speed (sec/mini batch)
Naive	2.863	0.165
Efficient	1.605	0.207
Efficient (multi-GPU)	0.985	-

Other efficient implementations

LuaTorch (by Gao Huang)
MxNet (by Danlu Chen)
Caffe (by Tongcheng Li)

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
images		images
models		models
test		test
.gitignore		.gitignore
README.md		README.md
demo.py		demo.py
pytest.ini		pytest.ini
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

images

images

models

models

test

test

.gitignore

.gitignore

README.md

README.md

demo.py

demo.py

pytest.ini

pytest.ini

setup.cfg

setup.cfg

Repository files navigation

efficient_densenet_pytorch

Motivation

Usage

Performance

Other efficient implementations

About

Releases

Packages

Languages

Robert0812/efficient_densenet_pytorch

Folders and files

Latest commit

History

Repository files navigation

efficient_densenet_pytorch

Motivation

Usage

Performance

Other efficient implementations

About

Resources

Stars

Watchers

Forks

Languages