This repository has been archived by the owner on Apr 16, 2021. It is now read-only.

Implementation of "Iterative pruning" on TensorFlow


younghwanoh/impl-pruning-TF


TensorFlow implementation of "Iterative Pruning"

CAUTION: Out-of-date notice.

I have since checked that TF (>1.3) supports sparse_matmul, which seems to be a more correct way to implement iterative pruning. This work was done naively with a much older version (0.8.0), so I do not recommend relying on this code for serious use cases. There will be no further updates or maintenance.


This work is based on "Learning both Weights and Connections for Efficient Neural Networks," Han et al., NIPS '15. Note that this work only aims to quantify the effect of pruning on latency (within TensorFlow) and is not an optimal implementation. Some details are therefore simplified (e.g., the number of iterations and the adjusted dropout ratio).
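The pruning step from the paper removes connections whose weight magnitude falls below a threshold and keeps them at zero while the network is retrained; repeating prune-then-retrain with a growing pruning ratio gives the "iterative" variant. Below is a minimal pure-Python sketch of that step. It assumes weights are a flat list of floats and that the threshold is picked as a magnitude percentile. The helper names `magnitude_threshold`, `prune` and `apply_mask` are hypothetical and are not taken from this repository.

```python
def magnitude_threshold(weights, percent):
    """Return the smallest magnitude that survives pruning `percent` % of weights (hypothetical helper)."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    mags = sorted(abs(w) for w in weights)
    k = int(len(mags) * percent / 100)  # number of weights to remove
    return mags[k]


def prune(weights, threshold):
    """Zero out every weight with |w| < threshold; return (pruned weights, keep mask)."""
    mask = [abs(w) >= threshold for w in weights]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask


def apply_mask(weights, mask):
    """Re-zero pruned connections after a retraining update so they stay removed."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


w = [0.5, -0.05, 0.02, -0.8, 0.3, 0.01, -0.2, 0.07, 0.9, -0.04]
t = magnitude_threshold(w, 50)
pruned, mask = prune(w, t)
print("threshold:", t, "kept:", sum(mask))

# Simulated retraining step: every weight moves by +0.1, then pruned ones are re-zeroed.
updated = apply_mask([x + 0.1 for x in pruned], mask)
print(updated)
```

Masking after each update (rather than only once) is what keeps the pruned connections from growing back during retraining.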

I applied iterative pruning to a small MNIST CNN model (originally 13 MB) from the TensorFlow tutorials. After pruning a given percentage of the weights, I simply retrained each model for two epochs and obtained compressed models (down to 2.6 MB with 90% of the weights pruned) with only a minor loss of accuracy (99.17% -> 98.99% with 90% pruned and retrained). Again, this is not an optimal setup.

Issues

Due to the lack of support for SparseTensor and its operations in TensorFlow 0.8.0, this implementation has some limitations. This work uses embedding_lookup_sparse to compute sparse matrix-vector multiplication. That op is not designed solely for sparse matrix-vector multiplication, so its performance may be sub-optimal (I'm not sure). Also, TensorFlow represents a sparse matrix as <index, value> pairs rather than in the typical CSR format, which is more compact and performant. In summary, this implementation has the following limitations:

  1. embedding_lookup_sparse doesn't support broadcasting, which prevents running tests with the normal test datasets.
  2. Performance may be somewhat sub-optimal.
  3. Because a "Sparse Variable" is not supported, manual dense-to-sparse and sparse-to-dense transformations are required.
  4. Applying this to 4D convolution tensors may be possible, but is a bit tricky.
  5. The current embedding_lookup_sparse forces an additional matrix transpose, dimension squeeze, and reshape.
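To make limitations 3 and 5 concrete: with the "sum" combiner, embedding_lookup_sparse returns, for each sparse row of ids and weights, the weight-scaled sum of the `params` rows selected by those ids. One way to turn that into a sparse matrix product (an assumption about how such wiring can work, not a copy of this repository's code) is to treat each row of the pruned weight matrix W as one sparse row, with column indices as ids and weight values as weights, and to pass the transposed input as `params`. The result is W·xᵀ, which must then be transposed back. The pure-Python sketch below emulates this on an <index, value> form; all helper names are hypothetical.

```python
def dense_to_coo(matrix):
    """Dense 2-D list -> list of ((row, col), value) for the non-zero entries."""
    return [((i, j), v)
            for i, row in enumerate(matrix)
            for j, v in enumerate(row)
            if v != 0.0]


def coo_to_dense(entries, shape):
    """Rebuild the dense matrix (needed because there is no sparse Variable)."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (i, j), v in entries:
        dense[i][j] = v
    return dense


def lookup_sparse_sum(params, entries, num_rows):
    """Emulate a "sum"-combiner sparse lookup: out[i] = sum of w * params[j] over entries ((i, j), w).

    With params = x transposed (shape [in_dim, batch]) this computes W @ x^T.
    """
    dim = len(params[0])
    out = [[0.0] * dim for _ in range(num_rows)]
    for (i, j), w in entries:
        for d in range(dim):
            out[i][d] += w * params[j][d]
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


W = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 3.0]]   # pruned 2x3 weight matrix
x = [[1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0]]   # batch of 2 inputs with 3 features each

coo = dense_to_coo(W)
y_t = lookup_sparse_sum(transpose(x), coo, num_rows=len(W))  # W @ x^T, shape [out_dim, batch]
y = transpose(y_t)                                           # back to [batch, out_dim]
print(y)
```

The two transposes around the lookup are the extra work that limitation 5 refers to; a native sparse-dense matmul would avoid them.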

File descriptions and usages

model_ckpt_dense: original model
model_ckpt_dense_pruned: 90% pruned-only model
model_ckpt_sparse_retrained: 90% pruned and retrained model

Python package requirements

sudo apt-get install python-scipy python-numpy python-matplotlib

To regenerate the sparse models, first set your pruning thresholds in config.py, then run training with the second-round (pruning and retraining) and third-round (generating the sparse form of the weight data) options:

./train.py -2 -3

To run inference on a single image (seven.png) and measure its latency:

./deploy_test.py -d -m model_ckpt_dense
./deploy_test_sparse.py -d -m model_ckpt_sparse_retrained
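The latency numbers below are averages over 5 runs. A minimal sketch of such a measurement in Python, assuming `fn` is any zero-argument callable that runs one inference (hypothetical; deploy_test.py and deploy_test_sparse.py may time things differently). Excluding one warm-up call is a choice made in this sketch so that one-time setup cost is not mixed into the average.

```python
import time


def average_latency(fn, runs=5, warmup=1):
    """Call fn() `warmup` times untimed, then return the mean wall-clock seconds of `runs` calls."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs


calls = []


def fake_inference():
    """Stand-in for a real model call (hypothetical)."""
    calls.append(1)
    time.sleep(0.01)


print(f"{average_latency(fake_inference):.4f} s")
```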

To test the dense models:

./deploy_test.py -t -m model_ckpt_dense
./deploy_test.py -t -m model_ckpt_dense_pruned
./deploy_test.py -t -m model_ckpt_dense_retrained

To draw a histogram of the weight distribution:

# After running train.py (it generates .dat files)
./draw_histogram.py

Performance

The results are currently mediocre or even degraded because of the indirection and the additional storage overhead of the sparse matrix format. The small model size (12.49 MB) may also be a factor.

Storage overhead

Baseline: 12.49 MB
10 % pruned: 21.86 MB
20 % pruned: 19.45 MB
30 % pruned: 17.05 MB
40 % pruned: 14.64 MB
50 % pruned: 12.23 MB
60 % pruned: 9.83 MB
70 % pruned: 7.42 MB
80 % pruned: 5.02 MB
90 % pruned: 2.61 MB
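A back-of-the-envelope model explains why the lightly pruned models are larger than the baseline: the dense form stores one value per weight, while the <index, value> form stores every surviving weight together with its index. The sketch below assumes 4-byte values and one 4-byte index per surviving weight; the actual on-disk layout here may differ (TensorFlow's SparseTensor, for example, uses int64 indices, one per dimension). Under that assumption the sparse form breaks even at 50% pruning and gives about 1.8x, 1.0x and 0.2x the dense size at 10%, 50% and 90% pruning, close to the measured ratios above (21.86/12.49 ≈ 1.75, 12.23/12.49 ≈ 0.98, 2.61/12.49 ≈ 0.21).

```python
def sparse_to_dense_ratio(pruned_fraction, value_bytes=4, index_bytes=4):
    """Size of an <index, value> encoding relative to the dense matrix.

    Assumes each surviving weight stores one value plus one index, and the
    dense matrix stores one value per weight (both are assumptions).
    """
    kept = 1.0 - pruned_fraction
    return kept * (value_bytes + index_bytes) / value_bytes


def break_even_pruning(value_bytes=4, index_bytes=4):
    """Pruned fraction at which the sparse encoding is exactly as large as the dense one."""
    return 1.0 - value_bytes / (value_bytes + index_bytes)


for pct in (10, 50, 90):
    print(f"{pct}% pruned: {sparse_to_dense_ratio(pct / 100):.2f}x dense size")
print(f"break-even at {break_even_pruning():.0%} pruned")
# With 8-byte (int64) indices the break-even point moves to about 67% pruned.
print(f"break-even with int64 indices: {break_even_pruning(index_bytes=8):.0%} pruned")
```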

CPU performance (averaged over 5 runs)

CPU: Intel Core i5-2500 @ 3.3 GHz, LLC size: 6 MB

http://younghwanoh.github.io/images/cpu-desktop.png

Baseline: 0.01118040085 s
10 % pruned: 1.919299984 s
20 % pruned: 0.2325239658 s
30 % pruned: 0.2111079693 s
40 % pruned: 0.1982570648 s
50 % pruned: 0.1691776752 s
60 % pruned: 0.1305227757 s
70 % pruned: 0.116039753 s
80 % pruned: 0.103564167 s
90 % pruned: 0.1058168888 s

GPU performance (averaged over 5 runs)

GPU: NVIDIA GeForce GTX 650 @ 1.058 GHz, LLC size: 256 KB

http://younghwanoh.github.io/images/gpu-desktop.png

Baseline: 0.1475181845 s
10 % pruned: 0.2954540253 s
20 % pruned: 0.2665398121 s
30 % pruned: 0.2585638046 s
40 % pruned: 0.2090051651 s
50 % pruned: 0.1995279789 s
60 % pruned: 0.1815193653 s
70 % pruned: 0.1436806202 s
80 % pruned: 0.135668993 s
90 % pruned: 0.1218701839 s
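To read the two lists side by side, the short script below divides each latency by its baseline. It only reuses the numbers reported above. On these numbers the sparse models never beat the dense baseline on the CPU (the best case, 80% pruned, is still about 9x slower), while on the GPU the 70%, 80% and 90% pruned models come in under the baseline, the 90% model taking about 0.83x the baseline time.

```python
# Latencies in seconds, copied from the CPU and GPU lists above (key 0 = baseline).
CPU = {0: 0.01118040085, 10: 1.919299984, 20: 0.2325239658, 30: 0.2111079693,
       40: 0.1982570648, 50: 0.1691776752, 60: 0.1305227757, 70: 0.116039753,
       80: 0.103564167, 90: 0.1058168888}
GPU = {0: 0.1475181845, 10: 0.2954540253, 20: 0.2665398121, 30: 0.2585638046,
       40: 0.2090051651, 50: 0.1995279789, 60: 0.1815193653, 70: 0.1436806202,
       80: 0.135668993, 90: 0.1218701839}


def relative_latency(table):
    """Latency of each pruned model divided by the unpruned (0%) baseline."""
    base = table[0]
    return {pct: t / base for pct, t in table.items() if pct != 0}


for name, table in (("CPU", CPU), ("GPU", GPU)):
    rel = relative_latency(table)
    faster = [pct for pct, r in rel.items() if r < 1.0]
    print(name, {pct: round(r, 2) for pct, r in rel.items()},
          "faster than baseline at:", faster)
```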
