One can assign a different precision to each layer (following the idea from this paper) to experiment with the trade-off between quantization bit width and accuracy.
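To make the per-layer precision idea concrete, here is a minimal sketch. The min-max quantizer and the bit assignments are illustrative only, not the repo's code; the layer names follow torchvision's AlexNet, since the commands below use `--type alexnet`.

```python
import torch
from torchvision.models import alexnet

def quantize_uniform(w, bits):
    # Minimal min-max uniform quantizer, for illustration only.
    lo, hi = w.min(), w.max()
    scale = ((hi - lo) / (2 ** bits - 1)).clamp(min=1e-12)
    return torch.round((w - lo) / scale) * scale + lo

# Hypothetical per-layer bit assignment: e.g. keep the first conv and the
# final classifier at higher precision, squeeze the middle layers harder.
layer_bits = {"features.0": 8, "features.3": 6, "features.6": 4, "classifier.6": 8}

model = alexnet()
with torch.no_grad():
    for name, param in model.named_parameters():
        for prefix, bits in layer_bits.items():
            if name.startswith(prefix + "."):
                param.copy_(quantize_uniform(param, bits))
```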
A small sample of experiment results can be seen here.
- CUDA
pip packages:
- torch
- torchvision
- numpy
- pandas
- opencv-python
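These can be installed with pip, for example:
pip install torch torchvision numpy pandas opencv-python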
To use uniform precision, as in the original repo (this runs only one test):
python quantize.py --type alexnet --quant_method log --param_bits 32 --fwd_bits 32 --bn_bits 32 --gpu 0
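The `--quant_method log` option quantizes values on a logarithmic scale. Below is a minimal sketch of that general technique under assumed behavior; it is not the repo's exact implementation:

```python
import torch

def log_quantize(x, bits=4):
    # Round |x| to the nearest power of two, keeping 2**bits exponent
    # levels below the maximum exponent; the sign is preserved. Values
    # below the smallest level are clamped up to it here, whereas real
    # implementations may flush them to zero.
    sign = torch.sign(x)
    log_mag = torch.log2(x.abs().clamp(min=1e-20))  # avoid log2(0)
    max_exp = float(log_mag.max().ceil())
    min_exp = max_exp - 2 ** bits + 1
    q_exp = torch.round(log_mag).clamp(min_exp, max_exp)
    return sign * torch.pow(2.0, q_exp)

w = torch.randn(100)
print(log_quantize(w, bits=3)[:5])
```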
Define the precision of each layer in quantize_runner.py via param_bits (--param_bits), batch_norm_bits (--bn_bits), and layer_output_bits (--fwd_bits), along with any other settings, and then run the command below. Currently only uniform precision is supported for layer_output_bits; a sketch of a possible configuration follows the command.
python quantize_runner.py
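One possible shape for those settings (the variable names come from the description above, but the exact structure inside quantize_runner.py may differ):

```python
# Hypothetical per-layer configuration in quantize_runner.py.
param_bits = [8, 8, 6, 6, 4, 8]   # one entry per parameterized layer
batch_norm_bits = [8] * 6          # precision for batch-norm parameters
layer_output_bits = 8              # activations: uniform precision only
```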
Results will be written to both result.csv and result.pkl.
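A minimal sketch for inspecting the results; the file names come from the line above, but any column names are assumptions depending on what quantize_runner.py records:

```python
import pandas as pd

df = pd.read_csv("result.csv")   # result.pkl holds the same data (pd.read_pickle)
print(df.head())
```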
- Define your model class in <dataset>/model.py or change the predefined arguments in <dataset>/train.py (see the sketch after this list)
- Run
python <dataset>/train.py
- Copy the trained model (default is <project_root>/log/default/latest.pth) to a .pth file under model_root
- Set custom model kwargs in quantize_runner.py
- Run
python quantize_runner.py --model_root model_root
to evaluate the self-trained model.
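An illustrative model class for <dataset>/model.py together with matching custom kwargs for quantize_runner.py. The class name, architecture, and keyword names are placeholders (the sketch assumes 32x32 RGB input, e.g. CIFAR-10); match whatever train.py actually expects to import.

```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 32 channels at 16x16 after one 2x pooling of a 32x32 input
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# In quantize_runner.py, the custom model kwargs could then look like:
model_kwargs = {"num_classes": 10}
model = MyModel(**model_kwargs)
```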
NOTE: For a detailed explanation of the project structure, please refer to README_orig.md
This code is modified from this repository, which adopts uniform precision.