AdaS is an optimizer with an adaptive learning-rate scheduling methodology for training Convolutional Neural Networks (CNNs).
- AdaS exhibits the rapid minimization characteristics that adaptive optimizers such as AdaM are favoured for
- AdaS exhibits generalization (low testing loss) characteristics on par with SGD-based optimizers, improving on the poor generalization characteristics of adaptive optimizers
- AdaS introduces no computational overhead over adaptive optimizers (see experimental results)
- In addition to optimization, AdaS introduces new quality metrics for CNN training
This repository contains a PyTorch implementation of the AdaS learning rate scheduler algorithm.
AdaS is released under the MIT License (refer to the LICENSE file for more information)
Permissions | Conditions | Limitations |
---|---|---|
Commercial use | License and Copyright Notice | Liability |
Distribution | | Warranty |
Modification | | |
Private Use | | |
```
@misc{hosseini2020adas,
    title={AdaS: Adaptive Scheduling of Stochastic Gradients},
    author={Mahdi S. Hosseini and Konstantinos N. Plataniotis},
    year={2020},
    eprint={2006.06587},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
Figure 1: Training performance using different optimizers across two datasets and two CNNs
Table 1: Image classification performance (test accuracy) with a fixed epoch budget for ResNet34 training
Please refer to QC on the Wiki for more information on the two metrics, knowledge gain and mapping condition, used to monitor the training quality of CNNs.
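The exact definitions of both metrics are given in the paper and on the Wiki; roughly speaking, they are computed from a low-rank view of each convolution layer's unfolded weights. The sketch below is an illustration only (not the repository's implementation) of the underlying spectral quantities, i.e. the singular values and the condition number of an unfolded kernel; all function names here are placeholders.

```python
import torch

def unfolded_spectrum(conv_weight: torch.Tensor):
    """Illustrative only: singular values of a conv kernel unfolded along
    its output channels, i.e. a matrix of shape [C_out, C_in * k_h * k_w]."""
    mat = conv_weight.reshape(conv_weight.shape[0], -1)
    sigma = torch.linalg.svdvals(mat)
    # Condition number of the unfolded mapping (largest / smallest singular value).
    cond = sigma.max() / sigma.min().clamp_min(1e-12)
    return sigma, cond

# Example on a randomly initialized 3x3 convolution layer.
w = torch.nn.Conv2d(64, 128, kernel_size=3).weight.detach()
sigma, cond = unfolded_spectrum(w)
print(sigma[:5], cond)
```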
We use Python 3.7. Please refer to Requirements on the Wiki for the complete guideline.
AdaS introduces negligible overhead over adaptive optimizers: mSGD+StepLR, mSGD+AdaS, and AdaM all consume 40~43 sec/epoch when training ResNet34 on CIFAR10 on the same PC/GPU platform.
Optimizer | Learning Rate Scheduler | Epoch Time (avg.) | RAM (Memory) Consumed | GPU Memory Consumed |
---|---|---|---|---|
mSGD | StepLR | 40-43 seconds | ~2.75 GB | ~3.0 GB |
mSGD | AdaS | 40-43 seconds | ~2.75 GB | ~3.0 GB |
AdaM | None | 40-43 seconds | ~2.75 GB | ~3.0 GB |
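As a hedged sketch of how comparable numbers could be collected with standard PyTorch utilities (the `train_one_epoch` callable below is a placeholder for your own training loop, not part of this repository):

```python
import time
import torch

def profile_epoch(train_one_epoch, *args, **kwargs):
    """Time one training epoch and report peak GPU memory.
    `train_one_epoch` is a placeholder for any training-loop callable."""
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    start = time.time()
    train_one_epoch(*args, **kwargs)
    elapsed = time.time() - start
    peak_gb = (torch.cuda.max_memory_allocated() / 1e9
               if torch.cuda.is_available() else 0.0)
    print(f"epoch time: {elapsed:.1f} s, peak GPU memory: {peak_gb:.2f} GB")
```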
There are two versions of the AdaS code contained in this repository.
- a python-package version of the AdaS code, which can be `pip`-installed
- a static python module (unpackaged), runnable as a script
All source code can be found in `src/adas`.
For more information, also refer to Installation on the Wiki.
Moving forward, I will refer to console usage of this library. IDE usage is no different. Training options are split two ways:
- all environment/infrastructure options (GPU usage, output paths, etc.) are specified using command-line arguments.
- training-specific options (network, dataset, hyper-parameters, etc.) are specified using a configuration file (config.yaml):
```yaml
###### Application Specific ######
dataset: 'CIFAR10'
network: 'VGG16'
optimizer: 'SGD'
scheduler: 'AdaS'

###### Suggested Tune ######
init_lr: 0.03
early_stop_threshold: 0.001
optimizer_kwargs:
  momentum: 0.9
  weight_decay: 5e-4
scheduler_kwargs:
  beta: 0.8

###### Suggested Default ######
n_trials: 5
max_epoch: 150
num_workers: 4
early_stop_patience: 10
mini_batch_size: 128
p: 1 # options: 1, 2.
loss: 'cross_entropy'
```
For complete instructions on configuration and parameter setup, please refer to Configuration on the Wiki.
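The repository's own entry points handle the wiring from this file to the training run. Purely as a hedged sketch of how the configuration keys relate to standard PyTorch objects (the torchvision VGG16 and the explicit float casts below are illustrative assumptions, not the packaged code), one could do:

```python
# Illustrative sketch only -- not the repository's actual entry point.
import yaml                      # PyYAML
import torch
from torchvision import models   # stand-in for the repo's own VGG16

with open('config.yaml') as f:
    cfg = yaml.safe_load(f)

model = models.vgg16(num_classes=10)  # 'VGG16' / 'CIFAR10' in the config

# PyYAML may read values such as 5e-4 as strings, hence the float casts.
opt_kwargs = {k: float(v) for k, v in cfg['optimizer_kwargs'].items()}
optimizer = torch.optim.SGD(model.parameters(),
                            lr=float(cfg['init_lr']),
                            **opt_kwargs)

# The AdaS scheduler would then be built from cfg['scheduler_kwargs'] (e.g. beta)
# and stepped each epoch, analogously to torch.optim.lr_scheduler schedulers;
# see the Wiki for the actual class and its constructor.
```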
- None :)
- Add medical imaging datasets (e.g. digital pathology, xray, and ct scans)
- Extension of AdaS to Deep Neural Networks
Note the following:
- Our Pytests write/download data/files etc. to `/tmp`, so if you don't have a `/tmp` folder (i.e. you're on Windows), correct this if you wish to run the tests yourself.