YellowFin

YellowFin is an auto-tuning optimizer based on momentum SGD that requires no manual specification of learning rate and momentum. It measures the objective landscape on-the-fly and tunes momentum as well as learning rate using a local quadratic approximation.

The implementation here can be a drop-in replacement for any optimizer in TensorFlow. After from yellowfin import YFOptimizer, it supports both minimize and apply_gradients like any TensorFlow optimizer.
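For example, here is a minimal sketch of the drop-in usage, assuming TensorFlow 1.x and that yellowfin.py (from tuner_utils) is on the Python path; the toy quadratic loss is only a stand-in for your own model's loss.

import tensorflow as tf
from yellowfin import YFOptimizer

# Toy quadratic objective; replace with your own model's loss.
w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(w))

optimizer = YFOptimizer()  # initial lr=1.0, mu=0.0, then auto-tuned

# Option 1: one-call minimize, as with any tf.train optimizer.
train_op = optimizer.minimize(loss)

# Option 2: build (gradient, variable) pairs yourself and apply them.
tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)
train_op = optimizer.apply_gradients(list(zip(grads, tvars)))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)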

For more technical details, please refer to our paper YellowFin and the Art of Momentum Tuning.

For more usage details, please refer to the inline documentation of tuner_utils/yellowfin.py. Example usage can be found here for CIFAR and PTB.

Setup instructions for experiments

Please clone the master branch and follow the instructions to run YellowFin on ResNet for CIFAR10, Bottleneck ResNet on CIFAR100 for image recognition, LSTM on Penn Treebank for language modeling, Char Rnn LSTM on TinyShakespeare, and LSTM on the Wall Street Journal dataset for constituency parsing. The CIFAR and PTB models we use are slightly adapted from the official TensorFlow ResNet and LSTM. The Char Rnn LSTM and the Parsing LSTM are adapted from the Char Rnn repo and the Parsing LSTM repo respectively. Thanks to the researchers for developing the models.

Note that YellowFin is tested under TensorFlow 1.1 and Python 2.7.

Download data

Please use the data/download.sh script to download the CIFAR10/100 and Penn Treebank datasets. It may take a few minutes depending on the network speed. The other datasets are included in the repo.

cd data
bash download.sh

Run CIFAR10/100 ResNets experiments

The experiments on 110 layer ResNet with CIFAR10 and 164 layer ResNet with CIFAR100 can be launched using

cd cifar/scripts
python CIFAR10-release.py (for CIFAR10)
python CIFAR100-release.py (for CIFAR100)

Run Penn Treebank LSTM experiments

The experiments on a multi-layer LSTM on Penn Treebank can be launched using

cd ptb/scripts
python PTB-release.py

Run Char Rnn LSTM experiments

The experiments on Char Rnn LSTM with TinyShakespeare dataset can be launched using

cd char-rnn-tensorflow
python train_YF.py --log_dir=path_to_log --data_dir=./data/tinyshakespeare/

Run constituency parsing LSTM experiments

The experiments on constituency parsing with the Wall Street Journal (WSJ) dataset can be launched using

cd parsing
mkdir -p models/wsj && python train.py --data_path=wsj --model_path=models/wsj/model --log_dir=path_to_log --opt_method="YF"

Note that the WSJ dataset is not publicly available. Please contact us or the author of the Parsing LSTM repo for access to the data. The data can be preprocessed following the instructions in the Parsing LSTM repo. You should be able to run our scripts on the preprocessed data.

Detailed guidelines

a. YFOptimizer(lr=1.0, mu=0.0) sets the initial learning rate and momentum to 1.0 and 0.0 respectively. This is the uniform setting (i.e. without tuning) for all our PyTorch and TensorFlow experiments. Typically, after a few thousand minibatches, the influence of these initial values diminishes.

b. If you want to clip gradients, you can also consider using the clip_thresh argument when initializing the YFOptimizer.

c. If you want to use the typical lr-dropping technique after a certain number of epochs, or you want to control the learning rate more finely, please use lr_factor in the YFOptimizer class. More details can be found here; see also the sketch below.
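The following is a minimal sketch of guidelines a and b above. The keyword names lr, mu and clip_thresh follow this README; the threshold value is illustrative only, and the exact signature, as well as how lr_factor is exposed for guideline c, is documented inline in tuner_utils/yellowfin.py.

from yellowfin import YFOptimizer

# Defaults used uniformly in our experiments; the auto-tuner takes over after a few thousand minibatches.
optimizer = YFOptimizer(lr=1.0, mu=0.0)

# Optional gradient clipping at construction time (threshold value chosen for illustration).
optimizer_clipped = YFOptimizer(lr=1.0, mu=0.0, clip_thresh=1.0)

# For lr dropping or finer learning rate control (guideline c), use lr_factor
# as documented inline in tuner_utils/yellowfin.py.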

PyTorch implementation

YellowFin PyTorch repo
