
Recurrent Batch Normalization

Chainer implementation of the batch-normalized LSTM described in Recurrent Batch Normalization [arXiv:1603.09025].

Todo:

  • separate statistics (see the sketch after this quote)

From the paper: "The batch normalization transform relies on batch statistics to standardize the LSTM activations. It would seem natural to share the statistics that are used for normalization across time, just as recurrent neural networks share their parameters over time. However, we have found that simply averaging statistics over time severely degrades performance. Although LSTM activations do converge to a stationary distribution, we have empirically observed that their statistics during the initial transient differ significantly as figure 1 shows. Consequently, we recommend using separate statistics for each timestep to preserve information of the initial transient phase in the activations."
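As a rough illustration of per-timestep statistics, the sketch below keeps one of Chainer's standard BatchNormalization links per timestep, so running means and variances are estimated independently at each step. The class name PerTimestepBN and the max_timesteps cap are assumptions for illustration, not this repository's actual code.

# Sketch only: separate batch-norm statistics per timestep.
import chainer
import chainer.links as L

class PerTimestepBN(chainer.ChainList):
    def __init__(self, size, max_timesteps):
        # One BatchNormalization link per timestep, each with its own
        # running mean and variance.
        super(PerTimestepBN, self).__init__(
            *[L.BatchNormalization(size) for _ in range(max_timesteps)])

    def __call__(self, x, t):
        # Timesteps beyond the tracked horizon reuse the last step's statistics.
        return self[min(t, len(self) - 1)](x)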

Requirements

  • Chainer 1.8+

Running

Before

from chainer import links as L
lstm = L.LSTM(n_in, n_out)

After

from bnlstm import BNLSTM
lstm = BNLSTM(n_in, n_out)
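
A hypothetical end-to-end example, assuming BNLSTM mirrors the stateful interface of L.LSTM (reset_state() to clear state, then one call per timestep); the layer sizes, batch size, and sequence length below are arbitrary.

import numpy as np
from chainer import Variable
from bnlstm import BNLSTM

lstm = BNLSTM(100, 50)  # n_in=100, n_out=50 (arbitrary sizes)
lstm.reset_state()  # clear hidden and cell state before a new sequence
for t in range(20):  # one forward call per timestep
    x_t = Variable(np.random.randn(16, 100).astype(np.float32))  # batch of 16
    h_t = lstm(x_t)  # hidden state at step t; shape (16, 50)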
