def Stabilizer(steepness=4, enable_self_stabilization=default_override_or(True), name=''):
    '''
    Stabilizer(steepness=4, enable_self_stabilization=True, name='')

    Layer factory function to create a `Droppo self-stabilizer <https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/SelfLR.pdf>`_.
    It multiplies its input with a scalar that is learned.

    This takes `enable_self_stabilization` as a flag that allows disabling it, which is useful if this is a global default.

    Note:
        Some other layers (specifically, recurrent units like :func:`~cntk.layers.blocks.LSTM`) also have the option to
        use the ``Stabilizer()`` layer internally. That is enabled by passing ``enable_self_stabilization=True``
        to those layers. In conjunction with those, the rule is that an explicit ``Stabilizer()`` must be
        inserted by the user for the main data input, whereas the recurrent layer will own the stabilizer(s)
        for the internal recurrent connection(s).

    Note:
        Unlike the original paper, which proposed a linear or exponential scalar, CNTK uses a sharpened Softplus:
        ``beta = 1/steepness * ln(1 + e^{steepness * alpha})``, where ``alpha`` is the learned parameter.
        The softplus behaves linearly for weights around and above 1 (like the linear scalar) while guaranteeing
        positivity (like the exponential variant), but is also more robust because it avoids exploding gradients.

    Example:
     >>> # recurrent model with self-stabilization
     >>> from cntk.layers import *
     >>> with default_options(enable_self_stabilization=True): # enable stabilizers by default for LSTM()
     ...     model = Sequential([
     ...         Embedding(300),
     ...         Stabilizer(),           # stabilizer for main data input of recurrence
     ...         Recurrence(LSTM(512)),  # LSTM owns its own stabilizers for the recurrent connections
     ...         Stabilizer(),
     ...         Dense(10)
     ...     ])

    Args:
        steepness (`int`, defaults to 4): steepness of the sharpened Softplus that keeps the learned scalar positive
        enable_self_stabilization (bool, defaults to `True`): a flag that allows disabling the stabilizer.
         Useful if this is a global default.
        name (str, defaults to ''): the name of the Function instance in the network

    Returns:
        cntk.ops.functions.Function:
        A function that accepts one argument and multiplies it with the learned scalar
    '''

    enable_self_stabilization = get_default_override(Stabilizer, enable_self_stabilization=enable_self_stabilization)

    if not enable_self_stabilization: # disabled (typically through global option; otherwise one would not call this in the first place)
        return identity

    # parameters bound to this Function
    init_param = np.log(np.exp(steepness) - 1) / steepness  # initialize so that factor is initially 1 (has no effect)
    param = Parameter((), init=init_param, name='alpha')
    beta = softplus(param, steepness=steepness)

    # expression
    @BlockFunction('Stabilizer', name)
    def stabilize(x):
        return beta * x

    return stabilize
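
# A minimal sketch (plain NumPy, outside CNTK) verifying the initialization trick
# above: with init_param = ln(e^steepness - 1) / steepness, the sharpened Softplus
# maps the parameter back to exactly 1, so a freshly created Stabilizer is a no-op
# until training moves alpha. The helper `sharpened_softplus` is hypothetical,
# for illustration only; it is not part of the CNTK API.
if __name__ == '__main__':
    import numpy as np

    def sharpened_softplus(alpha, steepness=4):
        # beta = 1/steepness * ln(1 + e^{steepness * alpha}), as in the docstring above
        return np.log1p(np.exp(steepness * alpha)) / steepness

    steepness = 4
    init_param = np.log(np.exp(steepness) - 1) / steepness  # same init as Stabilizer()
    print(sharpened_softplus(init_param, steepness))         # -> 1.0 (no effect initially)

    # For large alpha the softplus approaches alpha itself (linear regime, like the
    # linear scalar); for very negative alpha it approaches 0 from above (positivity,
    # like the exponential variant), without the exponential's exploding gradients.
    print(sharpened_softplus(5.0, steepness))   # ~ 5.0
    print(sharpened_softplus(-5.0, steepness))  # ~ 5e-10, still > 0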