def compute_dz2_dW2(a1, c):
    '''
        Compute the local gradient of the logits function z2 w.r.t. the weights W2 in the 2nd layer.
        Input:
            a1: the activations of the sigmoid function in the 1st layer, a numpy float vector of shape h by 1.
            c: the number of logits in z2 (the number of rows of W2), an integer.
        Output:
            dz2_dW2: the partial gradient of the logits z2 w.r.t. the weight matrix W2, a numpy float matrix of shape (c by h).
                     The (i,j)-th element represents the partial gradient of the i-th logit (z2[i]) w.r.t. the weight W2[i,j]: d_z2[i] / d_W2[i,j]
    '''
    # Reuse the single-layer helper: the second layer has the same form, with a1 as its input.
    dz2_dW2 = sr.compute_dz_dW(a1, c)
    return dz2_dW2


def compute_dz1_dW1(x, h):
    '''
        Compute the local gradient of the logits function z1 w.r.t. the weights W1 in the 1st layer.
        Input:
            x: the feature vector of a data instance, a float numpy vector of shape p by 1. Here p is the number of features/dimensions.
            h: the number of output activations in the first layer, an integer.
        Output:
            dz1_dW1: the partial gradient of the logits z1 w.r.t. the weight matrix W1, a numpy float matrix of shape (h by p).
                     The (i,j)-th element represents the partial gradient of the i-th logit (z1[i]) w.r.t. the weight W1[i,j]: d_z1[i] / d_W1[i,j]
    '''
    dz1_dW1 = sr.compute_dz_dW(x, h)
    return dz1_dW1
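

# ---------------------------------------------------------------------------
# Illustrative sketch only. The body of sr.compute_dz_dW is not shown in this
# file, so the helper below is a hypothetical stand-in, not the sr module's
# actual implementation. It assumes the logits are linear, z = W x + b, in
# which case d_z[i] / d_W[i,j] = x[j] for every row i, so each row of the
# local gradient is simply x transposed. The name sketch_dz_dW is made up
# for this example.
import numpy as np


def sketch_dz_dW(x, c):
    # Flatten the (p by 1) input into a single row of shape (1 by p).
    row = np.asarray(x, dtype=float).reshape(1, -1)
    # Repeat that row once per logit to get a (c by p) matrix.
    return np.tile(row, (c, 1))


# Possible usage, under the same assumption: with a1 of shape (3 by 1) and
# c = 2, sketch_dz_dW(a1, 2) has shape (2 by 3) and both rows equal a1.T.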