def testFnDefaultMeta(self):
  b = builder.Base.Params()
  b = b.Instantiate()

  def Foo(x, y):
    return x * x, y * 2

  p = b._Fn('fn', Foo)
  meta = p.cls.FPropMeta(p, tshape.Shape([4, 6]), tshape.Shape([3, 3]))
  self.assertEqual(meta.flops, 33)
  self.assertEqual(meta.out_shapes[0].ToTensorShape().as_list(), [4, 6])
  self.assertEqual(meta.out_shapes[1].ToTensorShape().as_list(), [3, 3])

  g = tf.Graph()
  with g.as_default():
    l = p.Instantiate()
    x = tf.random.normal(shape=[4, 8])
    y = tf.random.normal(shape=[3, 3])
    z0, z1 = l.FPropDefaultTheta(x, y)

  with self.session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    vx, vy, vz0, vz1 = sess.run([x, y, z0, z1])
    self.assertAllClose(vx * vx, vz0)
    self.assertAllClose(vy * 2, vz1)

def FPropMeta(cls, p, *args):
  py_utils.CheckShapes(args)
  input_shapes = [
      None if arg is None else tshape.Shape(arg.get_shape().as_list()[1:])
      for arg in args
  ]
  meta = p.body.cls.FPropMeta(p.body, *input_shapes)
  py_utils.CheckShapes(meta.out_shapes)
  total = meta.flops * p.repeat
  out_shapes = [
      None if s is None else tshape.Shape([p.repeat] + s[:])
      for s in meta.out_shapes
  ]
  return py_utils.NestedMap(flops=total, out_shapes=tuple(out_shapes))

def _common_gpipe_transformer_fprop_meta(p, inputs, *args):
  """GPipe FPropMeta function."""
  # TODO(huangyp): return accurate estimate of flops.
  py_utils.CheckShapes((inputs, ))
  flops_per_element = 5
  src_time, source_batch, dim = inputs
  flops = flops_per_element * src_time * src_time * source_batch * dim
  args = args if isinstance(args, tuple) else (args, )
  if not p.has_aux_atten and p.is_transparent:  # Transparent Encoder FPropMeta
    if p.transparent_merger_tpl is not None:
      args = args[:5] + (
          inputs, tshape.Shape([p.transparent_merger_tpl.num_sources]))
    args = args[:6] + (tshape.Shape([args[6][0] - 1]), )
    if p.final_enc_layer:
      args = args[:5] + (None, None)
  return py_utils.NestedMap(flops=flops, out_shapes=(inputs, ) + args)

def _InferOutShapes(self, args):
  input_shapes = [
      None if arg is None else tshape.Shape(arg.get_shape().as_list()[1:])
      for arg in args
  ]
  out_shapes = self.body.FPropMeta(self.body.params, *input_shapes).out_shapes
  return [None if s is None else s.ToTensorShape() for s in out_shapes]

def FPropMeta(cls, p, inputs, *args):
  # TODO(ankurbpn): return accurate estimate of flops.
  py_utils.CheckShapes((inputs,))
  flops_per_element = 2  # Is this correct?
  vocab = p.token_emb.vocab_size
  dim = p.token_emb.embedding_dim
  src_time, source_batch = inputs
  flops = flops_per_element * src_time * source_batch * dim * vocab
  args = args if isinstance(args, tuple) else (args,)
  new_inputs = tshape.Shape([src_time, source_batch, dim])
  new_args = list(args)
  if p.add_tgt_embedding_layer:
    tgt_time, tgt_batch = args[1]
    new_args[1] = tshape.Shape([tgt_time, tgt_batch, dim])
  new_args = tuple(new_args[:7])
  return py_utils.NestedMap(flops=flops, out_shapes=(new_inputs,) + new_args)

def FPropMeta(cls, p, inputs, paddings):
  py_utils.CheckShapes((inputs, paddings))
  b, t, f, ic = inputs
  assert f == 1
  oc = p.filter_shape[2] * p.filter_shape[3] * p.weight_tiling_factor
  outputs = tshape.Shape([b, t, f, oc])
  flops = b * t * f * p.filter_shape[0] * ic * oc * 5
  return py_utils.NestedMap(flops=flops, out_shapes=(outputs, paddings))

def testNormalizedDepthwiseConv2DLayerFPropMeta(self):
  params = (conv_layers.NormalizedDepthwiseConv2DLayer.Params())
  params.name = 'conv'
  params.filter_shape = [3, 1, 2, 1]
  params.weight_tiling_factor = 2
  batch, time, frequency, in_channel = 2, 4, 1, 4
  output_channels = 4
  inputs_shape = tshape.Shape([batch, time, frequency, in_channel])
  paddings_shape = tshape.Shape([batch, time])
  with self.session():
    out = params.cls.FPropMeta(params, inputs_shape, paddings_shape)
    expected_flops = batch * time * frequency * params.filter_shape[
        0] * output_channels * 5
    self.assertEqual(expected_flops, out.flops)
    out_shapes = out.out_shapes
    self.assertEqual(out_shapes[0].ToTensorShape().as_list(),
                     [batch, time, frequency, output_channels])
    self.assertEqual(out_shapes[1].ToTensorShape().as_list(), [batch, time])

def FPropMeta(cls, p, inputs, *args):
  # TODO(ankurbpn): return accurate estimate of flops.
  py_utils.CheckShapes((inputs, ))
  flops_per_element = 2  # Is this correct?
  vocab = p.token_emb.vocab_size
  dim = p.token_emb.embedding_dim
  src_dim_0, src_dim_1 = inputs
  flops = flops_per_element * src_dim_0 * src_dim_1 * dim * vocab
  args = args if isinstance(args, tuple) else (args, )
  new_inputs = tshape.Shape([src_dim_0, src_dim_1, dim])
  new_args = list(args)
  if p.add_tgt_embedding_layer:
    tgt_dim_0, tgt_dim_1 = args[1]
    new_args[1] = tshape.Shape([tgt_dim_0, tgt_dim_1, dim])
  if p.ret_task_ids:
    new_args = new_args[:5] + [None, None] + new_args[7:]
  else:
    new_args = new_args[:5] + [None, None]
  new_args = tuple(new_args)
  return py_utils.NestedMap(flops=flops, out_shapes=(new_inputs, ) + new_args)

def testEmptySequentialLayerFPropMeta(self):
  g = tf.Graph()
  with g.as_default():
    p = layers.SequentialLayer.Params().Set(name='seq')
    l = p.Instantiate()
    x = py_utils.NestedMap(val=tf.random.normal(shape=[2, 32]))
    y = l.FPropDefaultTheta(x)
    self.assertIsInstance(y.val, tf.Tensor)
    y_shape = l.FPropMeta(
        p, py_utils.Transform(lambda t: tshape.Shape(t.shape),
                              x)).out_shapes[0]
    self.assertEqual(y.val.shape.as_list(),
                     y_shape.val.ToTensorShape().as_list())

def _Expect(self, expected_cost, p, *inputs):
  meta = p.cls.FPropMeta(p, *(tshape.Shape(s) for s in inputs))
  self.assertEqual(meta.flops, expected_cost)

  g = tf.Graph()
  with g.as_default():
    l = p.Instantiate()
    xs = [tf.random.normal(shape=s) for s in inputs]
    ys = l.FPropDefaultTheta(*xs)

  with self.session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    _ = sess.run(ys)

def testFn(self):
  b = builder.Base.Params()
  b = b.Instantiate()

  p = b._Fn('fn', lambda x, y: x + y, fn_out=lambda x, y: x)

  meta = p.cls.FPropMeta(p, tshape.Shape([4, 6]), tshape.Shape([4, 6]))
  self.assertEqual(meta.flops, 48)
  self.assertEqual(meta.out_shapes[0].ToTensorShape().as_list(), [4, 6])

  g = tf.Graph()
  with g.as_default():
    l = p.Instantiate()
    x = tf.random.normal(shape=[4, 8])
    y = tf.random.normal(shape=[4, 1])
    z = l.FPropDefaultTheta(x, y)

  with self.session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    v = sess.run([x, y, z])
    self.assertAllClose(v[0] + v[1], v[2])

def testSymbolicDims(self):
  p = builder.Base.Params()
  b = p.Instantiate()

  f1 = tshape.Shape(['kh', 'kw', 'idims', 'odims'])
  kh, kw, idims, odims = f1
  f2 = tshape.Shape([kh, kw, odims, odims])
  p = b._Seq('test', b._Conv2D('conv', f1, (2, 2)),
             b._Conv2D('conv', f2, (2, 2)), b._Bias('bias', odims))

  inp = tshape.Shape(['b', 'h', 'w', idims])
  b, h, w, _ = inp
  meta = p.cls.FPropMeta(p, inp)
  print('flops = ', meta.flops)
  out = meta.out_shapes[0]
  print('outputs = ', out)

  # sympy.lambdify can help us to do faster numerical evaluation.
  # Might be useful to build a "cost" model given a builder layer.
  f = sympy.lambdify([b, h, w, kh, kw, idims, odims], meta.flops, 'numpy')
  print('f.source = ', inspect.getsource(f))
  self.assertEqual(f(8, 224, 224, 3, 3, 8, 32), 925646848)
  self.assertEqual(f(8, 224, 224, 5, 5, 8, 32), 2569814016)

def _Glu(self, name, glu_with_tanh):

  def _GLUFn(inputs):
    gated_inputs, act_inputs = tf.split(inputs, 2, axis=-1)
    return act_inputs * tf.sigmoid(gated_inputs)

  def _GatedTanhFn(inputs):
    gated_inputs, act_inputs = tf.split(inputs, 2, axis=-1)
    return tf.tanh(act_inputs) * tf.sigmoid(gated_inputs)

  fn = _GatedTanhFn if glu_with_tanh else _GLUFn

  return self._Fn(
      name,
      fn=fn,
      fn_out=lambda x: tshape.Shape(x[:-1] + [x[-1] / 2]),
      fn_flops=lambda x: 15 * x.size)

def _verify_timestep_counts(self,
                            num_splits,
                            auto_partition=False,
                            micro_batch_size=None):
  num_micro_batches = 8
  batch_size = 16
  with self.session(graph=tf.Graph()) as sess:
    tf.random.set_seed(1245)
    inputs = tf.random.uniform([batch_size, 8, 8, 1], seed=12345)
    if auto_partition:
      layers = [
          _SimpyLayer.Params().Set(name='layer_{}'.format(i))
          for i in range(16)
      ]
      net = PipeliningLayer.Params().Set(
          name='pipeline',
          num_micro_batches=num_micro_batches,
          cell_tpl=_Partition(layers, num_splits,
                              tshape.Shape([batch_size, 8, 8,
                                            1]))).Instantiate()
    else:
      net = _BuildDummyPipelineCnn(
          num_splits=num_splits,
          micro_batch_size=micro_batch_size,
          num_micro_batches=num_micro_batches)
    endpoints = net.FPropDefaultTheta(inputs)
    if isinstance(endpoints, (list, tuple)):
      logits, aux_logits = endpoints
    else:
      logits = endpoints
      aux_logits = None
    loss = tf.reduce_mean(logits)
    grads = tf.gradients(loss, tf.trainable_variables())
    grad_norm = tf.sqrt(py_utils.SumSquared(grads))
    ts = net.GetAccumulatorValues().Flatten()

    sess.run(tf.global_variables_initializer())
    grad_norm_val, ts_vals = sess.run([grad_norm, ts])
    test_utils.CompareToGoldenSingleFloat(self, 0.268087, grad_norm_val)
    # Accumulator values should be equal to number of time steps in pipeline.
    for ts_val in list(ts_vals):
      expected_ts = num_micro_batches if num_splits > 1 else 1
      self.assertEqual(ts_val, expected_ts)
    if aux_logits is not None:
      aux_logit_tensor = sess.run(aux_logits)
      self.assertEqual(aux_logit_tensor.shape, (batch_size, 8, 8, 1))

def testGraphLayer(self):
  g = tf.Graph()
  with g.as_default(), self.SetEval(True):
    tf.random.set_seed(24332)

    def _FnMeta(*shapes):
      return py_utils.NestedMap(flops=1, out_shapes=shapes)

    p = layers.GraphLayer.Params().Set(
        name='graph',
        input_endpoints=['x'],
        output_endpoints=['y'],
        sub=[
            ('x.a->y.c',
             layers.FnLayer.Params().Set(fn=lambda x: 2 * x,
                                         fn_meta=_FnMeta)),
            ('x.b->y.d',
             layers.FnLayer.Params().Set(
                 name='bar', fn=lambda x: x + 2, fn_meta=_FnMeta)),
            ('y.c,y.d->y.e, y.f',
             layers.FnLayer.Params().Set(
                 name='baz',
                 fn=lambda x, y: (x + y, x - y),
                 fn_meta=_FnMeta)),
        ])
    l = p.Instantiate()
    x = py_utils.NestedMap(a=tf.constant(1.0), b=tf.constant(2.0))
    y = l.FProp(l.theta, x)
    y_shape = l.FPropMeta(
        p, py_utils.Transform(lambda t: tshape.Shape(t.shape),
                              x)).out_shapes[0]
    self.assertDictEqual(
        py_utils.Transform(lambda t: t.shape.as_list(), y),
        py_utils.Transform(lambda t: t.ToTensorShape().as_list(), y_shape))

  with self.session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    y_val = sess.run(y)
    print(y_val)
    self.assertEqual(py_utils.NestedMap(c=2.0, d=4.0, e=6.0, f=-2.0), y_val)

def _CalculateOutputShapes(self, input_shapes):
  """Calculates the output shapes of intermediate layers.

  Given the FPropMeta function in each FeatureExtractionLayer, calculates
  the shapes of outputs of that layer. This is used to recover the shape
  information in StackedRecurrent.

  Args:
    input_shapes: tuple of input TensorShapes.

  Returns:
    A list of K + 1 lists of shapes, where K is the number of partitions.
  """
  # Converts TensorShape to tshape.Shape.
  inputs = []
  for x in input_shapes:
    if x is None:
      inputs.append(None)
    else:
      inputs.append(tshape.Shape(x.as_list()))
  del input_shapes

  state_shapes = []

  def RecordInputShapes(tshapes):
    shapes = []
    for s in tshapes:
      shapes.append(None if s is None else s.ToTensorShape().as_list())
    state_shapes.append(shapes)

  for (_, before_layer) in self._before_layers:
    inputs = before_layer.FPropMeta(before_layer.params, *inputs).out_shapes
  RecordInputShapes(inputs)

  for (_, cell) in self._cells:
    inputs = cell.FPropMeta(cell.params, *inputs).out_shapes
    RecordInputShapes(inputs)

  return state_shapes

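# A minimal, library-free sketch of the shape propagation that
# _CalculateOutputShapes describes: shapes are threaded through the
# before-layers, recorded once, then recorded again after every cell, giving
# K + 1 entries for K cells. FakeLayer, fprop_meta and
# calculate_output_shapes are hypothetical stand-ins written for this
# illustration; they are not part of the library, and plain Python lists
# stand in for tshape.Shape.

```python
class FakeLayer:
  """Hypothetical stand-in for a layer that exposes a static shape function."""

  def __init__(self, out_dim):
    self.out_dim = out_dim

  def fprop_meta(self, *shapes):
    # Replaces the last dimension of every non-None input shape.
    return tuple(
        None if s is None else s[:-1] + [self.out_dim] for s in shapes)


def calculate_output_shapes(input_shapes, before_layers, cells):
  inputs = tuple(input_shapes)
  state_shapes = []
  for layer in before_layers:
    inputs = layer.fprop_meta(*inputs)
  state_shapes.append(list(inputs))  # Recorded once, after all before-layers.
  for cell in cells:
    inputs = cell.fprop_meta(*inputs)
    state_shapes.append(list(inputs))  # Recorded once per cell.
  return state_shapes


shapes = calculate_output_shapes(
    [[4, 16], None], [FakeLayer(32)], [FakeLayer(64), FakeLayer(8)])
print(len(shapes))  # 3, i.e. K + 1 for K = 2 cells.
print(shapes)  # [[[4, 32], None], [[4, 64], None], [[4, 8], None]]
```
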
def _Squeeze(self, name):
  return self._Fn(
      name,
      fn=lambda x: tf.squeeze(x, 2),
      fn_out=lambda x: tshape.Shape(x[0:2] + x[3:]),
      fn_flops=lambda x: 1)

def _ExpandDims(self, name):
  return self._Fn(
      name,
      fn=lambda x: tf.expand_dims(x, 2),
      fn_out=lambda x: tshape.Shape(x[0:2] + [1] + x[2:]),
      fn_flops=lambda x: 1)

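# The fn_out lambdas in _Squeeze and _ExpandDims are plain list slicing on the
# input shape. The sketch below reproduces that arithmetic with ordinary
# Python lists in place of tshape.Shape. squeeze_out_shape and
# expand_dims_out_shape are hypothetical names used only here, and the
# size-1 assertion is an added check that the original lambda does not make.

```python
def squeeze_out_shape(shape):
  """Drops axis 2, mirroring the fn_out of _Squeeze."""
  assert shape[2] == 1, 'axis 2 must have size 1 to be squeezed'
  return shape[0:2] + shape[3:]


def expand_dims_out_shape(shape):
  """Inserts a size-1 axis at position 2, mirroring the fn_out of _ExpandDims."""
  return shape[0:2] + [1] + shape[2:]


squeezed = squeeze_out_shape([8, 100, 1, 512])
restored = expand_dims_out_shape(squeezed)
print(squeezed)  # [8, 100, 512]
print(restored)  # [8, 100, 1, 512]
```
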
def _ToTShape(x):
  if x is None:
    return None
  return tshape.Shape(x.as_list())

def FPropMeta(cls, p, inputs, *args):
  dim1, dim2 = args[1][:2] if p.inputs_from_decoder else inputs[:2]
  logits = tshape.Shape([dim1, dim2, p.num_classes])
  return py_utils.NestedMap(flops=100, out_shapes=(logits, ))

def FPropMeta(cls, p, inputs, *args):
  t, b = args[1][:2] if p.inputs_from_decoder else inputs[:2]
  per_example_xent = tshape.Shape([t, b])
  logits = tshape.Shape([t, b, p.softmax.num_classes])
  return py_utils.NestedMap(flops=100, out_shapes=(per_example_xent, logits))