def binaryop(func, lar1, lar2, join='inner', cast=True, missone='ignore',
             misstwo='ignore', **kwargs):
    """
    Binary operation on two larrys using given function and join method.

    Parameters
    ----------
    func : function
        A function that takes two Numpy arrays as input and returns a Numpy
        array as output. For example: np.add. You can also pass keyword
        arguments to the function; see `**kwargs`.
    lar1 : larry
        The larry on the left-hand side of the binary operation. Must have
        the same number of dimensions as `lar2`.
    lar2 : larry
        The larry on the right-hand side of the binary operation. Must have
        the same number of dimensions as `lar1`.
    join : {'inner', 'outer', 'left', 'right', list}, optional
        The method used to join the two larrys. The default join method along
        all axes is 'inner', i.e., the intersection of the labels. If `join`
        is a list of strings then the length of the list should be the number
        of dimensions of the two larrys. The first element in the list is the
        join method for axis=0, the second element is the join method for
        axis=1, and so on.
    cast : bool, optional
        Only float, str, and object dtypes have missing value markers (la.nan,
        '', and None, respectively). Other dtypes, such as int and bool, do
        not have missing value markers. If `cast` is set to True (default)
        then int and bool dtypes, for example, will be cast to float if any
        new rows, columns, etc are created. If cast is set to False, then a
        TypeError will be raised for int and bool dtype input if the join
        introduces new rows, columns, etc. An inner join will never introduce
        new rows, columns, etc.
    missone : {scalar, 'ignore'}, optional
        By default ('ignore') no special treatment of missing values is made.
        If, however, `missone` is set to something other than 'ignore', such
        as 0, then all elements that are missing in one larry but not missing
        in the other larry are replaced by `missone`. For example, if an
        element is in one larry but missing in the other larry then you may
        want to set the missing value to zero when summing two larrys.
    misstwo : {scalar, 'ignore'}, optional
        By default ('ignore') no special treatment of missing values is made.
        If, however, `misstwo` is set to something other than 'ignore', such
        as 0, then all elements that are missing in both larrys are replaced
        by `misstwo`.
    **kwargs : Keyword arguments, optional
        Keyword arguments to pass to `func`. The keyword arguments passed to
        `func` cannot have the following keys: join, cast, missone, misstwo.

    Returns
    -------
    lar3 : larry
        The result of the binary operation.

    See Also
    --------
    la.align: Align two larrys using one of five join methods.

    Examples
    --------
    Create two larrys:

    >>> from la import nan
    >>> lar1 = larry([1, 2, nan], [['a', 'b', 'c']])
    >>> lar2 = larry([1, nan, nan], [['a', 'b', 'dd']])

    The default is an inner join (note that lar1 and lar2 have two labels in
    common):

    >>> la.binaryop(np.add, lar1, lar2)
    label_0
        a
        b
    x
    array([ 2., NaN])

    If one data element is missing in one larry but not in the other, then you
    can replace the missing value with `missone` (here 0):

    >>> la.binaryop(np.add, lar1, lar2, missone=0)
    label_0
        a
        b
    x
    array([ 2., 2.])

    An outer join:

    >>> la.binaryop(np.add, lar1, lar2, join='outer')
    label_0
        a
        b
        c
        dd
    x
    array([ 2., NaN, NaN, NaN])

    An outer join with single and double missing values replaced by zero:

    >>> la.binaryop(np.add, lar1, lar2, join='outer', missone=0, misstwo=0)
    label_0
        a
        b
        c
        dd
    x
    array([ 2., 2., 0., 0.])

    """

    # Align
    x1, x2, label, ign1, ign2 = align_raw(lar1, lar2, join=join, cast=cast)

    # Replacing missing values is slow, so only do if requested
    if missone != 'ignore' or misstwo != 'ignore':
        miss1 = ismissing(x1)
        miss2 = ismissing(x2)
    if missone != 'ignore':
        missone1 = miss1 & ~miss2
        if missone1.any():
            x1[missone1] = missone
        missone2 = miss2 & ~miss1
        if missone2.any():
            x2[missone2] = missone
    if misstwo != 'ignore':
        misstwo12 = miss1 & miss2
        if misstwo12.any():
            x1[misstwo12] = misstwo
            x2[misstwo12] = misstwo

    # Binary function
    x = func(x1, x2, **kwargs)

    return larry(x, label, integrity=False)
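

# Illustrative sketch (not part of the la API): the missone/misstwo handling
# described in the binaryop docstring, replayed on plain NumPy float arrays
# that are assumed to be aligned already.
# Assumptions: NaN is the only missing-value marker here, and
# `_sketch_fill_missing` is a hypothetical helper name. binaryop itself relies
# on ismissing(), which is documented to also cover '' and None.
import numpy as np  # repeated so the sketch is self-contained


def _sketch_fill_missing(x1, x2, missone='ignore', misstwo='ignore'):
    x1 = np.array(x1, dtype=float)  # np.array copies, inputs stay untouched
    x2 = np.array(x2, dtype=float)
    # Compute both masks up front so the two fill steps do not interfere
    miss1 = np.isnan(x1)
    miss2 = np.isnan(x2)
    if missone != 'ignore':
        # Missing on exactly one side
        x1[miss1 & ~miss2] = missone
        x2[miss2 & ~miss1] = missone
    if misstwo != 'ignore':
        # Missing on both sides
        both = miss1 & miss2
        x1[both] = misstwo
        x2[both] = misstwo
    return x1, x2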


def rand(*args, **kwargs):
    """
    Random samples from a uniform distribution in a given shape.

    The random samples are from a uniform distribution over ``[0, 1)``.

    Parameters
    ----------
    args : `n` ints, optional
        The dimensions of the returned larry, should be all positive. These
        may be omitted if you pass in a label as a keyword argument.
    kwargs : keyword arguments, optional
        Keyword arguments to use in the construction of the larry such as
        label and integrity. If a label is passed then its dimensions must
        match the `n` integers passed in or, optionally, you can pass in the
        label without the `n` shape integers. If rand is passed in then that
        will be used to generate the random numbers. In that way you can set
        the state of the random number generator outside of this function.

    Returns
    -------
    Z : larry or float
        A ``(d1, ..., dn)``-shaped larry of floating-point samples from
        a uniform distribution, or a single such float if no parameters were
        supplied.

    See Also
    --------
    la.randn : Random samples from the "standard normal" distribution.

    Examples
    --------
    A single random sample:

    >>> la.rand()
    0.64323350463488804

    A shape (2, 2) random larry:

    >>> la.rand(2, 2)
    label_0
        0
        1
    label_1
        0
        1
    x
    array([[ 0.09277439, 0.94194077],
           [ 0.72887997, 0.41124147]])

    A shape (2, 2) random larry with given labels:

    >>> la.rand(label=[['row1', 'row2'], ['col1', 'col2']])
    label_0
        row1
        row2
    label_1
        col1
        col2
    x
    array([[ 0.3449072 , 0.40397174],
           [ 0.7791279 , 0.86084403]])

    Results are repeatable if you set the state of the random number
    generator outside of la.rand:

    >>> import numpy as np
    >>> rs = np.random.RandomState([1, 2, 3])
    >>> first = [la.rand(rand=rs.rand), la.rand(rand=rs.rand)]
    >>> rs = np.random.RandomState([1, 2, 3])
    >>> second = [la.rand(rand=rs.rand), la.rand(rand=rs.rand)]
    >>> first == second
    True

    """
    if 'rand' in kwargs:
        randfunc = kwargs['rand']
        # Copy before deleting so the caller's dict is not modified
        kwargs = dict(kwargs)
        del kwargs['rand']
    else:
        randfunc = np.random.rand
    if len(args) > 0:
        return larry(randfunc(*args), **kwargs)
    elif 'label' in kwargs:
        # Infer the shape from the length of each axis label
        n = [len(z) for z in kwargs['label']]
        return larry(randfunc(*n), **kwargs)
    elif (len(args) == 0) and (len(kwargs) == 0):
        # Also reached when 'rand' was the only keyword, since it has
        # already been removed from kwargs above
        return randfunc()
    else:
        raise ValueError('Input parameters not recognized')


def randn(*args, **kwargs):
    """
    Random samples from the "standard normal" distribution in a given shape.

    The random samples are from a "normal" (Gaussian) distribution of mean 0
    and variance 1.

    Parameters
    ----------
    args : `n` ints, optional
        The dimensions of the returned larry, should be all positive. These
        may be omitted if you pass in a label as a keyword argument.
    kwargs : keyword arguments, optional
        Keyword arguments to use in the construction of the larry such as
        label and integrity. If a label is passed then its dimensions must
        match the `n` integers passed in or, optionally, you can pass in the
        label without the `n` shape integers. If randn is passed in then that
        will be used to generate the random numbers. In that way you can set
        the state of the random number generator outside of this function.

    Returns
    -------
    Z : larry or float
        A ``(d1, ..., dn)``-shaped larry of floating-point samples from
        the standard normal distribution, or a single such float if
        no parameters were supplied.

    See Also
    --------
    la.rand : Random values from a uniform distribution in a given shape.

    Examples
    --------
    A single random sample:

    >>> la.randn()
    0.33086946957034052

    A shape (2, 2) random larry:

    >>> la.randn(2, 2)
    label_0
        0
        1
    label_1
        0
        1
    x
    array([[-0.08182341, 0.79768108],
           [-0.23584547, 1.80118376]])

    A shape (2, 2) random larry with given labels:

    >>> la.randn(label=[['row1', 'row2'], ['col1', 'col2']])
    label_0
        row1
        row2
    label_1
        col1
        col2
    x
    array([[ 0.10737701, -0.24947824],
           [ 1.51021208, 1.00280387]])

    Results are repeatable if you set the state of the random number
    generator outside of la.randn:

    >>> import numpy as np
    >>> rs = np.random.RandomState([1, 2, 3])
    >>> la.randn(randn=rs.randn)
    0.89858244820995015
    >>> la.randn(randn=rs.randn)
    0.25528876596298244
    >>> rs = np.random.RandomState([1, 2, 3])
    >>> la.randn(randn=rs.randn)
    0.89858244820995015
    >>> la.randn(randn=rs.randn)
    0.25528876596298244

    """
    if 'randn' in kwargs:
        randnfunc = kwargs['randn']
        # Copy before deleting so the caller's dict is not modified
        kwargs = dict(kwargs)
        del kwargs['randn']
    else:
        randnfunc = np.random.randn
    if len(args) > 0:
        return larry(randnfunc(*args), **kwargs)
    elif 'label' in kwargs:
        # Infer the shape from the length of each axis label
        n = [len(z) for z in kwargs['label']]
        return larry(randnfunc(*n), **kwargs)
    elif (len(args) == 0) and (len(kwargs) == 0):
        # Also reached when 'randn' was the only keyword, since it has
        # already been removed from kwargs above
        return randnfunc()
    else:
        raise ValueError('Input parameters not recognized')
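

# Illustrative sketch (not part of the la API): why handing rand/randn a
# bound method of a seeded np.random.RandomState, as the docstrings above
# suggest, gives repeatable results. Plain NumPy only.
# Assumption: `_sketch_repeatable_draws` is a hypothetical helper name.
import numpy as np  # repeated so the sketch is self-contained


def _sketch_repeatable_draws(seed, n):
    # A private generator: seeding it does not touch NumPy's global state
    rs = np.random.RandomState(seed)
    # rs.randn is the kind of callable one would pass as randn=rs.randn
    draw = rs.randn
    return [float(draw()) for _ in range(n)]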


def stack(mode, **kwargs):
    """Stack 2d larrys to make a 3d larry.

    Parameters
    ----------
    mode : {'union', 'intersection'}
        Should the 3d larry be made from the union or intersection of all the
        rows and all the columns?
    kwargs : name=larry
        Variable length input listing the z axis name and larry. For example,
        stack('union', distance=x, temperature=y, pressure=z)

    Returns
    -------
    out : larry
        Returns a 3d larry.

    Raises
    ------
    ValueError
        If mode is not union or intersection or if any of the input larrys are
        not 2d.

    Examples
    --------
    >>> import la
    >>> y1 = la.larry([[1, 2], [3, 4]])
    >>> y2 = la.larry([[5, 6], [7, 8]])
    >>> la.stack('union', name1=y1, othername=y2)
    label_0
        othername
        name1
    label_1
        0
        1
    label_2
        0
        1
    x
    array([[[ 5., 6.],
            [ 7., 8.]],
    .
           [[ 1., 2.],
            [ 3., 4.]]])

    """
    if not np.all([kwargs[key].ndim == 2 for key in kwargs]):
        raise ValueError('All input larrys must be 2d')
    if mode == 'union':
        logic = union
    elif mode == 'intersection':
        logic = intersection
    else:
        raise ValueError('mode must be union or intersection')
    # Common row and column labels across all input larrys
    row = logic(0, *kwargs.values())
    col = logic(1, *kwargs.values())
    x = np.zeros((len(kwargs), len(row), len(col)))
    zlabel = []
    for i, key in enumerate(kwargs):
        # Reorder each larry to the common row and column labels
        y = kwargs[key]
        y = y.morph(row, 0)
        y = y.morph(col, 1)
        x[i] = y.x
        zlabel.append(key)
    label = [zlabel, row, col]
    return larry(x, label)
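

# Illustrative sketch (not part of the la API): the kind of row realignment
# that stack() asks of y.morph(row, 0) before copying each 2d array into the
# 3d result. Plain NumPy only.
# Assumptions: `_sketch_morph_rows` is a hypothetical helper name, and
# filling rows whose label is absent from the input with NaN is an assumed
# behavior, not something the code above confirms about larry.morph.
import numpy as np  # repeated so the sketch is self-contained


def _sketch_morph_rows(x, old_rows, new_rows):
    x = np.asarray(x, dtype=float)
    # Start from all-NaN so labels missing from old_rows stay missing
    out = np.full((len(new_rows), x.shape[1]), np.nan)
    position = dict((lab, i) for i, lab in enumerate(old_rows))
    for i, lab in enumerate(new_rows):
        if lab in position:
            out[i] = x[position[lab]]
    return out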


def align(lar1, lar2, join='inner', cast=True):
    """
    Align two larrys using one of five join methods.

    Parameters
    ----------
    lar1 : larry
        One of the input larrys. Must have the same number of dimensions as
        `lar2`.
    lar2 : larry
        One of the input larrys. Must have the same number of dimensions as
        `lar1`.
    join : {'inner', 'outer', 'left', 'right', list}, optional
        The join method used to align the two larrys. The default join method
        along each axis is 'inner', i.e., the intersection of the labels. If
        `join` is a list of strings then the length of the list should be the
        same as the number of dimensions of the two larrys. The first element
        in the list is the join method for axis=0, the second element is the
        join method for axis=1, and so on.
    cast : bool, optional
        Only float, str, and object dtypes have missing value markers (la.nan,
        '', and None, respectively). Other dtypes, such as int and bool, do
        not have missing value markers. If `cast` is set to True (default)
        then int and bool dtypes, for example, will be cast to float if any
        new rows, columns, etc are created. If cast is set to False, then a
        TypeError will be raised for int and bool dtype input if the join
        introduces new rows, columns, etc. An inner join will never introduce
        new rows, columns, etc.

    Returns
    -------
    lar3 : larry
        A copy of the aligned version of `lar1`.
    lar4 : larry
        A copy of the aligned version of `lar2`.

    Examples
    --------
    Create two larrys:

    >>> lar1 = larry([1, 2])
    >>> lar2 = larry([1, 2, 3])

    The default join method is an inner join:

    >>> lar3, lar4 = la.align(lar1, lar2)
    >>> lar3
    label_0
        0
        1
    x
    array([1, 2])
    >>> lar4
    label_0
        0
        1
    x
    array([1, 2])

    An outer join adds a missing value (NaN) to lar1, therefore the dtype of
    lar1 is changed from int to float:

    >>> lar3, lar4 = la.align(lar1, lar2, join='outer')
    >>> lar3
    label_0
        0
        1
        2
    x
    array([ 1., 2., NaN])
    >>> lar4
    label_0
        0
        1
        2
    x
    array([1, 2, 3])

    """
    x1, x2, label, x1isview, x2isview = align_raw(lar1, lar2, join=join,
                                                  cast=cast)
    # Copy any views so the returned larrys do not share data with the inputs
    if x1isview:
        x1 = x1.copy()
    lar3 = larry(x1, label, integrity=False)
    # Give lar4 its own label lists so the two outputs do not share labels
    label = [list(lab) for lab in label]
    if x2isview:
        x2 = x2.copy()
    lar4 = larry(x2, label, integrity=False)
    return lar3, lar4
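

# Illustrative sketch (not part of the la API): how the four named join
# methods from the align docstring could choose the labels along one axis.
# Assumptions: `_sketch_join_labels` is a hypothetical helper name, and
# sorting the 'inner' and 'outer' results is a choice made here for
# deterministic output, not a confirmed property of align_raw.
def _sketch_join_labels(label1, label2, join='inner'):
    if join == 'inner':
        # Intersection of the labels
        return sorted(set(label1) & set(label2))
    if join == 'outer':
        # Union of the labels
        return sorted(set(label1) | set(label2))
    if join == 'left':
        # Keep the labels of the first input
        return list(label1)
    if join == 'right':
        # Keep the labels of the second input
        return list(label2)
    raise ValueError("join must be 'inner', 'outer', 'left', or 'right'")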