def _variance(data, m, p):
    """Return an estimate of variance with N-p degrees of freedom."""
    n, ss = _std_moment(data, m, 1, 2)
    assert n >= 0
    if n <= p:
        raise StatsError(
            'at least %d items are required but only got %d' % (p+1, n))
    den = n - p
    v.assert_(lambda x: x >= 0.0, ss)
    return v.div(ss, den)
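To make the degrees-of-freedom parameter concrete, here is a minimal pure-Python sketch of the same calculation. It assumes, without confirmation from the source, that `_std_moment(data, m, 1, 2)` returns the item count and the sum of squared deviations about `m`. `variance_ddof` and the local `StatsError` class are hypothetical stand-ins, and `m` defaults to the sample mean here.

```python
from statistics import mean


class StatsError(ValueError):
    """Hypothetical stand-in for the module's own StatsError."""


def variance_ddof(data, m=None, p=1):
    """Sum of squared deviations about m, divided by N - p (a sketch)."""
    data = list(data)
    n = len(data)
    if n <= p:
        # Same guard as _variance: at least p+1 items so that N - p > 0.
        raise StatsError(
            'at least %d items are required but only got %d' % (p + 1, n))
    if m is None:
        # Assumption: estimate the mean from the data when it is not given.
        m = mean(data)
    ss = sum((x - m) ** 2 for x in data)  # sum of squared deviations
    assert ss >= 0.0
    return ss / (n - p)


data = [1, 2, 3, 4]
print(variance_ddof(data, p=0))  # population variance (N divisor): 1.25
print(variance_ddof(data, p=1))  # sample variance (N-1 divisor): 1.6666666666666667
```

With `p=0` the divisor is N (population variance); with `p=1` it is N-1, the usual sample variance with Bessel's correction.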
def kurtosis(data, m=None, s=None):
    """kurtosis(data [,m [,s]]) -> sample excess kurtosis of data.

    The kurtosis of a distribution is a measure of its shape. This function
    returns an estimate of the sample excess kurtosis, using the
    bias-adjusted estimator often written G\N{SUBSCRIPT TWO} (g\N{SUBSCRIPT TWO}
    usually denotes the unadjusted estimator). For the population kurtosis,
    see ``pkurtosis``.

    WARNING: The mathematical terminology and notation related to kurtosis
    are often inconsistent and contradictory. See Wolfram MathWorld for
    further details: http://mathworld.wolfram.com/Kurtosis.html

    >>> kurtosis([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5])
    ... #doctest: +ELLIPSIS
    3.03678892733564...

    If you already know one or both of the population mean and standard
    deviation, you can pass the mean as the optional argument m and/or the
    standard deviation as s:

    >>> kurtosis([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5], m=2.25, s=1)
    2.3064453125

    CAUTION: "Garbage in, garbage out" applies here. You can pass any values
    you like as ``m`` or ``s``, but if they are not sensible estimates for
    the mean and standard deviation, the result returned as the kurtosis
    will likewise not be sensible. If you give either m or s and the
    calculated kurtosis is out of range, a warning is issued.

    If m or s is not given (or is None), it is estimated from the data.

    If data is an iterable of sequences, each inner sequence represents a
    row of data, and ``kurtosis`` operates on each column. Every row must
    have the same number of columns, or ValueError is raised.

    >>> data = [[0, 1],
    ...         [1, 5],
    ...         [2, 6],
    ...         [5, 7]]
    ...
    >>> kurtosis(data)  #doctest: +ELLIPSIS
    [1.50000000000000..., 2.23486717956161...]

    Similarly, if m or s is given, it must be either a single number or a
    sequence with one item per column:

    >>> kurtosis(data, m=[3, 5], s=2)  #doctest: +ELLIPSIS
    [-0.140625, 18.4921875]

    The kurtosis of a population is a measure of the peakedness and weight
    of the tails.
    The normal distribution has an excess kurtosis of zero. A distribution
    with positive excess kurtosis generally has heavier tails and a sharper
    peak than the normal distribution; one with negative excess kurtosis
    generally has lighter tails and a flatter peak. Population excess
    kurtosis has no upper limit and a lower limit of -2. (The sample
    estimate returned here can go lower: down to -2*(n-1)/(n-3) for n data
    points, which approaches -2 as n grows.) Higher kurtosis means that more
    of the variance is the result of infrequent extreme deviations, as
    opposed to frequent modestly sized deviations.

    CAUTION: As a rule of thumb, a non-zero value for kurtosis should only
    be treated as meaningful if its absolute value is larger than
    approximately twice its standard error. See also ``stderrkurtosis``.

    """
    n, total = stats._std_moment(data, m, s, 4)
    assert n >= 0
    # total is a sum of fourth powers, so it can never be negative.
    # Caller-supplied m or s can make it arbitrarily small, which should
    # produce the documented warning below rather than an assertion error.
    v.assert_(lambda x: x >= 0.0, total)
    if n < 4:
        raise StatsError('sample kurtosis requires at least 4 data points')
    q = (n-1)/((n-2)*(n-3))
    gamma2 = v.div(total, n)
    # Don't do this:
    #   kurt = v.mul((n+1)*q, gamma2)
    #   kurt = v.sub(kurt, 3*(n-1)*q)
    # Even though those two lines are mathematically equivalent to the
    # next two, and cheaper, they appear to be slightly less accurate.
    kurt = v.sub(v.mul(n+1, gamma2), 3*(n-1))
    kurt = v.mul(q, kurt)
    # When m and s are estimated from the data, gamma2 == m4/m2**2 >= 1,
    # so kurt can never be below q*((n+1) - 3*(n-1)) == -2*(n-1)/(n-3).
    # That bound is below -2 for small n (e.g. [0, 0, 1, 1] gives exactly
    # -6), so comparing against -2 would wrongly reject valid data. Allow a
    # little slack for rounding error.
    lower = -2*(n-1)/(n-3) - 1e-9
    if v.isiterable(kurt):
        out_of_range = any(x < lower for x in kurt)
    else:
        out_of_range = kurt < lower
    if m is s is None:
        # This is a "should never happen" condition, hence an assertion.
        assert not out_of_range, 'kurtosis failed: %r' % kurt
    else:
        # This, on the other hand, can easily happen if the caller
        # gives junk values for m or s. The difference between a junk
        # value and a legitimate value can be surprisingly subtle!
        if out_of_range:
            import warnings
            warnings.warn('calculated kurtosis out of range')
    return kurt
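The estimator and the rule of thumb described in the docstring can be checked with a self-contained sketch. This is not the module's implementation: it works on plain lists instead of the `v` helpers, and every function name below is hypothetical. It assumes, consistent with the doctest outputs in the docstring but not stated there, that when `s` is omitted the population standard deviation (divisor n) is used. `stderr_kurtosis` uses a widely quoted textbook formula for the standard error of G2, which may differ from what the module's `stderrkurtosis` computes.

```python
import math


def excess_kurtosis_g2(data, m=None, s=None):
    """Bias-adjusted sample excess kurtosis (G2) of one column of data."""
    xs = [float(x) for x in data]
    n = len(xs)
    if n < 4:
        raise ValueError('sample kurtosis requires at least 4 data points')
    if m is None:
        m = math.fsum(xs) / n
    if s is None:
        # Assumption: population standard deviation (divisor n).
        s = math.sqrt(math.fsum((x - m) ** 2 for x in xs) / n)
    total = math.fsum(((x - m) / s) ** 4 for x in xs)  # sum of z**4
    q = (n - 1) / ((n - 2) * (n - 3))
    gamma2 = total / n
    # Same evaluation order as kurtosis(): q * ((n+1)*gamma2 - 3*(n-1)).
    return q * ((n + 1) * gamma2 - 3 * (n - 1))


def g2_lower_bound(n):
    """Smallest possible G2 for n points, since m4/m2**2 - 3 >= -2."""
    return -2 * (n - 1) / (n - 3)


def excess_kurtosis_columns(rows):
    """Apply excess_kurtosis_g2 to each column of equal-length rows."""
    rows = [list(r) for r in rows]
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError('rows must be non-empty and of equal length')
    return [excess_kurtosis_g2(col) for col in zip(*rows)]


def stderr_kurtosis(n):
    """Textbook standard error of G2 for a sample of size n (n > 3)."""
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return 2 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def kurtosis_is_meaningful(g2, n):
    """Rule of thumb: |G2| larger than about twice its standard error."""
    return abs(g2) > 2 * stderr_kurtosis(n)


sample = [1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5]
k = excess_kurtosis_g2(sample)
print(k, kurtosis_is_meaningful(k, len(sample)))
print(excess_kurtosis_columns([[0, 1], [1, 5], [2, 6], [5, 7]]))
print(excess_kurtosis_g2([0, 0, 1, 1]), g2_lower_bound(4))  # both -6.0
```

The sketch reproduces the docstring's examples, including the column-wise case, and shows that a two-valued sample such as `[0, 0, 1, 1]` reaches a sample kurtosis of -6, well below the population limit of -2. Unlike `kurtosis`, `excess_kurtosis_columns` does not accept per-column `m` or `s`.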