Example #1
def _variance(data, m, p):
    """Return an estimate of variance with N-p degrees of freedom."""
    n, ss = _std_moment(data, m, 1, 2)
    assert n >= 0
    if n <= p:
        raise StatsError(
            'at least %d items are required but only got %d' % (p+1, n))
    den = n - p
    v.assert_(lambda x: x >= 0.0, ss)
    return v.div(ss, den)
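
The helpers `_std_moment` and `v` used above are not shown in this listing, so the following is only a minimal standalone sketch of the same idea for a flat list of numbers: sum the squared deviations about the mean and divide by n - p. The name `variance_with_dof` is hypothetical, `ValueError` stands in for `StatsError`, and estimating `m` from the data when it is None is an assumption about the original's behaviour.

```python
import math


def variance_with_dof(data, m=None, p=1):
    """Estimate the variance of data with n - p degrees of freedom."""
    values = list(data)
    n = len(values)
    if n <= p:
        # Same guard as the original: more than p items are required.
        raise ValueError(
            'at least %d items are required but only got %d' % (p + 1, n))
    if m is None:
        # Assumption: an omitted mean is estimated from the data.
        m = math.fsum(values) / n
    # Sum of squared deviations about m; fsum limits rounding error.
    ss = math.fsum((x - m) ** 2 for x in values)
    return ss / (n - p)
```

With p=1 this gives the usual sample variance, and with p=0 the population variance.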
Example #2
def kurtosis(data, m=None, s=None):
    """kurtosis(data [,m [,s]]) -> sample excess kurtosis of data.

    The kurtosis of a distribution is a measure of its shape. This function
    returns an estimate of the sample excess kurtosis, usually known as g₂.
    For the population kurtosis, see ``pkurtosis``.

        WARNING: The mathematical terminology and notation related to
        kurtosis is often inconsistent and contradictory. See Wolfram
        Mathworld for further details:

        http://mathworld.wolfram.com/Kurtosis.html

    >>> kurtosis([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5])
    ... #doctest: +ELLIPSIS
    3.03678892733564...

    If you already know one or both of the population mean and standard
    deviation, you can pass the mean as optional argument m and/or the
    standard deviation as s:

    >>> kurtosis([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5], m=2.25, s=1)
    2.3064453125

        CAUTION: "Garbage in, garbage out" applies here. You can pass
        any values you like as ``m`` or ``s``, but if they are not
        sensible estimates for the mean and standard deviation, the
        result returned as the kurtosis will likewise not be sensible.
        If you give either m or s, and the calculated kurtosis is out
        of range, a warning is raised.

    If m or s is not given, or is None, it is estimated from the data.

    If data is an iterable of sequences, each inner sequence represents a
    row of data, and ``kurtosis`` operates on each column. Every row must
    have the same number of columns, or ValueError is raised.

    >>> data = [[0, 1],
    ...         [1, 5],
    ...         [2, 6],
    ...         [5, 7]]
    ...
    >>> kurtosis(data)  #doctest: +ELLIPSIS
    [1.50000000000000..., 2.23486717956161...]

    Similarly, if m or s is given, it must be either a single number or a
    sequence with one item per column:

    >>> kurtosis(data, m=[3, 5], s=2)  #doctest: +ELLIPSIS
    [-0.140625, 18.4921875]

    The kurtosis of a population is a measure of the peakedness and weight
    of the tails. The normal distribution has an excess kurtosis of zero; a
    distribution with positive kurtosis generally has heavier tails and a
    sharper peak than the normal, and one with negative kurtosis generally
    has lighter tails and a flatter peak.

    Kurtosis has no upper limit, and the excess kurtosis of a population
    has a lower limit of -2. Higher kurtosis means more of the variance is
    the result of infrequent extreme deviations, as opposed to frequent
    modestly sized deviations.

        CAUTION: As a rule of thumb, a non-zero value for kurtosis
        should only be treated as meaningful if its absolute value is
        larger than approximately twice its standard error. See also
        ``stderrkurtosis``.

    """
    n, total = stats._std_moment(data, m, s, 4)
    assert n >= 0
    v.assert_(lambda x: x >= 1, total)
    if n < 4:
        raise StatsError('sample kurtosis requires at least 4 data points')
    q = (n-1)/((n-2)*(n-3))
    gamma2 = v.div(total, n)
    # Don't do this:-
    # kurt = v.mul((n+1)*q, gamma2)
    # kurt = v.sub(kurt, 3*(n-1)*q)
    #   Even though the above two commented out lines are mathematically
    #   equivalent to the next two, and cheaper, they appear to be
    #   slightly less accurate.
    kurt = v.sub(v.mul(n+1, gamma2), 3*(n-1))
    kurt = v.mul(q, kurt)
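    # The bias-adjusted sample statistic cannot fall below -2(n-1)/(n-3),
    # which corresponds to a population excess kurtosis of -2.
    lower = -2*(n-1)/(n-3)
    if v.isiterable(kurt):
        out_of_range = any(x < lower for x in kurt)
    else:
        out_of_range = kurt < lower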
    if m is s is None:
        assert not out_of_range, 'kurtosis failed: %r' % kurt
        # This is a "should never happen" condition, hence an assertion.
    else:
        # This, on the other hand, can easily happen if the caller
        # gives junk values for m or s. The difference between a junk
        # value and a legitimate value can be surprisingly subtle!
        if out_of_range:
            import warnings
            warnings.warn('calculated kurtosis out of range')
    return kurt
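
Because `stats._std_moment` and the vectorised helper `v` are not part of this listing, the arithmetic can be easier to follow in a standalone sketch for a single column of numbers. The function name `sample_excess_kurtosis` is hypothetical and `ValueError` stands in for `StatsError`. The sketch also assumes that an omitted `s` defaults to the population standard deviation (dividing by n), a choice that reproduces the doctest values in the docstring above.

```python
import math


def sample_excess_kurtosis(data, m=None, s=None):
    """Bias-adjusted sample excess kurtosis of a flat sequence of numbers."""
    values = list(data)
    n = len(values)
    if n < 4:
        raise ValueError('sample kurtosis requires at least 4 data points')
    if m is None:
        m = math.fsum(values) / n
    if s is None:
        # Assumption: population standard deviation (divide by n).
        # The data must not all be identical, or s would be zero.
        s = math.sqrt(math.fsum((x - m) ** 2 for x in values) / n)
    # Sum of the fourth powers of the standardised deviations.
    total = math.fsum(((x - m) / s) ** 4 for x in values)
    gamma2 = total / n
    q = (n - 1) / ((n - 2) * (n - 3))
    # Same order of operations as the listing: subtract first, scale last.
    return q * ((n + 1) * gamma2 - 3 * (n - 1))
```

Called on the docstring's sample data with m=2.25 and s=1, this sketch returns approximately 2.3064453125, matching the doctest. It does not handle column-wise data and omits the out-of-range warning.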