Example #1
def kurtosis(data, m=None, s=None):
    """kurtosis(data [,m [,s]]) -> sample excess kurtosis of data.

    The kurtosis of a distribution is a measure of its shape. This function
    returns an estimate of the sample excess kurtosis, usually written g₂.
    For the population kurtosis, see ``pkurtosis``.

        WARNING: The mathematical terminology and notation related to
        kurtosis is often inconsistent and contradictory. See Wolfram
        Mathworld for further details:

        http://mathworld.wolfram.com/Kurtosis.html

    >>> kurtosis([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5])
    ... #doctest: +ELLIPSIS
    3.03678892733564...

    If you already know one or both of the population mean and standard
    deviation, you can pass the mean as optional argument m and/or the
    standard deviation as s:

    >>> kurtosis([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5], m=2.25, s=1)
    2.3064453125

        CAUTION: "Garbage in, garbage out" applies here. You can pass
        any values you like as ``m`` or ``s``, but if they are not
        sensible estimates for the mean and standard deviation, the
        result returned as the kurtosis will likewise not be sensible.
        If you give either m or s, and the calculated kurtosis is out
        of range, a warning is raised.

    If m or s are not given, or are None, they are estimated from the data.

    If data is an iterable of sequences, each inner sequence represents a
    row of data, and ``kurtosis`` operates on each column. Every row must
    have the same number of columns, or ValueError is raised.

    >>> data = [[0, 1],
    ...         [1, 5],
    ...         [2, 6],
    ...         [5, 7]]
    ...
    >>> kurtosis(data)  #doctest: +ELLIPSIS
    [1.50000000000000..., 2.23486717956161...]

    Similarly, if either m or s is given, it must be either a single number
    or a sequence with the same number of items as there are columns:

    >>> kurtosis(data, m=[3, 5], s=2)  #doctest: +ELLIPSIS
    [-0.140625, 18.4921875]

    The kurtosis of a population measures the peakedness of the distribution
    and the weight of its tails. The normal distribution has an excess
    kurtosis of zero; a distribution with positive kurtosis generally has
    heavier tails and a sharper peak than normal, while one with negative
    kurtosis generally has lighter tails and a flatter peak.

    There is no upper limit to kurtosis, but there is a lower limit of -2.
    Higher
    kurtosis means more of the variance is the result of infrequent extreme
    deviations, as opposed to frequent modestly sized deviations.

        CAUTION: As a rule of thumb, a non-zero value for kurtosis
        should only be treated as meaningful if its absolute value is
        larger than approximately twice its standard error. See also
        ``stderrkurtosis``.

    """
    n, total = stats._std_moment(data, m, s, 4)
    assert n >= 0
    v.assert_(lambda x: x >= 1, total)
    if n < 4:
        raise StatsError('sample kurtosis requires at least 4 data points')
    q = (n-1)/((n-2)*(n-3))
    gamma2 = v.div(total, n)
    # Don't do this:
    # kurt = v.mul((n+1)*q, gamma2)
    # kurt = v.sub(kurt, 3*(n-1)*q)
    #   Even though the above two commented out lines are mathematically
    #   equivalent to the next two, and cheaper, they appear to be
    #   slightly less accurate.
    kurt = v.sub(v.mul(n+1, gamma2), 3*(n-1))
    kurt = v.mul(q, kurt)
    if v.isiterable(kurt):
        out_of_range = any(x < -2 for x in kurt)
    else:
        out_of_range = kurt < -2
    if m is s is None:
        assert not out_of_range, 'kurtosis failed: %r' % kurt
        # This is a "should never happen" condition, hence an assertion.
    else:
        # This, on the other hand, can easily happen if the caller
        # gives junk values for m or s. The difference between a junk
        # value and a legitimate value can be surprisingly subtle!
        if out_of_range:
            import warnings
            warnings.warn('calculated kurtosis out of range')
    return kurt
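The implementation above relies on the library's private helpers (``stats._std_moment`` and the vectorised ``v`` functions), which are not shown. As a minimal self-contained sketch of the same g₂ computation for a single flat sequence of numbers — the name ``sample_kurtosis`` is illustrative, and the sketch assumes the helper estimates the population standard deviation (n denominator) when ``s`` is not given, which reproduces the doctest values above:

```python
import math

def sample_kurtosis(data, m=None, s=None):
    # Sketch of the g2 (sample excess kurtosis) formula used above,
    # for a single flat sequence of numbers.
    xs = [float(x) for x in data]
    n = len(xs)
    if n < 4:
        raise ValueError('sample kurtosis requires at least 4 data points')
    if m is None:
        m = sum(xs) / n
    if s is None:
        # Assumed: population standard deviation around m (n denominator).
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    total = sum(((x - m) / s) ** 4 for x in xs)  # sum of standardised 4th powers
    q = (n - 1) / ((n - 2) * (n - 3))
    gamma2 = total / n
    return q * ((n + 1) * gamma2 - 3 * (n - 1))

data = [1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5]
print(sample_kurtosis(data))               # ≈ 3.03678892733564
print(sample_kurtosis(data, m=2.25, s=1))  # → 2.3064453125
```

Note that the sketch uses the same operation order as the function above (scale gamma2 by n+1, subtract 3(n-1), then multiply by q), which the original's comments identify as the more accurate form.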
Example #2
def skewness(data, m=None, s=None):
    """skewness(data [,m [,s]]) -> sample skewness of data.

    The skewness, or third standardised moment, of data is the degree to
    which it is skewed to the left or right of the mean.

    This returns g₁, the sample skewness. For the population skewness,
    see function ``pskewness``.

        WARNING: The mathematical terminology and notation related to
        skewness is often inconsistent and contradictory. See Wolfram
        Mathworld for further details:

        http://mathworld.wolfram.com/Skewness.html

    >>> skewness([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5])
    ... #doctest: +ELLIPSIS
    1.71461013539878...

    If you already know one or both of the population mean and standard
    deviation, you can pass the mean as optional argument m and/or the
    standard deviation as s:

    >>> skewness([1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5], m=2.25, s=1)
    ... #doctest: +ELLIPSIS
    1.47132881615329...

        CAUTION: "Garbage in, garbage out" applies here. You can pass
        any values you like as ``m`` or ``s``, but if they are not
        sensible estimates for the mean and standard deviation, the
        result returned as the skewness will likewise not be sensible.

    If m or s are not given, or are None, they are estimated from the data.

    If data is an iterable of sequences, each inner sequence represents a
    row of data, and ``skewness`` operates on each column. Every row must
    have the same number of columns, or ValueError is raised.

    >>> data = [[0, 1],
    ...         [1, 5],
    ...         [2, 6],
    ...         [5, 7]]
    ...
    >>> skewness(data)  #doctest: +ELLIPSIS
    [1.19034012827899..., -1.44305883553164...]

    Similarly, if either m or s are given, they must be either a single
    number or have the same number of items as the data:

    >>> skewness(data, m=[2.5, 5.0], s=2)  #doctest: +ELLIPSIS
    [-0.189443057077845..., -2.97696232550900...]

    A negative skewness indicates that the distribution's left-hand tail is
    longer than the tail on the right-hand side, and that the majority of
    the values (including the median) are to the right of the mean. A
    positive skew indicates that the right-hand tail is longer, and that the
    majority of values are to the left of the mean. A zero skew indicates
    that the values are evenly distributed around the mean, often but not
    necessarily implying the distribution is symmetric.

        CAUTION: As a rule of thumb, a non-zero value for skewness
        should only be treated as meaningful if its absolute value is
        larger than approximately twice its standard error. See also
        ``stderrskewness``.

    """
    n, total = stats._std_moment(data, m, s, 3)
    assert n >= 0
    if n < 3:
        raise StatsError('sample skewness requires at least three items')
    skew = v.div(total, n)
    k = math.sqrt(n*(n-1))/(n-2)
    return v.mul(k, skew)
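As with ``kurtosis``, the private helpers are not shown. A minimal self-contained sketch of the same g₁ computation for a single flat sequence — the name ``sample_skewness`` is illustrative, and the sketch assumes the population standard deviation (n denominator) when ``s`` is not given, which reproduces the doctest values above:

```python
import math

def sample_skewness(data, m=None, s=None):
    # Sketch of the g1 (sample skewness) formula used above,
    # for a single flat sequence of numbers.
    xs = [float(x) for x in data]
    n = len(xs)
    if n < 3:
        raise ValueError('sample skewness requires at least three items')
    if m is None:
        m = sum(xs) / n
    if s is None:
        # Assumed: population standard deviation around m (n denominator).
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    total = sum(((x - m) / s) ** 3 for x in xs)  # sum of standardised cubes
    k = math.sqrt(n * (n - 1)) / (n - 2)
    return k * (total / n)

data = [1.25, 1.5, 1.5, 1.75, 1.75, 2.5, 2.75, 4.5]
print(sample_skewness(data))               # ≈ 1.71461013539878
print(sample_skewness(data, m=2.25, s=1))  # ≈ 1.47132881615329
```

The factor ``k = sqrt(n(n-1))/(n-2)`` is the small-sample correction that turns the third standardised moment into the sample skewness g₁, matching the last two lines of the function above.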