Example #1
def midrange(data):
    """Returns the midrange of a sequence of numbers.

    >>> midrange([2.0, 3.0, 3.5, 4.5, 7.5])
    4.75

    The midrange is halfway between the smallest and largest element. It is
    a weak measure of central tendency.
    """
    try:
        L, H = minmax(data)
    except ValueError as e:
        e.args = ('no midrange defined for empty iterables',)
        raise
    return (L + H)/2
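midrange relies on a minmax helper that is not shown in this listing. The following is a minimal sketch of what such a helper could look like, assuming it returns the (smallest, largest) pair in a single pass and raises ValueError for empty input, as the except clause above implies. The module's real helper may differ.

```python
def minmax(data):
    """Return (smallest, largest) of an iterable in one pass.

    Assumed behaviour, inferred from how midrange() calls it.
    """
    it = iter(data)
    try:
        lo = hi = next(it)
    except StopIteration:
        # Empty input: callers catch ValueError and rewrite the message.
        raise ValueError('minmax() arg is an empty iterable') from None
    for x in it:
        if x < lo:
            lo = x
        elif x > hi:
            hi = x
    return lo, hi
```

Because it consumes the input only once, this sketch also works for iterators and generators, not just lists.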
Example #2
def range(data, interval=0):
    """range(iterable [, interval=0]) -> sample range R of data

    The range R is the difference between the smallest and largest element
    in the given sample. It is an unbiased but weak measure of variability,
    and is frequently used in process control applications.

    >>> range([1.0, 3.5, 7.5, 2.0, 0.25])
    7.25

    For N > 15, the sampling distribution of R becomes unstable and it is
    wise to treat the sample range with caution.

    An even better measure of variability is R/d2, where d2 is a value
    that depends only on N. For samples taken from a normally-distributed
    population, the d2 values are available by looking up N in the dict
    ``range.d2``. For small N (say, up to about 10) R/d2 makes a good
    estimator of the population standard deviation.


    Correction for binned or rounded data
    -------------------------------------

    If the data points have been uniformly rounded (perhaps by binning, or
    by rounding to a fixed number of decimal places, or simply due to
    measurement error), the samples represent intervals rather than exact
    values. E.g. if x=1.2 is given to one decimal place, x could actually
    be any number between 1.15 and 1.25. In this case, it is appropriate to
    make an adjustment to the sample range by taking into account the width
    of the data interval:

    >>> range([1.2, 3.0, 1.5, 2.4, 0.2], 0.1)
    2.9

    The ``interval`` argument is optional, with default value of 0. If
    given, it must be a non-negative number.

    No attempt is made to check that the data points actually are consistent
    with the given interval.
    """
    if interval < 0:
        raise ValueError('interval must be non-negative')
    try:
        a, b = minmax(data)
    except ValueError as e:
        e.args = ('no range defined for empty iterables',)
        raise
    return b - a + interval
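The docstring above describes R/d2 as an estimator of the population standard deviation for small normal samples. Below is a rough sketch of that calculation. The names D2 and sigma_from_range are hypothetical and are not the module's ``range.d2`` attribute. The table holds the commonly tabulated d2 control-chart constants for N = 2 to 10.

```python
# Commonly tabulated d2 constants for samples from a normal population.
# Hypothetical stand-in for the ``range.d2`` dict mentioned in the docstring.
D2 = {
    2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
    7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078,
}


def sigma_from_range(data, interval=0):
    """Estimate the population standard deviation as R/d2 (sketch)."""
    if interval < 0:
        raise ValueError('interval must be non-negative')
    values = list(data)
    n = len(values)
    if n not in D2:
        raise ValueError('d2 is only tabulated here for 2 <= N <= 10')
    # Sample range, widened by the rounding interval as in range() above.
    r = max(values) - min(values) + interval
    return r / D2[n]
```

For the five-point sample in the docstring, sigma_from_range([1.0, 3.5, 7.5, 2.0, 0.25]) divides R = 7.25 by d2 = 2.326.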
Example #3
import collections  # needed for the namedtuple built below


def fivenum(data):
    """Return Tukey's five number summary from data.

    The five summary numbers are:

        minimum, lower-hinge, median, upper-hinge, maximum


    >>> tuple(fivenum([2, 4, 6, 8, 10, 12, 14, 16, 18]))
    (2, 6, 10, 14, 18)

    The summary is a namedtuple with the following fields:

        minimum
        lower_hinge
        median
        upper_hinge
        maximum

    If the data has length N of the form ``4n+5`` (e.g. 5, 9, 13, 17...)
    then the hinges can be visualised by writing out the sorted data in the
    shape of a W, where each limb of the W is equal in length. For example,
    the data (A,B,C,...,M) has N=13 and would be written out like this:

        A           G           M
          B       F   H       L
            C   E       I   K
              D           J

    The hinges are D, G and J and the fivenum summary is (A, D, G, J, M).

    For data with length that doesn't match ``4n+5``, the three hinges are
    interpolated. They are equivalent to ``quartiles`` called with scheme=1.
    """
    if isinstance(data, str):
        raise TypeError('data argument cannot be a string')
    data = sorted(data)
    a, b = minmax(data)
    h1, m, h2 = quartiles(data, scheme=1)
    summary = collections.namedtuple('fivenum',
                'minimum lower_hinge median upper_hinge maximum')
    return summary(a, h1, m, h2, b)
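The W-shaped layout in the docstring can be checked directly for the special case where no interpolation is needed. This is a small sketch that covers only lengths of the form 4n+5. The name fivenum_exact is hypothetical, and the sketch does not reproduce the interpolation done by ``quartiles`` with scheme=1.

```python
def fivenum_exact(data):
    """Five-number summary for data whose length is 4n+5 (sketch).

    The five points of the W sit at evenly spaced positions in the
    sorted data, so a strided slice picks them out.
    """
    values = sorted(data)
    count = len(values)
    if count < 5 or (count - 5) % 4 != 0:
        raise ValueError('length must be of the form 4n+5')
    step = (count - 1) // 4  # number of steps along one limb of the W
    # Positions 0, step, 2*step, 3*step, 4*step (the last is count - 1).
    return tuple(values[::step])
```

With the 13 letters A to M this picks positions 0, 3, 6, 9 and 12, which are A, D, G, J and M, matching the diagram.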