Exemplo n.º 1
0
 def __init__(self, x, y):
     x = narray(x, copy=False, subok=True, dtype=float_).ravel()
     y = narray(y, copy=False, subok=True, dtype=float_).ravel()
     if x.size != y.size:
         msg = "Incompatible size between observations (%s) and response (%s)!" 
         raise ValueError(msg % (x.size, y.size))
     idx = x.argsort()
     self._x = x[idx]
     self._y = y[idx]
Exemplo n.º 2
0
def flowess(x,y,span=0.5,nsteps=2,delta=0):
    """Performs a robust locally weighted regression (lowess).

    Outputs a *3xN* array of fitted values, residuals and fit weights.


:Parameters:
    x : ndarray
        Abscissas of the points on the scatterplot; the values in X must be
        ordered from smallest to largest.
    y : ndarray
        Ordinates of the points on the scatterplot.
    span : Float *[0.5]*
        Fraction of the total number of points used to compute each fitted value.
        As f increases the smoothed values become smoother. Choosing f in the range
        .2 to .8 usually results in a good fit.
    nsteps : Integer *[2]*
        Number of iterations in the robust fit. If nsteps=0, the nonrobust fit
        is returned; setting nsteps=2 should serve most purposes.
    delta : Integer *[0]*
        Nonnegative parameter which may be used to save computations.
        If N (the number of elements in x) is less than 100, set delta=0.0;
        if N is greater than 100 you should find out how delta works by reading
        the additional instructions section.

:Returns:
    A recarray of smoothed values ('smooth'), residuals ('residuals') and local
    robust weights ('weights').


Additional instructions
-----------------------

Fro the original author:

        DELTA can be used to save computations.   Very  roughly  the
        algorithm  is  this:   on the initial fit and on each of the
        NSTEPS iterations locally weighted regression fitted  values
        are computed at points in X which are spaced, roughly, DELTA
        apart; then the fitted values at the  remaining  points  are
        computed  using  linear  interpolation.   The  first locally
        weighted regression (l.w.r.) computation is carried  out  at
        X(1)  and  the  last  is  carried  out at X(N).  Suppose the
        l.w.r. computation is carried out at  X(I).   If  X(I+1)  is
        greater  than  or  equal  to  X(I)+DELTA,  the  next  l.w.r.
        computation is carried out at X(I+1).   If  X(I+1)  is  less
        than X(I)+DELTA, the next l.w.r.  computation is carried out
        at the largest X(J) which is greater than or equal  to  X(I)
        but  is not greater than X(I)+DELTA.  Then the fitted values
        for X(K) between X(I)  and  X(J),  if  there  are  any,  are
        computed  by  linear  interpolation  of the fitted values at
        X(I) and X(J).  If N is less than 100 then DELTA can be  set
        to  0.0  since  the  computation time will not be too great.
        For larger N it is typically not necessary to carry out  the
        l.w.r.  computation for all points, so that much computation
        time can be saved by taking DELTA to be  greater  than  0.0.
        If  DELTA =  Range  (X)/k  then,  if  the  values  in X were
        uniformly  scattered  over  the  range,  the   full   l.w.r.
        computation  would be carried out at approximately k points.
        Taking k to be 50 often works well.

Method
------

        The fitted values are computed by using the nearest neighbor
        routine  and  robust locally weighted regression of degree 1
        with the tricube weight function.  A few additional features
        have  been  added.  Suppose r is FN truncated to an integer.
        Let  h  be  the  distance  to  the  r-th  nearest   neighbor
        from X[i].   All  points within h of X[i] are used.  Thus if
        the r-th nearest neighbor is exactly the  same  distance  as
        other  points,  more  than r points can possibly be used for
        the smooth at  X[i].   There  are  two  cases  where  robust
        locally  weighted regression of degree 0 is actually used at
        X[i].  One case occurs when  h  is  0.0.   The  second  case
        occurs  when  the  weighted  standard error of the X[i] with
        respect to the weights w[j] is  less  than  .001  times  the
        range  of the X[i], where w[j] is the weight assigned to the
        j-th point of X (the tricube  weight  times  the  robustness
        weight)  divided by the sum of all of the weights.  Finally,
        if the w[j] are all zero for the smooth at X[i], the  fitted
        value is taken to be Y[i].

References
----------
    W. S. Cleveland. 1978. Visual and Computational Considerations in
    Smoothing Scatterplots by Locally Weighted Regression. In
    Computer Science and Statistics: Eleventh Annual Symposium on the
    Interface, pages 96-100. Institute of Statistics, North Carolina
    State University, Raleigh, North Carolina, 1978.

    W. S. Cleveland, 1979. Robust Locally Weighted Regression and
    Smoothing Scatterplots. Journal of the American Statistical
    Association, 74:829-836, 1979.

    W. S. Cleveland, 1981. LOWESS: A Program for Smoothing Scatterplots
    by Robust Locally Weighted Regression. The American Statistician,
    35:54.

    """
    x = narray(x, copy=False, subok=True, dtype=float_)
    y = narray(y, copy=False, subok=True, dtype=float_)
    if x.size != y.size:
        raise ValueError("Incompatible size between observations and response!")
    
    
    out_dtype = [('smooth',float_), ('weigths', float_), ('residuals', float_)]
    return numeric.fromiter(zip(*_lowess.lowess(x,y,span,nsteps,delta,)),
                            dtype=out_dtype).view(recarray)