    def __init__(self, x, y):
        x = narray(x, copy=False, subok=True, dtype=float_).ravel()
        y = narray(y, copy=False, subok=True, dtype=float_).ravel()
        if x.size != y.size:
            msg = "Incompatible size between observations (%s) and response (%s)!"
            raise ValueError(msg % (x.size, y.size))
        # Store the observations sorted by abscissa.
        idx = x.argsort()
        self._x = x[idx]
        self._y = y[idx]
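The constructor above sorts the paired observations by abscissa before storing them. A minimal stand-alone sketch of that step, using plain NumPy (the module's `narray`/`float_` aliases are assumed to come from NumPy):

```python
import numpy as np

# Unordered paired observations.
x = np.array([3.0, 1.0, 2.0])
y = np.array([30.0, 10.0, 20.0])

# Sort both arrays by x, keeping the (x, y) pairing intact.
idx = np.argsort(x)
x_sorted, y_sorted = x[idx], y[idx]
# x_sorted -> [1., 2., 3.], y_sorted -> [10., 20., 30.]
```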
def flowess(x, y, span=0.5, nsteps=2, delta=0):
    """Performs a robust locally weighted regression (lowess).

    Outputs a *3xN* array of fitted values, residuals and fit weights.

    :Parameters:
        x : ndarray
            Abscissas of the points on the scatterplot; the values in x must be
            ordered from smallest to largest.
        y : ndarray
            Ordinates of the points on the scatterplot.
        span : float *[0.5]*
            Fraction of the total number of points used to compute each fitted
            value. As span increases, the smoothed values become smoother.
            Choosing span in the range 0.2 to 0.8 usually results in a good fit.
        nsteps : integer *[2]*
            Number of iterations in the robust fit. If nsteps=0, the nonrobust
            fit is returned; setting nsteps=2 should serve most purposes.
        delta : float *[0]*
            Nonnegative parameter which may be used to save computations.
            If N (the number of elements in x) is less than 100, set delta=0.0;
            if N is greater than 100, read the additional instructions section
            to find out how delta works.

    :Returns:
        A recarray of smoothed values ('smooth'), residuals ('residuals') and
        local robust weights ('weights').

    Additional instructions
    -----------------------
    From the original author:

        DELTA can be used to save computations. Very roughly the algorithm is
        this: on the initial fit and on each of the NSTEPS iterations, locally
        weighted regression fitted values are computed at points in X which
        are spaced, roughly, DELTA apart; then the fitted values at the
        remaining points are computed using linear interpolation. The first
        locally weighted regression (l.w.r.) computation is carried out at
        X(1) and the last is carried out at X(N). Suppose the l.w.r.
        computation is carried out at X(I). If X(I+1) is greater than or equal
        to X(I)+DELTA, the next l.w.r. computation is carried out at X(I+1).
        If X(I+1) is less than X(I)+DELTA, the next l.w.r. computation is
        carried out at the largest X(J) which is greater than or equal to X(I)
        but is not greater than X(I)+DELTA. Then the fitted values for X(K)
        between X(I) and X(J), if there are any, are computed by linear
        interpolation of the fitted values at X(I) and X(J). If N is less than
        100 then DELTA can be set to 0.0, since the computation time will not
        be too great. For larger N it is typically not necessary to carry out
        the l.w.r. computation for all points, so much computation time can be
        saved by taking DELTA to be greater than 0.0. If DELTA = Range(X)/k
        then, if the values in X were uniformly scattered over the range, the
        full l.w.r. computation would be carried out at approximately k
        points. Taking k to be 50 often works well.

    Method
    ------
    The fitted values are computed by using the nearest neighbor routine and
    robust locally weighted regression of degree 1 with the tricube weight
    function. A few additional features have been added. Suppose r is FN
    truncated to an integer. Let h be the distance to the r-th nearest
    neighbor from X[i]. All points within h of X[i] are used. Thus, if the
    r-th nearest neighbor is exactly the same distance as other points, more
    than r points can possibly be used for the smooth at X[i].

    There are two cases where robust locally weighted regression of degree 0
    is actually used at X[i]. One case occurs when h is 0.0. The second case
    occurs when the weighted standard error of the X[i] with respect to the
    weights w[j] is less than .001 times the range of the X[i], where w[j] is
    the weight assigned to the j-th point of X (the tricube weight times the
    robustness weight) divided by the sum of all of the weights. Finally, if
    the w[j] are all zero for the smooth at X[i], the fitted value is taken to
    be Y[i].

    References
    ----------
    W. S. Cleveland. 1978. Visual and Computational Considerations in
    Smoothing Scatterplots by Locally Weighted Regression. In Computer Science
    and Statistics: Eleventh Annual Symposium on the Interface, pages 96-100.
    Institute of Statistics, North Carolina State University, Raleigh, North
    Carolina, 1978.

    W. S. Cleveland, 1979. Robust Locally Weighted Regression and Smoothing
    Scatterplots. Journal of the American Statistical Association, 74:829-836,
    1979.

    W. S. Cleveland, 1981. LOWESS: A Program for Smoothing Scatterplots by
    Robust Locally Weighted Regression. The American Statistician, 35:54.
    """
    x = narray(x, copy=False, subok=True, dtype=float_)
    y = narray(y, copy=False, subok=True, dtype=float_)
    if x.size != y.size:
        raise ValueError("Incompatible size between observations and response!")
    out_dtype = [('smooth', float_), ('weights', float_), ('residuals', float_)]
    return numeric.fromiter(zip(*_lowess.lowess(x, y, span, nsteps, delta)),
                            dtype=out_dtype).view(recarray)
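The Method section of the docstring names the tricube weight function used for the degree-1 local fits. A minimal NumPy rendering of that weight (independent of the compiled `_lowess` routine; `tricube` is an illustrative name, not part of this module):

```python
import numpy as np

def tricube(u):
    """Tricube weight: (1 - |u|**3)**3 for |u| < 1, zero outside.
    In lowess, u is the distance from X[i] scaled by h, the distance
    to the r-th nearest neighbor of X[i]."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)
```

Points at or beyond the r-th nearest neighbor get weight zero, so they drop out of the local regression at X[i].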
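The DELTA scheme from the "Additional instructions" section can be sketched in plain NumPy. `anchor_indices` is a hypothetical helper (not part of this module) that selects the points where a full l.w.r. fit would run; the quadratic "fit" values are a stand-in for the real local regression, used only to show the interpolation step:

```python
import numpy as np

def anchor_indices(x, delta):
    """Indices of the sorted abscissas x at which the full l.w.r.
    computation runs. From anchor x[i], the next anchor is x[i+1] if
    x[i+1] >= x[i] + delta; otherwise it is the largest x[j] that is
    not greater than x[i] + delta. The last anchor is always x[-1]."""
    idx = [0]
    i = 0
    while i < len(x) - 1:
        if x[i + 1] >= x[i] + delta:
            j = i + 1
        else:
            # Largest j with x[j] <= x[i] + delta (but always advance).
            j = int(np.searchsorted(x, x[i] + delta, side='right')) - 1
            j = max(j, i + 1)
        idx.append(j)
        i = j
    return idx

x = np.arange(5.0)
anchors = anchor_indices(x, 2.0)      # full fits at x[0], x[2], x[4]
# Fitted values at the remaining points come from linear interpolation:
fit_at_anchors = x[anchors] ** 2      # stand-in for real l.w.r. fits
smooth = np.interp(x, x[anchors], fit_at_anchors)
```

With delta=0 every point becomes an anchor, matching the advice to use delta=0.0 for small N.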