DiogoRibeiro7/knn-variance

The k-NN uncertainty measure

Model-independent heuristic estimation of prediction errors


Overview

For most regression models, overall accuracy can be estimated with the help of various error measures. However, in some applications it is important to provide not only point predictions but also an estimate of the "uncertainty" of each prediction, e.g. in terms of confidence intervals, variances, or interquartile ranges. Very few statistical modelling techniques can do this; for instance, the Kriging/Gaussian Process method comes with a theoretical mean squared error. In our paper we address this problem by introducing a heuristic method to estimate the uncertainty of a prediction, based on the error information from its k nearest neighbours. This heuristic, called the k-NN uncertainty measure, is computationally much cheaper than alternatives such as bootstrapping and can be applied regardless of the underlying regression model. To validate and demonstrate its usefulness, the heuristic is combined with various models and plugged into the well-known Efficient Global Optimization (EGO) algorithm. Results show that using different models with the proposed heuristic can improve the convergence of EGO significantly.

See our paper for additional details: https://link.springer.com/chapter/10.1007/978-3-319-91479-4_40

Equation

The k-NN uncertainty measure for a prediction, as implemented in the Python function below, is given by the LaTeX formula:

$$\sigma(x) = \frac{\sum_{i=1}^{k} w_i \,\lvert \hat{y}(x) - y_i \rvert}{\sum_{i=1}^{k} w_i} + \min_{i} d_i \cdot s$$

where $w_i = 1 - \frac{d_i}{\sum_{j=1}^{k} d_j}$, $d_i$ is the (min-max normalised) distance from $x$ to its $i$-th nearest neighbour, $y_i$ is that neighbour's known output, $\hat{y}(x)$ is the model prediction at $x$, and $s$ is the standard deviation of $\{y_1, \dots, y_k, \hat{y}(x)\}$.
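As a numeric walk-through of the measure for a single prediction point, here is a small NumPy sketch; the distances, neighbour outputs, and prediction are made-up values for illustration (k = 2):

```python
import numpy as np

# One prediction point with k = 2 neighbours (made-up values).
d = np.array([0.1, 0.3])        # normalised distances to the two neighbours
y_nb = np.array([1.0, 2.0])     # known outputs of those neighbours
y_hat = 1.4                     # model prediction at the point

w = 1 - d / d.sum()             # distance-based weights: [0.75, 0.25]
err = np.average(np.abs(y_hat - y_nb), weights=w)   # weighted neighbour error: 0.45
s = np.std(np.append(y_nb, y_hat))                  # spread of neighbour outputs + prediction
sigma = err + d.min() * s       # the k-NN uncertainty of this prediction
```

The first term rewards predictions that agree with nearby known outputs; the second term grows with both the distance to the closest known point and the local spread of outputs.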

Python examples and function

In the examples directory you can find illustrations and example code showing the effect of the uncertainty measure.

The uncertainty measure is also given as a Python function below:

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

def knnUncertainty(k, pred, x, y):
    # Measure of how certain each prediction is, given its k nearest neighbours.
    # k:    number of neighbours taken into account
    # pred: model predictions for the points in x
    # x:    known input points (2-D array, one row per point)
    # y:    known outputs for the points in x
    no = MinMaxScaler(copy=True)
    normx = no.fit_transform(x)
    nbrs = NearestNeighbors(n_neighbors=k, algorithm='ball_tree').fit(normx)
    distances, indices = nbrs.kneighbors(normx, k)
    sigma = []

    for i in range(len(x)):
        dist = distances[i]
        ind = indices[i]
        # weighted absolute error between the prediction and the neighbours' outputs
        abs_err = np.abs(pred[i] - y[ind])
        weights = 1 - (dist / dist.sum())
        weighted_err = np.average(abs_err, weights=weights)

        # spread of the neighbour outputs, including the prediction itself
        nbrs_y = list(y[ind])
        nbrs_y.append(pred[i])
        nbrs_var = np.std(nbrs_y)

        min_dist = np.min(dist)
        sigma.append(weighted_err + min_dist * nbrs_var)
    return np.array(sigma)
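To see the measure's effect on a toy problem, here is a NumPy-only sketch of the same computation, using a brute-force neighbour search instead of scikit-learn's ball tree; the data and the deliberately bad prediction are made up for illustration:

```python
import numpy as np

def knn_uncertainty_np(k, pred, x, y):
    # Brute-force variant of the measure (assumes no constant input column).
    x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))   # min-max normalisation
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise distances
    sigma = np.empty(len(x))
    for i in range(len(x)):
        ind = np.argsort(d[i])[:k]       # k nearest neighbours (incl. the point itself)
        dist = d[i][ind]
        weights = 1 - dist / dist.sum()
        weighted_err = np.average(np.abs(pred[i] - y[ind]), weights=weights)
        nbrs_std = np.std(np.append(y[ind], pred[i]))
        sigma[i] = weighted_err + dist.min() * nbrs_std
    return sigma

x = np.array([[0.0], [0.1], [0.2], [0.9], [1.0]])
y = np.array([0.0, 0.1, 0.2, 0.9, 1.0])
pred = y.copy()
pred[3] = 0.5   # deliberately bad prediction at x = 0.9
sigma = knn_uncertainty_np(2, pred, x, y)
```

With k = 2 the point whose prediction disagrees with its neighbours' outputs receives a much larger uncertainty than the points predicted accurately.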

Cite our paper

If you use this uncertainty measure, please cite our paper:

@inproceedings{van2018novel,
  title={A Novel Uncertainty Quantification Method for Efficient Global Optimization},
  author={van Stein, Bas and Wang, Hao and Kowalczyk, Wojtek and B{\"a}ck, Thomas},
  booktitle={International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems},
  pages={480--491},
  year={2018},
  organization={Springer}
}
