scikit-GPUPPY: Gaussian Process Uncertainty Propagation with PYthon

Copyright (C) 2015 Philipp Baumgaertel
All rights reserved.

This software may be modified and distributed under the terms of the BSD license. See the LICENSE.txt file for details.

https://github.com/snphbaum/scikit-gpuppy

Overview

This package provides means for modeling functions and simulations using Gaussian processes (aka Kriging, Gaussian random fields, Gaussian random functions). Additionally, uncertainty can be propagated through the Gaussian processes.

Note

The Gaussian process regression and uncertainty propagation are based on Girard's thesis [1].

An extension to speed up GP regression is based on Snelson's thesis [2].

Warning

The extension based on Snelson's work is already usable but not as fast as it should be. Additionally, the uncertainty propagation does not yet work with this extension.

An additional extension for Inverse Uncertainty Propagation is based on my paper (and upcoming PhD thesis) [3].

A simulation is seen as a function f(x)+\epsilon (x \in \mathbb{R}^n) with additive random error \epsilon \sim \mathcal{N}(0,v). This optional error accounts for the stochastic nature of most simulations.
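
For intuition, such a stochastic simulation might look like the following plain numpy sketch (the choice of f and of the noise variance v is made up for illustration; it is not part of the package):

import numpy as np

rng = np.random.default_rng(0)

def simulation(x, v=0.01):
    f = np.sin(x[0]) + 0.5 * x[1]           # some deterministic f(x)
    return f + rng.normal(0.0, np.sqrt(v))  # additive error epsilon ~ N(0,v)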

The GaussianProcess module uses regression to model the simulation as a Gaussian process.

The UncertaintyPropagation module allows for propagating uncertainty x \sim \mathcal{N}(\mu,\Sigma) through the Gaussian process to estimate the output uncertainty of the simulation.

The FFNI and TaylorPropagation modules provide classes for propagating uncertainty through deterministic functions.
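
For intuition, first-order (Taylor) propagation through a deterministic function can be sketched in plain numpy as follows (the function f and its gradient are made up for illustration; this sketches the general technique, not the package's API):

import numpy as np

def propagate_taylor_first_order(f, grad_f, mu, Sigma):
    # Linearize f around mu: E[f(x)] ~ f(mu), Var[f(x)] ~ g @ Sigma @ g with g = grad_f(mu)
    g = grad_f(mu)
    return f(mu), g @ Sigma @ g

f = lambda x: np.sin(x[0]) + 0.5 * x[1]           # hypothetical deterministic function
grad_f = lambda x: np.array([np.cos(x[0]), 0.5])  # its gradient
m, var = propagate_taylor_first_order(f, grad_f, np.array([5.0, 5.0]), np.diag([0.01, 0.01]))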

The InverseUncertaintyPropagation module allows for propagating the desired output uncertainty of the simulation backward through the Gaussian process. This assumes that the components of the input x are estimated from samples using maximum likelihood estimators. The inverse uncertainty propagation then calculates the optimal sample sizes for estimating x that lead to the desired output uncertainty of the simulation.
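
Concretely: if component x_i is estimated from n_i i.i.d. samples with per-sample Fisher information I(x_i), the Cramer-Rao bound gives \operatorname{Var}(\hat{x}_i) \approx 1/(n_i I(x_i)). A desired input variance \Sigma_{ii} thus corresponds to a sample size n_i \approx 1/(I(x_i) \Sigma_{ii}), and the inverse propagation chooses the \Sigma that minimizes the total data collection cost \sum_i c_i n_i (see the usage example below).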

[1] Girard, A. Approximate Methods for Propagation of Uncertainty with Gaussian Process Models. PhD thesis, University of Glasgow, 2004.
[2] Snelson, E. L. Flexible and efficient Gaussian process models for machine learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2007.
[3] Baumgaertel, P.; Endler, G.; Wahl, A. M. & Lenz, R. Inverse Uncertainty Propagation for Demand Driven Data Acquisition. Proceedings of the 2014 Winter Simulation Conference, IEEE Press, 2014, pp. 710-721.

Requirements

  • scipy
  • numpy
  • statsmodels
  • nose

Optional:

  • weave or Cython for performance

Documentation

https://pythonhosted.org/scikit-gpuppy/

or

python setup.py build_sphinx

Installation

pip install scikit-gpuppy

or

python setup.py install

Testing

After installation:

nosetests skgpuppy

Usage

Regression

import numpy as np
from skgpuppy.GaussianProcess import GaussianProcess
from skgpuppy.Covariance import GaussianCovariance

# Preparing some parameters (just to create the example data)
x = np.array([[x1, x2] for x1 in range(10) for x2 in range(10)]) # 2d sim input (need not be a regular grid in practice)
w = np.array([0.04,0.04])   # GP bandwidth parameter
v = 2                       # GP variance parameter
vt = 0.01                   # GP variance of the error epsilon

# Preparing the parameter vector
theta = np.zeros(2+len(w))
theta[0] = np.log(v)  # We actually use the log of the parameters as it is easier to optimize (no > 0 constraint etc.)
theta[1] = np.log(vt)
theta[2:2+len(w)] = np.log(w)

# Creating example simulation data by drawing a realisation from a random Gaussian process
t = GaussianProcess.get_realisation(x, GaussianCovariance(), theta)

# The regression step is pretty easy:
# Input data x (list of input vectors)
# Corresponding simulation output t (just a list of floats of the same length as x)
# Covariance function of your choice (only GaussianCovariance can be used for uncertainty propagation at the moment)
gp_est = GaussianProcess(x, t, GaussianCovariance())

# Getting some values from the regression GP for plotting
x_new = np.array([[x1/2.0, x2/2.0] for x1 in range(20) for x2 in range(20)])
means, variances = gp_est.estimate_many(x_new)

# Plotting the output
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') no longer works in recent matplotlib
ax.plot_trisurf(x.T[0], x.T[1], t, cmap=cm.autumn, linewidth=0.2)
ax.plot_trisurf(x_new.T[0], x_new.T[1], means, cmap=cm.winter, linewidth=0.2)
plt.show()

Uncertainty Propagation

# Continuing the regression example

from skgpuppy.UncertaintyPropagation import UncertaintyPropagationApprox

# The uncertainty to be propagated
mean = np.array([5.0,5.0]) # The mean of a normal distribution
Sigma = np.diag([0.01, 0.01]) # The covariance matrix (currently only diagonal covariance matrices are supported)

# Using the gp_est from the regression example
up = UncertaintyPropagationApprox(gp_est)

# The propagation step
out_mean, out_variance = up.propagate_GA(mean, Sigma)

print(out_mean, out_variance)

Inverse Uncertainty Propagation

# Continuing the propagation example

from skgpuppy.InverseUncertaintyPropagation import InverseUncertaintyPropagationApprox

# The Fisher information for the maximum likelihood estimation of x
# This assumes both components of x to be rate parameters of exponential distributions
# (for an exponential distribution with rate x_i, the per-sample Fisher information is 1/x_i**2)
I = np.array([1/mean[0]**2, 1/mean[1]**2])

# Cost vector: the cost of collecting one sample for the estimation of each component of x
c = np.ones(2) # Collecting one sample for each component of x costs 1

# The cost of collecting enough samples to approximately reach the Sigma from above (via the Cramer-Rao bound)
print((c/I/np.diag(Sigma)).sum())

# The desired output variance (in this example) is out_variance
# Getting the Sigma that leads to the minimal data collection costs while still yielding out_variance
# If multiple parameters are estimated from the same distribution (and therefore from the same sample), the optional parameter "coestimated" can be used
iup = InverseUncertaintyPropagationApprox(out_variance, gp_est, mean, c, I)
Sigma_opt = np.diag(iup.get_best_solution())

# The optimal data collection cost to get the output variance out_variance
print((c/I/np.diag(Sigma_opt)).sum())

# Check that we actually get close to out_variance when using Sigma_opt
out_mean, out_variance2 = up.propagate_GA(mean, Sigma_opt)
print(out_mean, out_variance2)
