Machine learning has had a huge impact on fields where theory-based models have failed to perform satisfactorily, such as image processing, speech recognition, and machine translation. However, machine learning can also play an important role in improving methods in fields where physical theories have dominated. An important difference from the aforementioned fields is that in physics-dominated domains, the majority of the problem can be modeled using physical laws. Directly applying machine learning to map the input of the problem to some output of interest therefore usually does not work well. Instead, identifying the step where the largest modeling assumption is made and applying machine learning to that step can significantly improve the results of these methods.
One way to incorporate machine learning into physical models is the paradigm of field inversion and machine learning [1]. An important advantage of this paradigm is that it enables incorporating prior knowledge into the model, and also allows the modeler to extract modeling knowledge from the results. The paradigm can be summarized as follows:
- Define some corrective term in the base model
- Extract the optimal corrective function from high fidelity data
- Train a machine learning model to estimate the corrective function, given a set of features
One way of defining the optimization step is to maximize the probability of the corrective term given the data, i.e. to find the maximum a posteriori (MAP) solution. Assuming that the prior and the discrepancy between the model output and the high-fidelity data are normally distributed gives us the following posterior.
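Concretely, writing $\beta$ for the corrective term, $d$ for the high-fidelity data, $h(\beta)$ for the corresponding model output, and $C_m$, $C_\beta$ for the observational and prior covariance matrices (notation chosen here for concreteness), the Gaussian assumptions give

$$ p(\beta \mid d) \propto \exp\!\left(-\tfrac{1}{2}\,(h(\beta)-d)^{T} C_{m}^{-1}\,(h(\beta)-d)\right)\,\exp\!\left(-\tfrac{1}{2}\,(\beta-\beta_{\text{prior}})^{T} C_{\beta}^{-1}\,(\beta-\beta_{\text{prior}})\right). $$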
As we want to maximize the posterior, our optimization objective is to minimize
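the negative log-posterior, which up to an additive constant reads (with $\beta$ the corrective term, $h(\beta)$ the model output, $d$ the high-fidelity data, and $C_m$, $C_\beta$ the observational and prior covariances; notation mine):

$$ J(\beta) = \tfrac{1}{2}\,(h(\beta)-d)^{T} C_{m}^{-1}\,(h(\beta)-d) + \tfrac{1}{2}\,(\beta-\beta_{\text{prior}})^{T} C_{\beta}^{-1}\,(\beta-\beta_{\text{prior}}). $$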
If we want to use gradient-based optimization methods, we need some way to find the gradient of the objective function with respect to the corrective term. For small-scale problems, it is easy to find these gradients using a finite difference approximation. However, in applications where the modeling problem is discretized into a large number of cells (e.g. in computational fluid dynamics), this approach is computationally infeasible.
We can rewrite the gradient of the objective function using the chain rule.
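With $T$ the primal (state) variable, the chain rule gives

$$ \frac{dJ}{d\beta} = \frac{\partial J}{\partial \beta} + \frac{\partial J}{\partial T}\,\frac{dT}{d\beta}. $$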
The explicit derivatives are easy to obtain: they follow directly from our definition of the objective function. The sensitivity of the state with respect to the corrective term, however, cannot be obtained straightforwardly. What we do have is a set of governing equations, which we can rewrite as
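a residual, with $R$ the discretized residual operator:

$$ R(T, \beta) = 0. $$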
As we don't want the validity of our governing equations to change if we change the corrective term, we can write
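that the total derivative of the residual with respect to $\beta$ must vanish:

$$ \frac{dR}{d\beta} = \frac{\partial R}{\partial \beta} + \frac{\partial R}{\partial T}\,\frac{dT}{d\beta} = 0. $$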
Again, the explicit derivatives follow straightforwardly from the discretization of the governing equations. Introducing a new set of variables,
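$\psi$, we can add $\psi^T$ times the (identically zero) total derivative of the residual to the gradient and group terms:

$$ \frac{dJ}{d\beta} = \frac{\partial J}{\partial \beta} + \psi^{T}\frac{\partial R}{\partial \beta} + \underbrace{\left(\frac{\partial J}{\partial T} + \psi^{T}\frac{\partial R}{\partial T}\right)\frac{dT}{d\beta}}_{\text{indicated term}}. $$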
We will call this new set of variables the adjoint variables. Using the constraint that the indicated term should be zero, they can be determined by solving a system of linear equations.
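Concretely, the adjoint variables, written $\psi$ here, solve

$$ \left(\frac{\partial R}{\partial T}\right)^{\!T}\psi = -\left(\frac{\partial J}{\partial T}\right)^{\!T}, $$

which eliminates the term containing the unknown sensitivity and leaves

$$ \frac{dJ}{d\beta} = \frac{\partial J}{\partial \beta} + \psi^{T}\frac{\partial R}{\partial \beta}. $$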
We now have an expression for the gradients which we can easily evaluate, given that we do one extra system solve. Note that our gradient calculation is now practically independent of the number of points in our simulation.
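The mechanics can be demonstrated on a deliberately tiny toy problem (my own minimal example, not the model problem from [1]): a linear residual $R(T,\beta) = AT - \beta = 0$ with objective $J = \tfrac{1}{2}\lVert T - d\rVert^2$, so that the adjoint gradient can be checked against finite differences.

```python
import numpy as np

# Toy adjoint-gradient demo: R(T, beta) = A T - beta = 0,
# J(beta) = 0.5 ||T - d||^2, hence dR/dT = A, dR/dbeta = -I, dJ/dT = (T - d)^T.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + 5.0 * np.eye(n)   # keep A well conditioned
d = rng.standard_normal(n)
beta = rng.standard_normal(n)

def solve_state(beta):
    # Primal solve: find T such that R(T, beta) = 0.
    return np.linalg.solve(A, beta)

def objective(beta):
    T = solve_state(beta)
    return 0.5 * np.sum((T - d) ** 2)

# Adjoint solve: (dR/dT)^T psi = -(dJ/dT)^T.
T = solve_state(beta)
psi = np.linalg.solve(A.T, -(T - d))
# Total gradient: dJ/dbeta = psi^T dR/dbeta (the explicit dJ/dbeta is zero here).
grad_adjoint = -psi

# Finite-difference check, feasible only because the problem is tiny.
eps = 1e-6
grad_fd = np.array([
    (objective(beta + eps * e) - objective(beta - eps * e)) / (2.0 * eps)
    for e in np.eye(n)
])
# grad_adjoint and grad_fd should agree closely (J is quadratic in beta).
```

Note that the adjoint route costs one extra linear solve with $A^T$, regardless of how many components $\beta$ has, whereas the finite-difference loop costs two primal solves per component.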
To illustrate the paradigm, [1] uses the following scalar ordinary differential equation
where the coefficient can be a function of z, T is our primal variable, and
where h = 0.5. Let's say we want to model this process using
and want to enhance this model using a spatially varying corrective term,
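In outline, the model problem looks as follows; this is a reconstruction consistent with the surrounding text, and the exact functional form of the true coefficient and the constants should be taken from [1]. The true process is

$$ \frac{d^{2}T}{dz^{2}} + \varepsilon(z, T)\left(T_{\infty}^{4} - T^{4}\right) = 0, \qquad T(0) = T(1) = 0, $$

where the true coefficient $\varepsilon(z,T)$ is a given nonlinear function whose definition involves the constant h = 0.5. The base model replaces this coefficient with a constant $\varepsilon_0$,

$$ \frac{d^{2}T}{dz^{2}} + \varepsilon_{0}\left(T_{\infty}^{4} - T^{4}\right) = 0, $$

and the augmented model multiplies it by the spatially varying corrective term $\beta(z)$:

$$ \frac{d^{2}T}{dz^{2}} + \beta(z)\,\varepsilon_{0}\left(T_{\infty}^{4} - T^{4}\right) = 0. $$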
The convenience of illustrating the paradigm using a simple model problem like this is that we can derive the true form of the corrective term.
The problem can be discretized using finite volume discretization with homogeneous boundary conditions. Using a central difference scheme for the second order derivative and rewriting the equation for the temperature in cell i gives
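Assuming a source term of the form $\varepsilon(z,T)\left(T_\infty^4 - T^4\right)$ (my reconstruction) and writing $\Delta z$ for the cell width, solving the central-difference equation for $T_i$ gives

$$ T_{i} = \frac{T_{i-1} + T_{i+1}}{2} + \frac{\Delta z^{2}}{2}\,\varepsilon(z_{i}, T_{i})\left(T_{\infty}^{4} - T_{i}^{4}\right). $$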
Similarly, the base model and the augmented model can be solved as
and
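Under the same assumed model form, the base model uses the constant coefficient $\varepsilon_0$ and the augmented model uses $\beta_i\,\varepsilon_0$:

$$ T_{i} = \frac{T_{i-1} + T_{i+1}}{2} + \frac{\Delta z^{2}}{2}\,\varepsilon_{0}\left(T_{\infty}^{4} - T_{i}^{4}\right), \qquad T_{i} = \frac{T_{i-1} + T_{i+1}}{2} + \frac{\Delta z^{2}}{2}\,\beta_{i}\,\varepsilon_{0}\left(T_{\infty}^{4} - T_{i}^{4}\right). $$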
These equations are then solved iteratively until convergence, using under-relaxation to stabilize the iterations, i.e.
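with $T^{*}$ the unrelaxed new iterate, $\alpha$ the relaxation factor, and $k$ the iteration index (notation mine):

$$ T^{(k+1)} = (1 - \alpha)\,T^{(k)} + \alpha\,T^{*}. $$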
where the relaxation factor α trades off stability (for low α) against convergence speed (for high α). The iterations are stopped once the L2-norm of the difference between two consecutive solutions drops below a specified tolerance.
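The whole iteration can be sketched in a few lines. The source term below (radiative form, with assumed values T_inf = 50 and a constant coefficient eps0 = 5e-4) is a stand-in for the base model; the exact constants belong to [1], and the point here is the structure of the under-relaxed fixed-point loop.

```python
import numpy as np

# Under-relaxed fixed-point solve of the central-difference equations for
# T'' + eps0 * (T_inf^4 - T^4) = 0 with homogeneous boundary conditions.
n = 32                      # number of interior cells
z = np.linspace(0.0, 1.0, n + 2)
dz = z[1] - z[0]
T_inf = 50.0                # assumed ambient temperature
eps0 = 5e-4                 # assumed constant coefficient of the base model
alpha = 0.8                 # under-relaxation factor
tol = 1e-6                  # stopping criterion on ||T_new - T||_2

T = np.zeros(n + 2)         # boundary values T(0) = T(1) = 0 never change
converged = False
for it in range(100_000):
    # Central-difference equation solved for the cell value T_i:
    T_star = T.copy()
    T_star[1:-1] = 0.5 * (T[:-2] + T[2:]) \
        + 0.5 * dz**2 * eps0 * (T_inf**4 - T[1:-1]**4)
    T_new = (1.0 - alpha) * T + alpha * T_star      # under-relaxed update
    converged = np.linalg.norm(T_new - T) < tol     # L2 stopping criterion
    T = T_new
    if converged:
        break
```

A maximum-principle argument suggests the converged interior temperatures stay between 0 and T_inf, which is a cheap sanity check on the result.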
The partial derivatives necessary for setting up the adjoint equation require taking two scalar-by-vector and two vector-by-vector derivatives of the objective function and the governing equation, respectively. This can be done conveniently using Einstein summation convention.
Making use of the fact that the prior and observational covariance matrices are symmetric, and using $\partial \beta_i / \partial \beta_j = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta, we can easily derive the required partial derivatives.
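For instance, if the data are observations of the discrete state $T$ itself (an assumption made here for concreteness), the two scalar-by-vector derivatives read

$$ \frac{\partial J}{\partial \beta_{j}} = \left[C_{\beta}^{-1}\left(\beta - \beta_{\text{prior}}\right)\right]_{j}, \qquad \frac{\partial J}{\partial T_{j}} = \left[C_{m}^{-1}\left(T - d\right)\right]_{j}, $$

with $C_m$, $C_\beta$ the observational and prior covariance matrices.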
In [1], Gaussian processes are used for the machine learning phase. Some preliminary investigations with random forests and neural networks show similar or slightly improved results. An important requirement for the machine learning phase is that it can take into account the variance information produced by the field inversion phase. As a next step I will look into using TensorFlow Probability [2] to implement a Bayesian neural network that accounts for the posterior variance of the field inversion phase.
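The machine learning step itself can be sketched with a minimal Gaussian-process regression written out in plain numpy. The training targets here are a synthetic stand-in (a noisy sine); in the actual paradigm they would be the inverted corrective field $\beta_{\text{MAP}}$, and the inputs would be physically meaningful features. The kernel length scale and noise level are assumed values, not tuned hyperparameters.

```python
import numpy as np

# Minimal GP regression: predict a corrective value (with uncertainty) at a
# new feature point from noisy training pairs (feature, corrective value).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=40)               # feature values
y = np.sin(2.0 * np.pi * X) + 0.05 * rng.standard_normal(40)  # stand-in targets

def rbf(a, b, ell=0.1):
    # Squared-exponential kernel with assumed length scale ell.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

sigma_n = 0.05                                   # assumed observation noise
K = rbf(X, X) + sigma_n**2 * np.eye(X.size)
alpha = np.linalg.solve(K, y)

X_new = np.array([0.25])
k_star = rbf(X_new, X)                           # cross-covariances, shape (1, 40)
mean = k_star @ alpha                            # predictive mean
var = rbf(X_new, X_new) - k_star @ np.linalg.solve(K, k_star.T)
std = np.sqrt(np.maximum(np.diag(var), 0.0))     # predictive standard deviation
```

The predictive standard deviation is exactly the kind of variance information that a Bayesian treatment of the machine learning phase would carry forward from the field inversion.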
[1] Parish, E. J., & Duraisamy, K. (2016). A paradigm for data-driven predictive modeling using field inversion and machine learning. Journal of Computational Physics, 305, 758-774.
[2] TensorFlow Probability. https://www.tensorflow.org/probability