sjnarmstrong/basic-regression-methods

A few basic regression methods, including linear and Bayesian regression.


We have seen how Bayesian methods can be useful in determining the probability of certain events occurring. We now turn to the problem of linear regression. This involves determining the process used to generate a set of target values from a set of input variables. Formally, given a set of input variables $\mathbf{x}$ and a set of target variables $\mathbf{t}$, we seek to find the function $f(x)$ that was used to generate the target variables from the given input. This is a complex task, as there is often an element of noise added to the function before the target variable is produced. To keep things simple, we will focus on data containing a 1D input variable $x$ and a 1D output target $t$. Furthermore, we use a polynomial function to approximate our underlying function $f(x)$, since most functions can be accurately approximated by a few terms of their Taylor expansion. The parametric function used can therefore be written as follows:

$$y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j = \mathbf{w}^T \boldsymbol{\phi}(x)$$

where $M$ is the order of the polynomial function and controls the maximum complexity of the function. We have also defined $\boldsymbol{\phi}(x)$ as $\boldsymbol{\phi}(x) = [1, x, x^2, \dots, x^M]^T$. Further, for convenience when handling multiple data-points, we define the design matrix $\boldsymbol{\Phi}$ as $\boldsymbol{\Phi} = [\boldsymbol{\phi}(x_1), \boldsymbol{\phi}(x_2), \dots, \boldsymbol{\phi}(x_N)]^T$. Here $N$ denotes the number of data-points.

We can now see that in order to approximate the given function $f(x)$, we must determine suitable values for the parameters $\mathbf{w}$. In this exercise we explore the use of Bayesian and classical methods to achieve this.
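As a concrete illustration of this setup, the sketch below builds the design matrix $\boldsymbol{\Phi}$ for a polynomial basis with NumPy. It is a minimal example; the function and variable names are illustrative and not taken from the repository's code.

```python
import numpy as np

def polynomial_design_matrix(x, order):
    """Return the N x (order+1) design matrix whose n-th row is [1, x_n, ..., x_n^order]."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, order + 1, increasing=True)

# Example: 10 evenly spaced inputs and an order-4 polynomial basis.
x = np.linspace(0.0, 1.0, 10)
Phi = polynomial_design_matrix(x, order=4)
print(Phi.shape)  # (10, 5)
```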

Least Squares Approach

The least squares approach tries to minimise the squared error between the target variables and the parametrised function. This error function is defined as follows:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^T \boldsymbol{\phi}(x_n) \right\}^2$$

This is then minimised in closed form by taking the derivative with respect to $\mathbf{w}$ and setting it to zero. The result is given by [@christopher2016pattern] in equation 3.15 as:

$$\mathbf{w}_{\mathrm{ML}} = \left( \boldsymbol{\Phi}^T \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^T \mathbf{t}$$
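A minimal sketch of this closed-form solution is given below, using a pseudo-inverse for numerical stability. The sine-plus-noise data generation is only an assumed stand-in for the dataset used in the report.

```python
import numpy as np

def fit_least_squares(Phi, t):
    """Closed-form least squares weights w = (Phi^T Phi)^{-1} Phi^T t."""
    # The Moore-Penrose pseudo-inverse is numerically safer than explicitly
    # inverting Phi^T Phi.
    return np.linalg.pinv(Phi) @ t

# Example: noisy samples of a sine curve fitted with an order-4 polynomial.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
Phi = np.vander(x, 5, increasing=True)
w_ls = fit_least_squares(Phi, t)
```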

Maximum Likelihood Approach

We now consider the likelihood of obtaining the target data from the parametric function. For this we assume that the data has a Gaussian distribution around the given function at any given input $x$. This is therefore written as follows:

$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\!\left( t_n \mid \mathbf{w}^T \boldsymbol{\phi}(x_n),\, \beta^{-1} \right)$$

where $\beta$ is the precision of the Gaussian distribution. Taking the natural logarithm of this function we get:

$$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta \cdot \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^T \boldsymbol{\phi}(x_n) \right\}^2$$

Maximising this log likelihood is equivalent to minimising the sum-of-squares error, as the only term dependent on $\mathbf{w}$ is a scalar multiple of the least squares error function. Due to this, $\mathbf{w}_{\mathrm{ML}}$ can be determined with the closed-form solution given above.
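For illustration, the log likelihood above can be evaluated directly for a given weight vector and noise precision. This is a small sketch with assumed variable names, not code from the repository; maximising it over $\mathbf{w}$ recovers the same weights as the least squares fit.

```python
import numpy as np

def log_likelihood(w, beta, Phi, t):
    """Gaussian log likelihood ln p(t | w, beta) for the polynomial model."""
    N = len(t)
    sum_sq_error = 0.5 * np.sum((t - Phi @ w) ** 2)
    return 0.5 * N * np.log(beta) - 0.5 * N * np.log(2.0 * np.pi) - beta * sum_sq_error
```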

Bayesian Approach

The Bayesian approach attempts to determine the probability of the parameters $\mathbf{w}$ given the target variables $\mathbf{t}$. Assuming this takes a Gaussian form, we can model this probability as follows:

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}\!\left( \mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N \right)$$

where $\mathbf{m}_N$ represents the mean of the weights and $\mathbf{S}_N$ represents the covariance. These can be determined in a Bayesian approach by assuming an initial prior mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$. Equations 3.50 and 3.51 from [@christopher2016pattern] can then be used to update these parameters. This update step is given as follows:

$$\mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t} \right)$$

$$\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$$

It is common practice to assume a zero mean for $\mathbf{m}_0$ and a large variance for $\mathbf{S}_0$, corresponding to $\mathbf{S}_0 = \alpha^{-1} \mathbf{I}$ with a small value of $\alpha$. Here $\mathbf{I}$ is the identity matrix.
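The update step can be written as a short function. The sketch below is a minimal implementation of these two equations; the names and the example prior settings are illustrative only.

```python
import numpy as np

def posterior(Phi, t, beta, m0, S0):
    """Posterior N(w | m_N, S_N) from the prior N(w | m0, S0) and noise precision beta."""
    S0_inv = np.linalg.inv(S0)
    SN_inv = S0_inv + beta * Phi.T @ Phi            # S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
    SN = np.linalg.inv(SN_inv)
    mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)      # m_N = S_N (S_0^{-1} m_0 + beta * Phi^T t)
    return mN, SN

# Zero prior mean and a broad isotropic prior S0 = (1/alpha) I (illustrative values only).
order, alpha, beta = 4, 1e-3, 25.0
m0 = np.zeros(order + 1)
S0 = np.eye(order + 1) / alpha
# mN, SN = posterior(Phi, t, beta, m0, S0)
```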

Results and Discussion of the Above Methods

We now run the above algorithms on a dataset containing 10 points with corresponding $x$ and $t$ values. We will assume that the data is generated in such a manner that the noise precision $\beta$ and the prior precision $\alpha$ are known. Furthermore, we will assume a zero prior mean on the weights for the Bayesian linear regression. We start by considering an order 4 polynomial function. The results are given in figure [fig:E3:Or4:LSQ].

Figure [fig:E3:Or4:LSQ]: Plot of the results of least squares curve fitting (left) and maximum likelihood (right), with an order 4 polynomial function.

One can see that these produce identical results, as they are mathematically equivalent. With the Bayesian approach, we are also able to quantify our certainty in a predicted point. This is shown in figure [fig:E3:Or4:Bays] by plotting the standard deviation around the mean, indicated by the dashed line. The Bayesian approach is also useful because it is generative: we are able to produce new data-points following a similar distribution to the observed data-points, and we can also draw a set of candidate functions that are likely to have generated the data. This is done in the right plot of figure [fig:E3:Or4:Bays].
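Both the plotted standard deviation and the sampled candidate functions follow directly from the posterior. The sketch below uses the standard predictive-variance expression for Bayesian linear regression and assumed variable names; it is an illustration rather than the repository's implementation.

```python
import numpy as np

def predictive(phi_x, mN, SN, beta):
    """Predictive mean and standard deviation at a single basis vector phi(x)."""
    mean = phi_x @ mN
    var = 1.0 / beta + phi_x @ SN @ phi_x   # noise variance + parameter uncertainty
    return mean, np.sqrt(var)

def sample_candidate_weights(mN, SN, n_samples, seed=0):
    """Draw plausible weight vectors from the posterior to plot candidate curves."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mN, SN, size=n_samples)
```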

Figure [fig:E3:Or4:Bays]: Plot of the results of Bayesian curve fitting, with an order 4 polynomial function.

We now fit the same data with an order 9 polynomial function. The results of the least squares and maximum likelihood curve fitting are shown in figure [fig:E3:Or9:LSQ]. These graphs show a phenomenon known as over-fitting. The data has 10 degrees of freedom, all of which can be accounted for by the parametric equation. Due to this, the best fit for the data is one that goes through all the points. This has a very low error but often does not generalise well to new data. Assuming that the test data was generated from a sine function, one can see that these new functions provide a poor approximation.

Figure [fig:E3:Or9:LSQ]: Plot of the results of least squares curve fitting (left) and maximum likelihood (right), with an order 9 polynomial function.

The results of Bayesian regression are far less affected by the change in order, and one can hardly identify the difference between order 4 and order 9. This is due to an inherent feature of Bayesian regression, whereby one can identify over-fitting with the training data alone. This mechanism can be intuitively understood by referring to equation 3.55 from [@christopher2016pattern], the log of the posterior distribution over the weights. This states:

$$\ln p(\mathbf{w} \mid \mathbf{t}) = -\frac{\beta}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^T \boldsymbol{\phi}(x_n) \right\}^2 - \frac{\alpha}{2} \mathbf{w}^T \mathbf{w} + \text{const}$$

From the term $-\frac{\alpha}{2} \mathbf{w}^T \mathbf{w}$, it is possible to see that the posterior probability is negatively influenced by adding more (or larger) parameters. Due to this, the Bayesian regression function will limit its effective complexity to keep the effect of this term low.

Figure [fig:E3:Or9:Bays]: Plot of the results of Bayesian curve fitting, with an order 9 polynomial function.

It is also interesting to see how the standard deviation of the fitted curve changes as the number of available training points is reduced. This is shown in figure [fig:E3:Or9:Bays:RandRem]. Here, 5 points have been removed from near the start of the data. Due to the lack of information, the standard deviation of the function around that region is increased. This result is very useful for real-life applications, where the certainty of the predictions is required to make an informed decision.

Figure [fig:E3:Or9:Bays:RandRem]: Plot of the results of Bayesian curve fitting on partial data, with an order 9 polynomial function.

Bayesian Model Comparison

We now use Bayesian methods to determine the best model $\mathcal{M}_i$ out of a set of models to explain the underlying data $\mathcal{D}$. For this we need to evaluate $p(\mathcal{M}_i \mid \mathcal{D})$, for which we can use Bayes' rule:

$$p(\mathcal{M}_i \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{M}_i)\, p(\mathcal{M}_i)}{p(\mathcal{D})}$$

If we assume that the prior probability $p(\mathcal{M}_i)$ is constant over all models, then we can simplify this to:

$$p(\mathcal{M}_i \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathcal{M}_i)$$

Therefore, it is equivalent to work out $p(\mathcal{D} \mid \mathcal{M}_i)$ and normalise over all the models. When comparing a list of polynomial functions, this means we can use $p(\mathbf{t} \mid \alpha, \beta)$, evaluated for each order $M$, to determine the best model for the data. This is known as the evidence function. The formula required to calculate this is given by [@christopher2016pattern] in equation 3.78. This states that:

$$p(\mathbf{t} \mid \alpha, \beta) = \left( \frac{\beta}{2\pi} \right)^{N/2} \left( \frac{\alpha}{2\pi} \right)^{M/2} \int \exp\{-E(\mathbf{w})\}\, \mathrm{d}\mathbf{w}$$

Where we can use equation 3.85 from [@christopher2016pattern], which states:

$$\int \exp\{-E(\mathbf{w})\}\, \mathrm{d}\mathbf{w} = \exp\{-E(\mathbf{m}_N)\}\, (2\pi)^{M/2}\, |\mathbf{A}|^{-1/2}$$

In order to compute this, we also require the following:

$$\mathbf{A} = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}, \qquad \mathbf{m}_N = \beta \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{t}, \qquad E(\mathbf{m}_N) = \frac{\beta}{2} \left\| \mathbf{t} - \boldsymbol{\Phi} \mathbf{m}_N \right\|^2 + \frac{\alpha}{2} \mathbf{m}_N^T \mathbf{m}_N$$
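Taking the logarithm of the expression above gives a numerically stable way to evaluate the evidence. The sketch below combines these pieces into a single log-evidence function; variable names are illustrative and a zero-mean isotropic prior is assumed, as earlier. Exponentiating and normalising these values over the candidate orders then gives the model posterior under a flat model prior.

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """Log marginal likelihood ln p(t | alpha, beta) for Bayesian linear regression."""
    N, num_basis = Phi.shape                      # num_basis = polynomial order + 1
    A = alpha * np.eye(num_basis) + beta * Phi.T @ Phi
    mN = beta * np.linalg.solve(A, Phi.T @ t)
    E_mN = 0.5 * beta * np.sum((t - Phi @ mN) ** 2) + 0.5 * alpha * mN @ mN
    _, logdet_A = np.linalg.slogdet(A)
    return (0.5 * num_basis * np.log(alpha) + 0.5 * N * np.log(beta)
            - E_mN - 0.5 * logdet_A - 0.5 * N * np.log(2.0 * np.pi))
```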

Using a new dataset containing 80 samples, produced in a similar fashion to the dataset used in the Results and Discussion section above, we can evaluate the evidence as given above. This is done for a range of polynomial orders $M$. The results of this are shown in figure [fig:E3:Evi].

Figure [fig:E3:Evi]: Plot of the model evidence for various values of M.

We can see from this that the best fit to the data corresponds to $M = 3$. To justify this result, we can turn to the Taylor expansion of a sine function. This is given by:

$$\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$

This is an odd function, and hence even powers of $x$ do not contribute to the final form of the function. Furthermore, the factorial in the denominator of each term means that the contribution of each term diminishes quickly. These observations can be seen in figure [fig:E3:Evi], as $M = 3$ is a clear maximum followed by a sudden and sharp drop. The plot corresponding to the most likely model is given in figure [fig:E3:m3:DTA2].

Figure [fig:E3:m3:DTA2]: Plot of the fitted curve for M=3.

Bayesian Model Averaging

We now average all the models tested in the Bayesian Model Comparison section above. For this, we can take a weighted sum over the model space. This is given by equation 3.67 from [@christopher2016pattern]:

$$p(t \mid \mathbf{x}, \mathcal{D}) = \sum_{i=1}^{L} p(t \mid \mathbf{x}, \mathcal{M}_i, \mathcal{D})\, p(\mathcal{M}_i \mid \mathcal{D})$$

We can determine $p(\mathcal{M}_i \mid \mathcal{D})$ by normalising over the evidence function, since the model posterior is proportional to the evidence under a flat model prior. We can then use the mixture mean and variance relations provided by [@trailovic2002variance] to determine the mean and standard deviation of the averaged prediction. Note that these equations only estimate the mean and variance of the distribution. This is because a mixture distribution will most likely be multi-modal and contain more than one local maximum.
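The sketch below computes the moments of the averaged prediction using the standard mean and variance of a mixture distribution (law of total variance); it is an illustration under that assumption rather than a transcription of the equations in [@trailovic2002variance].

```python
import numpy as np

def mixture_mean_std(model_probs, means, stds):
    """Mean and standard deviation of a weighted mixture of Gaussian predictions.

    model_probs: posterior model probabilities (summing to 1); means/stds: the
    per-model predictive means and standard deviations at the same input point.
    """
    model_probs = np.asarray(model_probs, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(stds, dtype=float) ** 2
    mix_mean = np.sum(model_probs * means)
    # Law of total variance: expected variance + variance of the means.
    mix_var = np.sum(model_probs * (variances + means ** 2)) - mix_mean ** 2
    return mix_mean, np.sqrt(mix_var)
```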

The results of the mixture distribution are given in figure [fig:E3:mix].

Figure [fig:E3:mix]: Plot of the fitted curve for a weighted mixture distribution.

When comparing figures [fig:E3:m3:DTA2] and [fig:E3:mix], one can see that the two are very similar. Hence it is valid to use the most likely model as an approximation to the mixture distribution, which saves a significant amount of computation at little cost in accuracy.

Determining the Hyperparameters

In order to determine $\alpha$ and $\beta$, we first need to assume an initial $\alpha_0$ and $\beta_0$. We then compute $\mathbf{m}_N$ using the posterior mean equation given earlier with this initial guess of $\alpha$ and $\beta$. We then need to compute the following two values:

$$E_W(\mathbf{m}_N) = \frac{1}{2} \mathbf{m}_N^T \mathbf{m}_N, \qquad E_D(\mathbf{m}_N) = \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{m}_N^T \boldsymbol{\phi}(x_n) \right\}^2$$

These are then used to compute the new parameters $\alpha$ and $\beta$ using equations 3.98 and 3.99 from [@christopher2016pattern]:

$$\alpha = \frac{M}{2 E_W(\mathbf{m}_N)}, \qquad \beta = \frac{N}{2 E_D(\mathbf{m}_N)}$$

This is then repeated until convergence or until a maximum number of iterations is reached. It is important to note that this method is only valid when the number of data points is much larger than the order of the polynomial function. If this is not the case, one must employ a more complicated procedure.
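A minimal sketch of this iterative re-estimation procedure is given below, assuming a zero-mean isotropic prior and the simplified updates described above; names and default values are illustrative.

```python
import numpy as np

def estimate_hyperparameters(Phi, t, alpha0=1.0, beta0=1.0, max_iter=100, tol=1e-6):
    """Iteratively re-estimate alpha and beta, assuming N >> number of basis functions."""
    N, num_basis = Phi.shape
    alpha, beta = alpha0, beta0
    for _ in range(max_iter):
        # Posterior mean for the current alpha and beta (zero prior mean assumed).
        A = alpha * np.eye(num_basis) + beta * Phi.T @ Phi
        mN = beta * np.linalg.solve(A, Phi.T @ t)
        E_W = 0.5 * mN @ mN
        E_D = 0.5 * np.sum((t - Phi @ mN) ** 2)
        alpha_new = num_basis / (2.0 * E_W)
        beta_new = N / (2.0 * E_D)
        converged = abs(alpha_new - alpha) < tol and abs(beta_new - beta) < tol
        alpha, beta = alpha_new, beta_new
        if converged:
            break
    return alpha, beta
```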
