GSoC #5: Putting it all together

This week I got started on my next PR to implement “regular” (i.e. non-Laplace) marginalisation over the hyperparameters, and to develop a skeleton for a user-facing fit_INLA method, which would essentially be the final product.

As I currently see it, the high-level functionality would be as follows (a rough code sketch of step 1 follows the list):

  1. Obtain the posterior \(p(\theta \mid y) = \frac{p(y \mid x, \theta)p(x \mid \theta)p(\theta)}{p_G(x \mid y, \theta)}\) in log form up to a constant \(-\log(p(y))\).
    1. \(p(y \mid x, \theta)\) is the likelihood of the observations, i.e. model.logp().
    2. \(\log p(x \mid \theta) = -\frac{1}{2} (x - \mu)^T Q (x - \mu) + \frac{1}{2} \text{logdet}(Q) - \frac{d}{2}\log(2\pi)\) is specified by the fact that \(x\) is a latent Gaussian with mean \(\mu\) and precision matrix \(Q\).
    3. \(p(\theta)\) is specified by the user.
    4. \(p_G(x \mid y, \theta)\) is the Laplace approximation mentioned in previous posts.
  2. Additionally, obtain the conditional posterior of each latent field component, \(p(x_j \mid \theta, y) = \frac{p(y \mid x, \theta)p(x \mid \theta)p(\theta)}{p(x_{-j} \mid x_j, \theta, y)}\) (again up to a constant).
    • Where \(x_{-j}\) follows the convention in the literature of denoting “every component of \(x\) except \(x_j\)”.
    • It is currently unclear to me how to obtain \(p(x_{-j} \mid x_j, \theta, y)\); however, my understanding is that it is typically approximated using a similar Laplace approximation, or by “conventional” marginalisation (sampling/integration).
  3. Marginalize out the hyperparameters:
    1. \[p(x_j \mid y) = \int p(x_j \mid \theta, y)p(\theta \mid y) d\theta\]
    2. \[p(\theta_k \mid y) = \int p(\theta \mid y) d\theta_{-k}\]
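
To make step 1 concrete, here is a rough Python sketch of how the unnormalised \(\log p(\theta \mid y)\) could be assembled from these four pieces. The callables `log_likelihood`, `log_prior_theta`, `latent_mean`, `latent_precision`, `laplace_mode` and `laplace_precision` are hypothetical stand-ins for the actual PyMC machinery, not the final API:

```python
import numpy as np

def log_posterior_theta(theta, x, y, log_likelihood, log_prior_theta,
                        latent_mean, latent_precision,
                        laplace_mode, laplace_precision):
    """Unnormalised log p(theta | y), i.e. up to the constant -log p(y)."""
    d = x.shape[0]
    two_pi_term = 0.5 * d * np.log(2 * np.pi)

    # log p(y | x, theta): likelihood of the observations (model.logp()).
    log_lik = log_likelihood(x, theta, y)

    # log p(x | theta): latent Gaussian prior with mean mu and precision Q.
    mu, Q = latent_mean(theta), latent_precision(theta)
    r = x - mu
    log_latent = -0.5 * r @ Q @ r + 0.5 * np.linalg.slogdet(Q)[1] - two_pi_term

    # log p_G(x | y, theta): Laplace approximation N(x0, T) with T = Q - f''(x0).
    x0, T = laplace_mode(theta), laplace_precision(theta)
    s = x - x0
    log_laplace = -0.5 * s @ T @ s + 0.5 * np.linalg.slogdet(T)[1] - two_pi_term

    # log p(theta | y) = log-likelihood + latent prior + hyperprior - Laplace term.
    return log_lik + log_latent + log_prior_theta(theta) - log_laplace
```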

The final marginalisation step is where the real challenge lies. We can compute these integrals either explicitly (e.g. with a quadrature scheme, as R-INLA does) or by sampling over the hyperparameters (as is done in Stan). I’ll most likely go with sampling. Then, under the umbrella of sampling, there’s an entire family of methods to pick from: Hamiltonian Monte Carlo (e.g. NUTS), other MCMC methods, quasi-Monte Carlo, etc. Furthermore, some samplers only work with continuous variables while others handle discrete ones. I suspect that this choice will require a significant amount of work throughout the GSoC project, but for now I’ll just go with a continuous sampler.
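
For the first integral in step 3, the sampling route amounts to a plain Monte Carlo average: draw \(\theta^{(s)} \sim p(\theta \mid y)\) with whatever continuous sampler ends up being used, then average the conditional densities. Here is a minimal sketch, where `conditional_density(xj_grid, theta)` is a hypothetical callable returning \(p(x_j \mid \theta, y)\) on a grid of \(x_j\) values:

```python
import numpy as np

def marginal_latent_density(xj_grid, theta_samples, conditional_density):
    """Approximate p(x_j | y) = ∫ p(x_j | theta, y) p(theta | y) dtheta
    by averaging p(x_j | theta, y) over posterior draws of theta."""
    densities = np.stack([conditional_density(xj_grid, theta)
                          for theta in theta_samples])
    return densities.mean(axis=0)
```

The second integral comes essentially for free under sampling: the marginal \(p(\theta_k \mid y)\) is just the histogram (or KDE) of the \(k\)-th component of the \(\theta\) draws.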

I also decided to go back and refactor the Laplace approximation to take the following form:

\begin{equation} p_G(x) = N(x_0, T) \end{equation}

Where \(x_0\) is the mode and \(T = Q - f''(x_0)\) is the precision matrix. I did so because I felt that returning it as a normal distribution would be more elegant, readable and “complete” than returning a hardcoded log probability. However, after discussing with Theo, it may be better to explicitly return the log probability after all, given that I end up simply taking the log probability of the normal straight after obtaining it. That said, instead of writing out the full Laplace log probability, we can write it as the log probability of the above normal:

\begin{equation} \log(p_G(x)) = -\frac{1}{2} (x - x_0)^T T (x - x_0) + \frac{1}{2} \text{logdet}(T) - \frac{d}{2}\log(2\pi) \end{equation}

This is much more elegant and compact than the form presented in #3. As an additional aside, the literature seems to suggest that we evaluate the above at \(x_0\), but that would suggest to me that all we’re after is simply \(\frac{1}{2} \text{logdet}(T) - \frac{d}{2}\log(2\pi)\), given that the quadratic form is always zero at \(x = x_0\). I’ll find out sooner or later, but for now I’m assuming \(x\) can be any value.
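
To convince myself of the two points above, here is a small numerical sketch (not the actual PR code): it evaluates the log density with an arbitrary positive-definite precision matrix \(T\), checks it against SciPy’s multivariate normal, and confirms that at \(x = x_0\) only the \(\frac{1}{2} \text{logdet}(T) - \frac{d}{2}\log(2\pi)\) terms survive:

```python
import numpy as np
from scipy.stats import multivariate_normal

def laplace_logp(x, x0, T):
    """log p_G(x) for a Gaussian with mode x0 and precision matrix T."""
    d = x0.shape[0]
    r = x - x0
    return (-0.5 * r @ T @ r
            + 0.5 * np.linalg.slogdet(T)[1]
            - 0.5 * d * np.log(2 * np.pi))

rng = np.random.default_rng(0)
d = 3
x0 = rng.normal(size=d)
A = rng.normal(size=(d, d))
T = A @ A.T + d * np.eye(d)  # an arbitrary positive-definite precision matrix
x = rng.normal(size=d)

# Agrees with SciPy's multivariate normal parameterised by covariance T^{-1}.
assert np.isclose(laplace_logp(x, x0, T),
                  multivariate_normal.logpdf(x, mean=x0, cov=np.linalg.inv(T)))

# At x = x0 the quadratic form vanishes, leaving only the normalising terms.
assert np.isclose(laplace_logp(x0, x0, T),
                  0.5 * np.linalg.slogdet(T)[1] - 0.5 * d * np.log(2 * np.pi))
```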