GSoC #8: A Light at the End of the Tunnel
With the final product due about three weeks from now, I have been busy tidying things up so that we have an end-to-end solution ready, with a few self-contained issues/TODOs remaining that can readily be wrapped up into separate issues. As of yesterday, I’ve gotten INLA working end-to-end with `pmx.fit`, and have added a few test cases and some guardrails (namely that the latent must be a Gaussian in precision form).
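For reference, here is a minimal sketch of what a fit like this looks like from the user side. The model and the `method="INLA"` keyword are illustrative assumptions rather than the finalised interface; the point is simply that the latent field is declared as a multivariate Gaussian parameterised by its precision matrix (`tau`), which is the guardrail mentioned above.

```python
import numpy as np
import pymc as pm
import pymc_extras as pmx  # package providing pmx.fit

# Toy data and a fixed precision matrix for the latent field
# (illustrative only; the models I actually test on are richer).
rng = np.random.default_rng(0)
n = 10
Q = 2.0 * np.eye(n)          # precision matrix of the latent Gaussian
y_obs = rng.normal(size=n)

with pm.Model():
    # Latent must be a Gaussian in precision form, i.e. parameterised via tau
    x = pm.MvNormal("x", mu=np.zeros(n), tau=Q)
    sigma = pm.HalfNormal("sigma", 1.0)
    y = pm.Normal("y", mu=x, sigma=sigma, observed=y_obs)

    # Assumed interface: method="INLA" selects the INLA-style fit;
    # the exact keyword and any extra arguments may differ in the final PR.
    idata = pmx.fit(method="INLA")
```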
I also found out that we can force `pm.sample` to fall back on Metropolis instead of NUTS, which is amenable to `minimize` in its current state, but even this gradient-free algorithm still yields very slow results. Profiling revealed that `minimize` accounts for 98% of the runtime of each call to `logp()`, which was on the order of \(10^{-2}\) seconds. However, switching the optimisation algorithm to L-BFGS-B reduced this by an order of magnitude, to \(10^{-3}\) seconds per call (~85% of the runtime). This sped up Metropolis sampling significantly, from ~10 minutes down to ~1 minute, but that is still much slower than directly sampling the joint prior, which takes only a few seconds under these settings. It may be that INLA scales better to larger problems, however.
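Forcing the Metropolis fallback is just a matter of passing an explicit step method to `pm.sample`. A minimal sketch (the model here is a stand-in; the L-BFGS-B switch happens inside the marginalised `logp`, via the optimiser settings, rather than at this level):

```python
import pymc as pm

with pm.Model():
    # Stand-in model; in practice this would be the INLA-marginalised model
    # whose logp calls minimize internally.
    theta = pm.Normal("theta", 0.0, 1.0)

    # Passing an explicit step method stops PyMC from auto-assigning NUTS,
    # so the gradient-free Metropolis sampler is used instead.
    idata = pm.sample(step=pm.Metropolis(), draws=1000, tune=1000)
```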
Regarding the outstanding issues, we have:

- `TensorFromScalar` needs to have a No-Op implemented to handle NUTS sampling with `minimize`. This should just be a one-liner and is already waiting in a separate PR.
- As mentioned last week, NUTS doesn’t work with `minimize`. This is a PyTensor-facing issue which Jesse and Ricardo are working on.
- We need a way to obtain the posteriors for the latents, \(p(x \mid y)\). I suspect that this will either be related to `sample_posterior_predictive` or `unmarginalize`, and will probably be the longest issue in this list for me.
- We need a Jupyter notebook demonstrating INLA’s usage. This shouldn’t be too much of a hassle, as I have already been doing most of my work through notebooks; it will likely just be a matter of demonstrating it on a more complex model, such as ICAR.
- As I note at the end of GSoC #5, \(p(y \mid \theta)\) is evaluated at \(x_0\), which I believed meant that the quadratic form in the Laplace approximation was zero. However, a closer look at GMRFs suggests that the mean of the Laplace Gaussian is in fact \(u^{*} = T^{-1}(T \mu + f'(x_0) - x_0^T f''(x_0))\) (though this may itself be equal to \(x_0\)), i.e. the quadratic form would instead be \((x_0 - u^{*})^T T (x_0 - u^{*})\); see the sketch after this list. I will follow up with Theo and the literature on this. Either way, this should just be a one-liner.
- As discussed above, `minimize` currently seems to be far too slow. However, that is an issue for Jesse rather than myself at this stage. If it turns out we can’t do anything about it, this has the capacity to become quite a time-consuming issue, as we will most likely need to pick up some savings by making a custom sampler that alternates between minimisation and sampling.
- It may be possible to speed up computing the Hessian \(f''(x_0)\) for the Laplace approximation with the adjoint method or similar. Something to discuss with my mentors.
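As a concrete (and heavily hypothetical) sketch of that quadratic-form fix: all names below are illustrative, the real implementation would operate on PyTensor graphs rather than NumPy arrays, and the formula is the candidate one quoted above, which I still need to verify. It simply computes \(u^{*}\) and the quadratic form instead of taking the term to be zero.

```python
import numpy as np

def laplace_quadratic_form(T, mu, x0, f_grad, f_hess):
    """Sketch of the candidate quadratic-form correction.

    T, mu: precision matrix and mean of the latent Gaussian prior;
    x0: mode returned by minimize;
    f_grad, f_hess: gradient and Hessian of the log-likelihood at x0.
    """
    # Candidate Laplace mean u* = T^{-1}(T mu + f'(x0) - x0^T f''(x0));
    # for a symmetric Hessian, x0^T f''(x0) is computed here as f_hess @ x0.
    u_star = np.linalg.solve(T, T @ mu + f_grad - f_hess @ x0)
    delta = x0 - u_star
    # Quadratic form (x0 - u*)^T T (x0 - u*), currently assumed to be zero
    return delta @ T @ delta
```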