Spectral likelihood expansions for Bayesian inference | TransferLab

Bayesian inference provides a convenient framework for incorporating both prior knowledge and evidence obtained from observations into the analysis of parameters $\theta$. Computing the posterior $p(\theta \mid \mathbf{x})$ is the primary goal of Bayesian inference. However, the posterior density admits a closed-form expression in only a limited number of cases. Access to the posterior is therefore usually obtained with approximate methods such as Variational Inference and Markov Chain Monte Carlo (MCMC).

The latter, despite its popularity, is computationally demanding because it requires many evaluations of the underlying computational model. In engineering disciplines, a single evaluation of the forward model can be expensive, so requiring several hundred evaluations can render the approach prohibitive.

To overcome this problem, more efficient sampling algorithms such as Hamiltonian Monte Carlo can be used. These algorithms, however, do not solve the problem completely, especially in high dimensions. Alternatively, the original model can be replaced by a cheaper surrogate model. While such changes reduce the computational cost of the approach, they still inherit the shortcomings of sampling-based methods.

[Nag16S] propose to decompose the likelihood function into a series of polynomials that are orthogonal with respect to the prior distribution. They obtain this decomposition by representing the likelihood as a Polynomial Chaos Expansion (PCE) [Wie38H, Xiu02W]. They further show that quantities of interest can be extracted from the resulting spectral likelihood expansion.

The authors restrict their discussion to the Hilbert space of functions that are square integrable with respect to the prior density $\pi(\theta)$, i.e.

$$ L_{\pi}^2(\mathcal{D}_{\theta}) = \left\{ u : \mathcal{D}_{\theta} \to \mathbb{R} \mid \int_{\mathcal{D}_{\theta}} u^2 (\theta)\pi(\theta)d\theta < \infty \right\}. $$

Orthogonality of the functions is defined by the inner product

$$ \langle u,v \rangle_{L_{\pi}^2} = \int_{\mathcal{D}_{\theta}} u(\theta)v(\theta)\pi(\theta) d\theta = \mathbb{E}_{\pi}\left[ u(\theta)v(\theta) \right]. $$
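Since the inner product is an expectation under the prior, it can be estimated by averaging over prior samples. The following Python sketch assumes a standard normal prior; `inner_product_mc` and `sample_prior` are hypothetical names used only for illustration, and Monte Carlo is just one way to evaluate such integrals.

```python
import numpy as np


def inner_product_mc(u, v, sample_prior, n_samples=200_000, seed=0):
    """Monte Carlo estimate of <u, v>_{L^2_pi} = E_pi[u(theta) v(theta)].

    `sample_prior(rng, n)` is a hypothetical helper returning n draws from pi.
    """
    rng = np.random.default_rng(seed)
    theta = sample_prior(rng, n_samples)
    return float(np.mean(u(theta) * v(theta)))


def standard_normal(rng, n):
    """Draw n samples from the prior pi = N(0, 1)."""
    return rng.standard_normal(n)


# Under N(0, 1): <theta, theta> = E[theta^2] = 1 and <1, theta> = E[theta] = 0.
ip_tt = inner_product_mc(lambda t: t, lambda t: t, standard_normal)
ip_1t = inner_product_mc(np.ones_like, lambda t: t, standard_normal)
print(ip_tt, ip_1t)
```

The estimates carry Monte Carlo error of order $1/\sqrt{N}$, which is why deterministic quadrature is preferable whenever the dimension is low.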

For uniform and normal priors, the (suitably normalized) Legendre and Hermite polynomials, respectively, provide an orthonormal basis. For other priors, the orthonormal polynomials have to be computed, or the variables have to be transformed accordingly [Xiu02W].
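As a quick check of the orthonormality claim, the sketch below builds the normalized bases $He_n/\sqrt{n!}$ (probabilists' Hermite, for $\mathcal{N}(0,1)$) and $\sqrt{2n+1}\,P_n$ (Legendre, for $\mathcal{U}(-1,1)$) with NumPy's `numpy.polynomial` module and evaluates their Gram matrices $\mathbb{E}_{\pi}[\Psi_i \Psi_j]$ with Gauss quadrature. With 7 nodes the rule is exact here, since the products are polynomials of degree at most 12. The function names are illustrative and not taken from [Nag16S].

```python
import math

import numpy as np
from numpy.polynomial import hermite_e, legendre


def hermite_basis(x, degree):
    """Orthonormal probabilists' Hermite polynomials w.r.t. N(0, 1): He_n / sqrt(n!)."""
    norms = np.sqrt([math.factorial(n) for n in range(degree + 1)])
    return hermite_e.hermevander(x, degree) / norms


def legendre_basis(x, degree):
    """Orthonormal Legendre polynomials w.r.t. U(-1, 1): sqrt(2n + 1) * P_n."""
    return legendre.legvander(x, degree) * np.sqrt(2 * np.arange(degree + 1) + 1)


def gram_matrix(basis, nodes, prob_weights, degree):
    """Quadrature estimate of E_pi[Psi_i Psi_j]; prob_weights must sum to 1."""
    B = basis(nodes, degree)
    return B.T @ (prob_weights[:, None] * B)


degree = 6
# Gauss-HermiteE: weight exp(-x^2/2), weights sum to sqrt(2*pi) -> rescale to the N(0, 1) density.
xh, wh = hermite_e.hermegauss(degree + 1)
G_h = gram_matrix(hermite_basis, xh, wh / np.sqrt(2 * np.pi), degree)

# Gauss-Legendre: weights sum to 2 on [-1, 1] -> divide by 2 for the U(-1, 1) density.
xl, wl = legendre.leggauss(degree + 1)
G_l = gram_matrix(legendre_basis, xl, wl / 2, degree)

identity = np.eye(degree + 1)
print(np.allclose(G_h, identity), np.allclose(G_l, identity))  # True True
```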

If the likelihood satisfies $\mathcal{L} \in L_{\pi}^2(\mathcal{D}_{\theta})$, it can be represented in terms of an orthonormal basis $\{\Psi_{\alpha}\}_{\alpha \in \mathbb{N}^n}$. The assumption of square integrability is justified because, in the setting of maximum likelihood estimation, the likelihood is assumed to be bounded from above: if $0 \le \mathcal{L} \le L_{\max}$, then $\int_{\mathcal{D}_{\theta}} \mathcal{L}^2(\theta)\pi(\theta)\,d\theta \le L_{\max}^2 < \infty$. The spectral likelihood expansion then reads

\begin{align} \mathcal{L} &= \sum_{\alpha \in \mathbb{N}^n} c_{\alpha}\Psi_{\alpha}, \\ c_{\alpha} &= \langle \mathcal{L}, \Psi_{\alpha} \rangle_{L_{\pi}^2} = \int_{\mathcal{D}_{\theta}} \mathcal{L}(\theta)\Psi_{\alpha}(\theta)\pi(\theta)\, d\theta. \end{align}

The expansion can be truncated to a finite set of multi-indices $\mathcal{A}_p \subset \mathbb{N}^n$ (for instance, all $\alpha$ with total degree $\|\alpha\|_1 \le p$), or even made sparse using approaches developed for PCEs, which yields

$$ \hat{\mathcal{L}} = \sum_{\alpha \in \mathcal{A}_p} c_{\alpha}\Psi_{\alpha}. $$
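In $n$ dimensions with independent prior components, $\Psi_{\alpha}$ is commonly built as a product of univariate orthonormal polynomials, $\Psi_{\alpha}(\theta) = \prod_i \psi_{\alpha_i}(\theta_i)$. The sketch below assumes a total-degree set $\mathcal{A}_p = \{\alpha : \|\alpha\|_1 \le p\}$, which has $\binom{n+p}{p}$ elements, and standard normal marginals; a sparse expansion would retain only a subset of these indices. All function names are hypothetical.

```python
import itertools
import math


def total_degree_set(n_dims, p):
    """Multi-indices alpha in N^n with |alpha|_1 <= p (a total-degree truncation A_p)."""
    return [alpha for alpha in itertools.product(range(p + 1), repeat=n_dims)
            if sum(alpha) <= p]


def hermite_1d(k, x):
    """Orthonormal probabilists' Hermite polynomial He_k(x) / sqrt(k!) via the three-term recurrence."""
    prev, curr = 0.0, 1.0  # He_{-1} := 0, He_0 = 1
    for j in range(k):
        prev, curr = curr, x * curr - j * prev  # He_{j+1} = x He_j - j He_{j-1}
    return curr / math.sqrt(math.factorial(k))


def psi(alpha, theta):
    """Tensor-product basis Psi_alpha(theta) = prod_i psi_{alpha_i}(theta_i), independent N(0, 1) priors."""
    return math.prod(hermite_1d(k, t) for k, t in zip(alpha, theta))


A_p = total_degree_set(n_dims=2, p=3)
print(len(A_p), math.comb(2 + 3, 3))  # 10 10
print(psi((1, 2), (2.0, 3.0)))        # He_1(2) * He_2(3) / sqrt(2) = 16 / sqrt(2)
```

The size of $\mathcal{A}_p$ grows combinatorially with $n$ and $p$, which is the main motivation for sparse expansions.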

Once the orthonormal basis has been constructed, the coefficients $c_{\alpha}$ can be estimated by solving a least-squares problem on likelihood evaluations at a set of sample points.
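Plain least squares can be sketched as follows, assuming the sample points (the experimental design) are drawn from the prior and using a toy Gaussian likelihood $\mathcal{N}(y;\theta,\sigma^2)$ with $y = 1$, $\sigma = 2$, a standard normal prior and degree $p = 4$. This is ordinary least squares via `numpy.linalg.lstsq`, not necessarily the solver used in [Nag16S]; the quadrature-based projection only serves as a reference.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e

# Hypothetical setup: prior N(0, 1), observation y = 1 with noise sigma = 2.
Y_OBS, SIGMA, DEGREE = 1.0, 2.0, 4


def likelihood(theta):
    """Gaussian likelihood L(theta) = N(y; theta, sigma^2)."""
    return np.exp(-0.5 * ((Y_OBS - theta) / SIGMA) ** 2) / (SIGMA * np.sqrt(2 * np.pi))


def orthonormal_hermite(x, degree):
    """Columns Psi_0..Psi_degree, orthonormal w.r.t. N(0, 1)."""
    norms = np.sqrt([math.factorial(n) for n in range(degree + 1)])
    return hermite_e.hermevander(x, degree) / norms


def least_squares_coefficients(n_samples, seed=0):
    """Fit c = argmin ||Psi c - L(theta)||^2 on an experimental design drawn from the prior."""
    rng = np.random.default_rng(seed)
    design = rng.standard_normal(n_samples)      # theta^(k) ~ pi
    Psi = orthonormal_hermite(design, DEGREE)    # shape (n_samples, DEGREE + 1)
    coeffs, *_ = np.linalg.lstsq(Psi, likelihood(design), rcond=None)
    return coeffs


def projection_coefficients(n_nodes=60):
    """Reference values c_alpha = E_pi[L Psi_alpha] from Gauss-HermiteE quadrature."""
    nodes, weights = hermite_e.hermegauss(n_nodes)
    weights = weights / np.sqrt(2 * np.pi)
    return orthonormal_hermite(nodes, DEGREE).T @ (weights * likelihood(nodes))


c_ls = least_squares_coefficients(n_samples=50_000)
c_ref = projection_coefficients()
print(np.round(c_ls, 4))
print(np.round(c_ref, 4))
```

With many more samples than basis functions the regression estimates approach the projection coefficients; because high-degree polynomials are large in the prior tails, the estimates tend to become noisier as $p$ grows unless the number of samples grows too.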

Based on the orthogonal decomposition of the likelihood, the authors express the posterior and the evidence in terms of the expansion. Since $\Psi_0 \equiv 1$ is the constant basis polynomial, the evidence reduces to the zeroth coefficient:

\begin{align} p(\theta \mid \mathbf{x}) &= \frac{1}{Z} \left( \sum_{\alpha \in \mathbb{N}^n} c_{\alpha}\Psi_{\alpha}(\theta) \right) \pi(\theta), \\ Z &= \langle 1, \mathcal{L} \rangle_{L_{\pi}^2} = \left\langle \Psi_0, \sum_{\alpha \in \mathbb{N}^n} c_{\alpha}\Psi_{\alpha} \right\rangle_{L_{\pi}^2} = c_0. \end{align}

The authors of [Nag16S] go on to show how posterior marginals, the first posterior moments, and other quantities of interest can be computed efficiently from the expansion.
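For a one-dimensional Gaussian prior $\mathcal{N}(\mu_0, \tau_0^2)$, the first two posterior moments follow from $c_0$, $c_1$ and $c_2$ alone. In the standardized variable $\xi = (\theta - \mu_0)/\tau_0$ one has $\xi = \Psi_1(\xi)$ and $\xi^2 = \sqrt{2}\,\Psi_2(\xi) + \Psi_0$, so orthonormality gives $\mathbb{E}[\xi \mid \mathbf{x}] = c_1/c_0$ and $\mathbb{E}[\xi^2 \mid \mathbf{x}] = (\sqrt{2}\,c_2 + c_0)/c_0$. The sketch below checks this special case against the conjugate closed form; the values $\mu_0 = 2$, $\tau_0 = 1.5$, $y = 3$, $\sigma = 1$ are arbitrary, and this is not the general multivariate expression given in [Nag16S].

```python
import math

import numpy as np
from numpy.polynomial import hermite_e

# Hypothetical setup: prior theta ~ N(MU0, TAU0^2), one observation y with noise sigma.
MU0, TAU0, Y_OBS, SIGMA = 2.0, 1.5, 3.0, 1.0


def likelihood(theta):
    return np.exp(-0.5 * ((Y_OBS - theta) / SIGMA) ** 2) / (SIGMA * np.sqrt(2 * np.pi))


# Expand in xi = (theta - MU0) / TAU0 ~ N(0, 1); only c_0, c_1, c_2 are needed.
nodes, weights = hermite_e.hermegauss(100)
weights = weights / np.sqrt(2 * np.pi)
psi = hermite_e.hermevander(nodes, 2) / np.sqrt([1.0, 1.0, 2.0])  # Psi_0, Psi_1, Psi_2
c0, c1, c2 = psi.T @ (weights * likelihood(MU0 + TAU0 * nodes))

# xi = Psi_1 and xi^2 = sqrt(2) Psi_2 + Psi_0, hence by orthonormality:
mean_xi = c1 / c0
second_moment_xi = (math.sqrt(2) * c2 + c0) / c0
post_mean = MU0 + TAU0 * mean_xi
post_var = TAU0**2 * (second_moment_xi - mean_xi**2)

# Conjugate closed form for comparison.
exact_mean = (TAU0**2 * Y_OBS + SIGMA**2 * MU0) / (TAU0**2 + SIGMA**2)
exact_var = TAU0**2 * SIGMA**2 / (TAU0**2 + SIGMA**2)
print(post_mean, exact_mean)
print(post_var, exact_var)
```

No sampling is involved: once the coefficients are known, the moments are simple ratios.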

Figure 1: [Nag16S] demonstrate the quality of spectral likelihood expansions on various examples. Here, the goal is to estimate the unknown parameter $\mu$ of a 1D Gaussian; the prior and the likelihood are Gaussian as well. For polynomial degrees up to $p=20$, the expansion approximates the ground-truth likelihood well but drifts off in the tails. On the right, the reconstructed posterior is shown. That the posterior is approximated well even for degrees $p < 20$ is attributed to the small weight the prior assigns to larger values of $\mu$.

The spectral likelihood expansion thus provides efficient access to the posterior and to quantities derived from it, a clear advantage over computationally heavy MCMC approaches. However, because the expansion is polynomial and truncated to a finite number of terms, the approximate likelihood and all quantities that depend on it can take negative values. This pathology is shown in Figure 1. It is clearly undesirable, even though it only occurs in the tails of the distributions.
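The sign problem is easy to reproduce. The sketch below uses a toy setting (prior $\mathcal{N}(0,1)$, Gaussian likelihood with $y = \sigma = 1$, an assumption rather than the example of Figure 1) and scans the truncated expansion for the odd degrees $p = 3$ and $p = 5$. A polynomial of odd degree with a nonzero leading coefficient must become negative somewhere, whereas the true likelihood is positive everywhere.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e

# Hypothetical setup: prior N(0, 1), y = 1, noise sigma = 1.
Y_OBS, SIGMA = 1.0, 1.0


def likelihood(theta):
    return np.exp(-0.5 * ((Y_OBS - theta) / SIGMA) ** 2) / (SIGMA * np.sqrt(2 * np.pi))


def orthonormal_hermite(x, degree):
    norms = np.sqrt([math.factorial(n) for n in range(degree + 1)])
    return hermite_e.hermevander(x, degree) / norms


def truncated_expansion(degree, theta, n_nodes=80):
    """Evaluate L_hat(theta) = sum_{n <= degree} c_n Psi_n(theta) with projected coefficients."""
    nodes, weights = hermite_e.hermegauss(n_nodes)
    weights = weights / np.sqrt(2 * np.pi)
    c = orthonormal_hermite(nodes, degree).T @ (weights * likelihood(nodes))
    return orthonormal_hermite(theta, degree) @ c


grid = np.linspace(-8.0, 8.0, 1601)
prior = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
for p in (3, 5):
    L_hat = truncated_expansion(p, grid)
    i = int(np.argmin(L_hat))
    # The true likelihood is positive everywhere; the truncated polynomial is not.
    print(f"p={p}: min L_hat = {L_hat[i]:.3f} at theta = {grid[i]:.2f}, "
          f"prior density there = {prior[i]:.1e}")
```

In this setting the most negative values appear far out in the tails, where the prior density is negligible, so the effect on the reconstructed posterior $\hat{\mathcal{L}}\,\pi / c_0$ is correspondingly small.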

References
