Sequential Neural Posterior Estimation (SNPE) is a powerful approach for inference on complex simulator models whose likelihood is intractable or expensive to evaluate. It has found applications in fields such as neuroscience, cosmology, population genetics, ecology, and biology. Sequentially updating the proposal distribution, starting from the prior, focuses simulations on parameters relevant to the observed data and thereby improves sample efficiency. Popular SNPE approaches are either limited to a narrow range of proposal distributions [Pap18F] (SNPE-A) or require importance weighting, which can limit performance [Lue17F] (SNPE-B). The authors of the paper discussed here [Gre19A] (SNPE-C) present an approach that sequentially approximates the posterior while allowing flexible proposal distributions and avoiding importance weights.
The goal of conditional density estimation in this setting is to select a posterior approximation $q_{\psi}$ from a family of densities, where $\psi$ denotes the distribution parameters. In neural posterior estimation, a neural network $F$ with weights $\phi$ learns to map observations $\mathbf{x}$ onto $\psi = F(\mathbf{x}, \phi)$.
$$ q_{\psi}(\theta) = q_{F(\mathbf{x}, \phi)}(\theta) \approx p(\theta \mid \mathbf{x}) $$
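As a minimal sketch of this mapping, the snippet below assumes a scalar $\theta$ and $\mathbf{x}$, a Gaussian family with $\psi = (\text{mean}, \log \text{std})$, and a one-hidden-layer network standing in for $F$. The names `init_phi`, `F`, and `q_log_prob` are hypothetical. In practice, richer families such as mixture density networks or normalizing flows are common.

```python
import numpy as np

def init_phi(hidden=16, seed=0):
    """Hypothetical weights phi of a one-hidden-layer network F (scalar x -> psi)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (1, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, (hidden, 2)), "b2": np.zeros(2),
    }

def F(x, phi):
    """Map observations x (shape [N]) to psi = (mean, log_std) of a Gaussian over theta."""
    h = np.tanh(x[:, None] @ phi["W1"] + phi["b1"])  # hidden layer, shape [N, hidden]
    out = h @ phi["W2"] + phi["b2"]                  # shape [N, 2]
    return out[:, 0], out[:, 1]

def q_log_prob(theta, x, phi):
    """log q_{F(x_j, phi)}(theta_j) for paired arrays theta and x of shape [N]."""
    mean, log_std = F(x, phi)
    z = (theta - mean) / np.exp(log_std)
    return -0.5 * np.log(2.0 * np.pi) - log_std - 0.5 * z ** 2

phi = init_phi()
x = np.array([0.3, -1.2])
mean, log_std = F(x, phi)                         # one psi per observation
log_q = q_log_prob(np.array([0.0, 0.5]), x, phi)  # log densities of two (theta_j, x_j) pairs
print(mean, np.exp(log_std), log_q)
```

Because $F$ is evaluated per observation, a single set of weights $\phi$ yields a different posterior approximation for every $\mathbf{x}$.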
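To do so, the network is trained on $N$ simulated pairs $(\theta_j, \mathbf{x}_j)$ with $\theta_j \sim p(\theta)$ and $\mathbf{x}_j \sim p(\mathbf{x} \mid \theta_j)$, by minimizing the negative log-likelihood of the parameters under the posterior estimate. For a sufficiently expressive $F$, the mapping from $\mathbf{x}$ to the posterior $p(\theta \mid \mathbf{x})$ is learned as $N \to \infty$ [Pap18F].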
$$ \mathcal{L}(\phi) = -\sum^N_{j=1} \log q_{F(\mathbf{x}_j, \phi)}(\theta_j) $$
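The loss can be illustrated on a toy model whose exact posterior is known. The sketch assumes $p(\theta) = \mathcal{N}(0, 1)$, $p(x \mid \theta) = \mathcal{N}(\theta, \sigma^2)$ with $\sigma^2 = 0.5$, and a hypothetical linear-Gaussian family $q_{F(x, \phi)}(\theta) = \mathcal{N}(\theta;\, a x + b,\, e^{\log v})$ with $\phi = (a, b, \log v)$. Instead of training a neural network by gradient descent, it uses the closed-form minimizer of $\mathcal{L}$ for this family (least squares for the mean, mean squared residual for the variance). With enough simulations, the fitted slope and variance should approach the exact values $1/(1+\sigma^2) = 2/3$ and $\sigma^2/(1+\sigma^2) = 1/3$.

```python
import numpy as np

def gaussian_log_pdf(theta, mean, var):
    """log N(theta; mean, var), elementwise."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (theta - mean) ** 2 / var)

def npe_loss(phi, theta, x):
    """L(phi) = -sum_j log q_{F(x_j, phi)}(theta_j) with F(x, phi) = (a*x + b, exp(log_var))."""
    a, b, log_var = phi
    return -np.sum(gaussian_log_pdf(theta, a * x + b, np.exp(log_var)))

def fit_npe(theta, x):
    """Closed-form minimiser of npe_loss: least squares for the mean, residual variance."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
    resid = theta - (a * x + b)
    return np.array([a, b, np.log(np.mean(resid ** 2))])

# Hypothetical toy model: p(theta) = N(0, 1), p(x | theta) = N(theta, sigma2).
rng = np.random.default_rng(0)
N, sigma2 = 200_000, 0.5
theta = rng.normal(0.0, 1.0, N)                    # theta_j ~ p(theta)
x = theta + rng.normal(0.0, np.sqrt(sigma2), N)    # x_j ~ p(x | theta_j)
phi = fit_npe(theta, x)

# Exact posterior: N(x / (1 + sigma2), sigma2 / (1 + sigma2)), i.e. slope 2/3 and variance 1/3.
print("fitted slope, intercept, variance:", phi[0], phi[1], np.exp(phi[2]))
print("loss at fitted phi:", npe_loss(phi, theta, x))
```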
To improve sample efficiency, a new proposal is obtained by evaluating the current posterior approximation at the specific observation of interest $\mathbf{x}_o$. Starting from an initial approximation $q^{(0)}_{\psi}$, the conditional distribution $q^{(i)}_{\psi}(\theta \mid \mathbf{x} = \mathbf{x}_o)$ learned in iteration $i$ is used as the proposal $\hat{p}(\theta)$ for iteration $i+1$.
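However, when the loss is minimized on samples drawn from the new proposal $\hat{p}(\theta)$ instead of the prior, it no longer yields the true posterior but the proposal posterior $\hat{p}(\theta \mid \mathbf{x})$, which requires correction. In the following, $\hat{p}(\mathbf{x})$ denotes the marginal likelihood under the proposal, i.e. $\hat{p}(\mathbf{x}) = \int \hat{p}(\theta)\, p(\mathbf{x} \mid \theta)\, d\theta$, and analogously $p(\mathbf{x}) = \int p(\theta)\, p(\mathbf{x} \mid \theta)\, d\theta$. By Bayes' rule, the two posteriors are related by: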
$$ \hat{p}(\theta \mid \mathbf{x}) = p(\theta \mid \mathbf{x}) \frac{\hat{p}(\theta)p(\mathbf{x})}{p(\theta)\hat{p}(\mathbf{x})} $$
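The identity can be checked numerically on a one-dimensional grid. The sketch below assumes a hypothetical Gaussian prior $\mathcal{N}(0, 1)$, proposal $\mathcal{N}(1, 0.25)$, and likelihood $\mathcal{N}(x; \theta, 0.5)$ at $x = 0.8$. It computes $\hat{p}(\theta \mid x)$ directly and via the right-hand side, then compares the moments of $p(\theta \mid x)$ and $\hat{p}(\theta \mid x)$ to show how far the proposal posterior drifts from the true one.

```python
import numpy as np

def normal_pdf(z, mean, var):
    """Density of N(mean, var) evaluated at z."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def moments(density, grid, dtheta):
    """Mean and variance of a density tabulated on a uniform grid (Riemann sum)."""
    mean = np.sum(grid * density) * dtheta
    var = np.sum((grid - mean) ** 2 * density) * dtheta
    return mean, var

# Hypothetical toy setup: prior N(0, 1), proposal N(1, 0.25), likelihood N(x; theta, 0.5).
grid = np.linspace(-10.0, 10.0, 20_001)
dtheta = grid[1] - grid[0]
prior = normal_pdf(grid, 0.0, 1.0)
proposal = normal_pdf(grid, 1.0, 0.25)
x_obs = 0.8
lik = normal_pdf(x_obs, grid, 0.5)               # p(x_obs | theta) as a function of theta

p_x = np.sum(prior * lik) * dtheta               # p(x) = int p(theta) p(x | theta) dtheta
p_hat_x = np.sum(proposal * lik) * dtheta        # p_hat(x) = int p_hat(theta) p(x | theta) dtheta
posterior = prior * lik / p_x                    # p(theta | x)
proposal_posterior = proposal * lik / p_hat_x    # p_hat(theta | x), computed directly

# Right-hand side of the identity: p(theta | x) * p_hat(theta) p(x) / (p(theta) p_hat(x))
rhs = posterior * proposal * p_x / (prior * p_hat_x)

print("max |lhs - rhs|:", np.max(np.abs(rhs - proposal_posterior)))
print("true posterior (mean, var):    ", moments(posterior, grid, dtheta))
print("proposal posterior (mean, var):", moments(proposal_posterior, grid, dtheta))
```

For these choices the true posterior has mean $0.8/1.5$ and variance $1/3$, whereas the proposal posterior has mean $5.6/6$ and variance $1/6$: a density estimator trained naively on proposal samples would be shifted toward the proposal and over-confident.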
Greenberg et al. [Gre19A] propose a new approach to correct for this mismatch. They define an approximation of the proposal posterior, $\hat{q}_{(\mathbf{x}, \phi)}$, in terms of the approximation of the true posterior, $q_{F(\mathbf{x}, \phi)}$. Observing that the proposal posterior is proportional to the true posterior multiplied by the ratio of proposal and prior, i.e. $\hat{p}(\theta \mid \mathbf{x}) \propto p(\theta \mid \mathbf{x}) \frac{\hat{p}(\theta)}{p(\theta)}$ (the remaining factor $p(\mathbf{x})/\hat{p}(\mathbf{x})$ does not depend on $\theta$), they define the approximation as follows:
$$ \hat{q}_{(\mathbf{x}, \phi)}(\theta) = q_{F(\mathbf{x}, \phi)}(\theta) \frac{\hat{p}(\theta)}{p(\theta)}\frac{1}{Z(\mathbf{x}, \phi)} $$
where $Z(\mathbf{x}, \phi) = \int q_{F(\mathbf{x}, \phi)}(\theta) \frac{\hat{p}(\theta)}{p(\theta)}\, d\theta$ is a normalization constant.
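To make the construction concrete, the sketch below tabulates $\hat{q}_{(\mathbf{x}, \phi)}$ on a one-dimensional grid and approximates $Z(\mathbf{x}, \phi)$ by quadrature. It assumes a hypothetical Gaussian prior $\mathcal{N}(0, 1)$, proposal $\mathcal{N}(1, 0.25)$, and likelihood $\mathcal{N}(x; \theta, 0.5)$ at $x = 0.8$, and checks that plugging in the exact posterior for $q_{F(\mathbf{x}, \phi)}$ recovers the proposal posterior. The function name `apt_proposal_posterior` is illustrative. Grid quadrature is only practical in low dimensions; for Gaussian-mixture density estimators with Gaussian priors and proposals, the normalizer can instead be computed in closed form (provided the resulting precisions stay positive).

```python
import numpy as np

def normal_pdf(z, mean, var):
    """Density of N(mean, var) evaluated at z."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def apt_proposal_posterior(q_vals, proposal_vals, prior_vals, dtheta):
    """Tabulate q_hat = q * p_hat / p / Z on a uniform grid.

    q_vals, proposal_vals, prior_vals: q_{F(x, phi)}(theta), p_hat(theta), p(theta) on the grid.
    Z(x, phi) = int q(theta) p_hat(theta) / p(theta) dtheta is approximated by a Riemann sum.
    """
    unnorm = q_vals * proposal_vals / prior_vals
    Z = np.sum(unnorm) * dtheta
    return unnorm / Z, Z

# Hypothetical setup: prior N(0, 1), proposal N(1, 0.25), likelihood N(x; theta, 0.5), x = 0.8.
grid = np.linspace(-10.0, 10.0, 20_001)
dtheta = grid[1] - grid[0]
prior = normal_pdf(grid, 0.0, 1.0)
proposal = normal_pdf(grid, 1.0, 0.25)

# If q_{F(x, phi)} equals the true posterior N(0.8/1.5, 1/3), q_hat should equal the
# proposal posterior, which for these Gaussians is N(5.6/6, 1/6).
q_exact = normal_pdf(grid, 0.8 / 1.5, 1.0 / 3.0)
q_hat, Z = apt_proposal_posterior(q_exact, proposal, prior, dtheta)
print("Z =", Z)
print("max deviation:", np.max(np.abs(q_hat - normal_pdf(grid, 5.6 / 6.0, 1.0 / 6.0))))
```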
The authors propose to minimize $\hat{\mathcal{L}}(\phi) = -\sum^N_{j=1} \log \hat{q}_{(\mathbf{x}_j, \phi)}(\theta_j)$ on pairs with $\theta_j \sim \hat{p}(\theta)$ and $\mathbf{x}_j \sim p(\mathbf{x} \mid \theta_j)$. By Proposition 1 of [Pap18F], minimizing this loss yields $\hat{q}_{(\mathbf{x}, \phi)}(\theta) = \hat{p}(\theta \mid \mathbf{x})$ and hence $q_{F(\mathbf{x}, \phi)}(\theta) = p(\theta \mid \mathbf{x})$ as $N \to \infty$, under the assumption that the family of densities is sufficiently expressive and an optimal $\phi^{\ast}$ exists.
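The sequential procedure, including the bias of naive training, can be sketched on a toy problem. Under the hypothetical assumptions $p(\theta) = \mathcal{N}(0, 1)$, $p(x \mid \theta) = \mathcal{N}(\theta, 0.5)$, a Gaussian proposal $\mathcal{N}(m, v)$, and a linear-Gaussian family $q_{F(x, \phi)}(\theta) = \mathcal{N}(\theta;\, a x + b,\, s^2)$, the corrected density $\hat{q}$ is again Gaussian with precision $1/s^2 + 1/v - 1$ and a mean linear in $x$. Minimizing $\hat{\mathcal{L}}$ therefore reduces to a least-squares fit of $\hat{q}$ followed by a change of parameters back to $\phi$. This closed form is a shortcut specific to the toy family and stands in for the gradient-based training of a neural density estimator. Round 0 fits on prior simulations, the estimate at $x_o$ becomes the proposal, and round 1 compares the plain loss with the corrected one. The helper names `simulate`, `ols_gaussian`, and `snpe_c_fit` are hypothetical.

```python
import numpy as np

# Hypothetical toy model: p(theta) = N(0, 1), p(x | theta) = N(theta, SIGMA2), scalar theta and x.
SIGMA2, N = 0.5, 200_000
rng = np.random.default_rng(1)

def simulate(theta):
    """Draw x ~ p(x | theta) for each parameter."""
    return theta + rng.normal(0.0, np.sqrt(SIGMA2), theta.shape)

def ols_gaussian(theta, x):
    """MLE of N(theta; c*x + d, r): least-squares slope/intercept and mean squared residual."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    (c, d), *_ = np.linalg.lstsq(A, theta, rcond=None)
    return c, d, np.mean((theta - (c * x + d)) ** 2)

def snpe_c_fit(theta, x, m, v):
    """Minimise L_hat for q_F = N(a*x + b, s2), prior N(0, 1), proposal N(m, v).

    q_hat (proportional to q_F * proposal / prior) is Gaussian with precision 1/s2 + 1/v - 1
    and mean r * ((a*x + b)/s2 + m/v), a reparameterisation of N(c*x + d, r). Its MLE is the
    OLS fit, mapped back to (a, b, s2). Assumes 1/r - 1/v + 1 > 0 (holds here since v < 1).
    """
    c, d, r = ols_gaussian(theta, x)
    s2 = 1.0 / (1.0 / r - 1.0 / v + 1.0)   # invert the precision shift
    a = c * s2 / r                         # invert the slope of the q_hat mean
    b = s2 * (d / r - m / v)               # invert the intercept of the q_hat mean
    return a, b, s2

x_o = 0.8

# Round 0: simulate from the prior and fit plain NPE (proposal = prior, no correction needed).
theta0 = rng.normal(0.0, 1.0, N)
a0, b0, s2_0 = ols_gaussian(theta0, simulate(theta0))

# Round 1: the round-0 estimate evaluated at x_o becomes the proposal N(m, v).
m, v = a0 * x_o + b0, s2_0
theta1 = rng.normal(m, np.sqrt(v), N)
x1 = simulate(theta1)

naive = ols_gaussian(theta1, x1)        # plain loss: targets the proposal posterior
corrected = snpe_c_fit(theta1, x1, m, v)  # corrected loss L_hat: targets the true posterior

true_mean, true_var = x_o / (1 + SIGMA2), SIGMA2 / (1 + SIGMA2)
for name, (a, b, s2) in [("naive", naive), ("corrected", corrected)]:
    print(f"{name:9s} mean at x_o = {a * x_o + b:.3f}, var = {s2:.3f}")
print(f"true      mean at x_o = {true_mean:.3f}, var = {true_var:.3f}")
```

With these settings the naive fit should settle near the proposal-posterior variance of about $0.2$, while the corrected fit should approach the true posterior variance $1/3$ and mean $x_o / 1.5$.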
The authors go on to present an adaptation that admits arbitrary choices of priors, proposals, and density estimators. They also demonstrate their approach on several common benchmark problems from the simulation-based inference (SBI) literature, such as Lotka-Volterra, Two Moons, and SLCP (Simple Likelihood and Complex Posterior).
Finally, the approach presented by Greenberg et al. [Gre19A] avoids the numerical challenges and limitations of previous SNPE techniques, allows data to be re-used across rounds, in contrast to [Pap18F], and does not require importance weighting, unlike [Lue17F].