Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation

Identifying the source distribution behind observed data is an ill-posed problem. Sourcerer [Vet24S] introduces a novel approach based on maximum entropy to preserve the maximum level of uncertainty in the source distribution, while yielding a unique solution.

Mathematical and computational simulators are instrumental in modeling complex scientific and industrial problems. A prevalent difficulty across domains is identifying parameters that lead to certain experimental observations.

We will consider a stochastic simulator $f(\theta)=x,$ capable of producing observations $x$ from a set of parameters $\theta.$ This simulator enables the generation of samples from the likelihood $p(x \mid \theta),$ which is typically either intractable or unavailable.

Evaluating this simulator on various parameter configurations yields a dataset $\mathcal{D} = \{x_1, \dots, x_n\}$ and a corresponding empirical distribution $p(x_o).$ With Sourcerer, [Vet24S] introduces a strategy to determine a source distribution $q(\theta)$ whose push-forward through the simulator matches this empirical distribution. The push-forward distribution $q^{\#}(x)$ is given by:

$$q^{\#}(x) = \int_{\Theta} p(x \mid \theta)q(\theta)d\theta$$
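In practice the integral above is rarely evaluated; samples from $q^{\#}(x)$ are obtained by drawing $\theta \sim q(\theta)$ and running the simulator on each draw. The following is a minimal sketch of that idea, assuming NumPy. The toy simulator `toy_simulator` and the helper `sample_pushforward` are hypothetical names chosen for illustration and are not taken from [Vet24S].

```python
import numpy as np


def toy_simulator(theta, rng):
    # Hypothetical stochastic simulator (an assumption for illustration):
    # x = theta^2 plus Gaussian observation noise.
    return theta**2 + 0.1 * rng.standard_normal(theta.shape)


def sample_pushforward(sample_source, simulator, n, rng):
    # Draw theta ~ q(theta), then x ~ p(x | theta); the resulting x are
    # samples from the push-forward distribution q^#(x).
    theta = sample_source(n, rng)
    return simulator(theta, rng)


rng = np.random.default_rng(0)

# Two different candidate sources: uniform on [-1, 1] and uniform on [0, 1].
x_wide = sample_pushforward(
    lambda n, r: r.uniform(-1.0, 1.0, size=n), toy_simulator, 10_000, rng
)
x_narrow = sample_pushforward(
    lambda n, r: r.uniform(0.0, 1.0, size=n), toy_simulator, 10_000, rng
)
```

Because this toy simulator only depends on $\theta^2,$ both sources induce the same push-forward distribution, which is a small instance of the ill-posedness discussed in this post. Of the two, the uniform distribution on $[-1, 1]$ has the larger entropy ($\log 2$ versus $0$).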

A well-known tactic for estimating the source distribution is empirical Bayes, which refines the parameters $\phi$ of the prior by maximizing the marginal likelihood:

$$p(\mathcal{D}) = \prod_i \int p(x_i \mid \theta)q_{\phi}(\theta) d\theta$$

However, this approach is inadequate for simulators where the likelihood is unavailable or where the problem of parameter estimation is ill-posed.

Maximum Entropy Source Distribution Estimation

The authors choose between competing source distributions by taking the one that maximizes entropy. Intuitively, this is the one that embodies the greatest level of ignorance. The entropy $H(p)$ of a distribution $p$ is defined as $H(p) = -\int p(\theta)\ \log p(\theta)\ d\theta.$ To identify the maximum entropy source distribution, [Vet24S] proposes to maximize $H(q),$ subject to the constraint that the push-forward distribution $\int p(x\mid \theta) q(\theta) d\theta$ equals the empirical distribution $p(x_o).$

As shown by the authors, maximizing the entropy of $q(\theta)$ under this constraint yields a unique source distribution, provided that one exists. To implement this, they relax the functional equality constraint into a penalty term, leading to the unconstrained problem

$$\max_{q} \left \{ \lambda H(q) - (1-\lambda) \log(D(q^{\#}, p_o)^2) \right \},$$

where $D(q^{\#}, p_o)$ measures the discrepancy between the push-forward and the empirical distributions, and $\lambda$ controls the penalty strength. The authors propose to use the Sliced-Wasserstein distance, as it is sample-based and circumvents the direct evaluation of the likelihood. The logarithmic term enhances numerical stability.
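To make the sample-based nature of this objective concrete, here is a minimal sketch of a Monte Carlo sliced-Wasserstein estimate and of the penalized objective, assuming NumPy. It assumes both sample sets have the same size, and it takes the entropy estimate as a given number rather than computing it. The function names and the number of projections are illustrative choices, not the implementation of [Vet24S].

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, rng=None):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) with the same number of samples n.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[1]
    # Random directions on the unit sphere in R^d.
    directions = rng.standard_normal((n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project onto each direction; in 1D, optimal transport between
    # equally sized samples simply matches sorted values.
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)
    # Average the squared 1D distances over samples and projections.
    return np.sqrt(np.mean((x_proj - y_proj) ** 2))


def penalized_objective(entropy_estimate, x_model, x_obs, lam, rng=None):
    # lam * H(q) - (1 - lam) * log(D^2); to be maximized over q.
    # entropy_estimate is assumed to be computed elsewhere.
    # Note: log diverges if the two sample sets coincide exactly.
    swd = sliced_wasserstein(x_model, x_obs, rng=rng)
    return lam * entropy_estimate - (1.0 - lam) * np.log(swd**2)
```

Only samples from the push-forward and from the observations enter the computation, so the likelihood $p(x \mid \theta)$ is never evaluated.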

To incorporate the Bayesian perspective, where prior information about the source is available, the authors substitute the entropy term $H(q)$ with the Kullback-Leibler (KL) divergence between the estimated source distribution $q(\theta)$ and the initial prior $p(\theta).$ Doing so can be seen as regularizing the solution to stay close to the prior. Since the KL divergence decomposes into the negative entropy $-H(q)$ and the cross-entropy $H(q,p),$ this formulation remains amenable to sample-based estimation and retains the original intention of covering a large portion of the parameter space. The optimization thus becomes a minimization that balances the KL divergence against the discrepancy measure:

\begin{align} & \lambda D_{\text{KL}}(q \Vert p) + (1 - \lambda) \log \left( D(q^{\#},p_o)^2 \right) \\ = - & \lambda H(q) + \lambda H(q,p) + (1 - \lambda) \log \left( D(q^{\#},p_o)^2\right). \end{align}

In the second line, the KL-divergence is expressed in terms of entropy and cross-entropy between the source and the prior distribution.
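For completeness, the decomposition can be spelled out in one line, writing the cross-entropy as $H(q,p) = -\int q(\theta) \log p(\theta)\, d\theta$ (the standard definition, assumed here):

\begin{align} D_{\text{KL}}(q \Vert p) &= \int q(\theta) \log \frac{q(\theta)}{p(\theta)}\, d\theta \\ &= \int q(\theta) \log q(\theta)\, d\theta - \int q(\theta) \log p(\theta)\, d\theta \\ &= -H(q) + H(q,p). \end{align}

As an illustrative special case, assume the prior $p$ is uniform on a bounded domain $\Theta$ with volume $|\Theta|$ and that $q$ is supported on $\Theta.$ Then $H(q,p) = \log |\Theta|$ is a constant, and minimizing the expression above is equivalent to maximizing $\lambda H(q) - (1-\lambda) \log(D(q^{\#}, p_o)^2),$ which is the maximum entropy objective.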

To approximate the source distribution $q(\theta),$ the authors utilize unconstrained artificial neural networks, as presented by [Van21N].

Numerical Experiments

Figure 3. [Vet24S], Figure 4. Comparison of the true and estimated source distribution (left) and the observed data against the push-forward distribution (right).

The authors validate their method through detailed numerical examples. First, they benchmark their approach on the two moons, the inverse kinematics (IK), and the simple likelihood complex posterior (SLCP) tasks, all three presented by [Van21N] specifically for empirical Bayes. They further demonstrate their algorithm’s effectiveness on more complex scenarios using differentiable simulators, specifically the Lotka-Volterra and SIR models, showcasing the method’s adaptability to diverse simulation contexts. Finally, the authors apply Sourcerer to the Hodgkin-Huxley model, a well-known neuron model, to estimate the source distribution of the model’s parameters. In all cases, they use the classifier two-sample test (C2ST) to evaluate the quality of the push-forward distributions obtained using the estimated source (Figure 2).

The numerical experiments showcase the method’s ability to accurately estimate a source distribution whose push-forward distribution is very close to the observed data. Furthermore, the estimated source distributions show a higher level of entropy, as desired (Figure 2).