SoftFlow: Probabilistic Framework for Normalizing Flow on Manifolds

A simple yet effective method for training continuous and discrete normalizing flows on distributions and point clouds where the support manifold has lower dimension than the input space.

Normalizing flows are versatile density estimators and currently one of the top-performing model types for this task. However, it has been observed that normalizing flows often struggle to accurately estimate the density in areas where the support of the data distribution is thin. The paper [Kim20S] attributes this phenomenon to the inability of normalizing flows to accurately model distributions with lower-dimensional support. The authors propose a simple yet effective method for training continuous and discrete normalizing flows on distributions and point clouds where the support manifold has a lower dimension than the input space.

Normalizing Flows on lower-dimensional manifolds

Normalizing flows are designed to be diffeomorphisms in $\mathbb{R}^d$ where $d$ is the dimension of the input space. However, many real-world distributions are supported on lower-dimensional manifolds. For example, in simple data such as MNIST, the lower dimensionality of the support can easily be seen by the fact that the corners of the image are always black. In more complex data, the support manifold is not as obvious, but lower-dimensional structures are still very likely to exist.

Figure 1: Illustration of a normalizing flow trained on 2D data (top) and a 1D manifold (bottom). Yet, normalizing flows are trained as if the support manifold was the full input space. As a consequence, a distribution of full support is trained to approximate a distribution with lower-dimensional support. As such an approximation becomes tighter, the Jacobian determinant must necessarily explode, which leads to numerical instabilities. Indeed, a manifold of lower dimension has Lebesgue measure $0$ and therefore the approximated densities at the true support manifold approach infinity, which has been further investigated in subsequent work, e.g. [Mcd22C]. As a consequence, the change of variables formula becomes invalid in the limit. The paper argues that many shortcomings of normalizing flows, as seen e.g. on various 2D benchmark datasets, can be attributed to this issue. While there has been previous work on normalizing flows on lower-dimensional manifolds, the authors argue that these methods either are not numerically stable or need prior knowledge such as the dimension of the manifold.


In this paper, the authors propose a simple yet effective method to train normalizing flows on distributions and point clouds where the support manifold has a lower dimension than the input space. The key idea is to train an amortized conditional flow on the data distribution perturbed with Gaussian noise conditioned on the noise level. More formally, the flow is trained to approximate the following distribution: $ p(Y | \sigma) = p(X + Z)$, where $Z \sim \mathcal{N}(0, \sigma^2I)$ and $X$ is distributed according to the data distribution $p_D$. Note that by perturbing the data with Gaussian noise, the support of the distribution is extended to the full input space. Since the flow is conditioned on the noise level $\sigma$ and trained on noise levels between $0$ and a given $\sigma_{\text{max}} > 0$, the flow learns to recover a tight approximation of the original distribution by setting $\sigma=0$ during inference.

Figure 2: The SoftFlow architecture.

The authors show that the amortized conditional flow $f_\theta(x | \sigma)$ from the perturbed data distribution to the base distribution can be trained with a simple modification of the standard maximum log-likelihood objective: \begin{align*} \mathcal{L}(\theta) &= \mathbb{E}_{X,\sigma,Z} \left[\log p_\theta(X+Z | \sigma)\right] \end{align*} where $\log p_\theta(X+Z | \sigma)$ can be written as $\log p_B(f(x+z | \sigma)) \left|\det\frac{\partial f(x | \sigma)}{\partial x}\right|$ The authors also give the dynamics for the case of a continuous flow, which directly extends the discrete flow setting. \begin{align*} \frac{\partial x_t}{\partial t} &= f(x_t, t, \sigma) \\ x_0 &\sim p_B \\ x_1 &\sim X + Z \\ Z &\sim \mathcal{N}(0, \sigma^2I) \end{align*}


Figure 3: The SoftPointFlow architecture. 3D point clouds are compact representations of the geometric details of objects. Their increasing popularity is also motivated by the fact that they can be easily acquired by range scanning devices such as LiDARs. PointFlow is a popular architecture for point cloud generation. Apparent problems with modeling thin structures, however, lead the authors of [Kim20S] to conjecture that the support of the point cloud distribution is lower-dimensional than the input space. The authors propose SoftPointFlow, a simple modification of PointFlow that uses the SoftFlow architecture to model the support manifold. SoftPointFlow is a variational auto-encoder-like architecture, where an encoder computes the posterior of the latent shape variable $S$ given the point cloud $X$, $p_\phi(S | X)$, and a decoder, implemented as SoftFlow computes the likelihood of the point cloud given the latent shape variable, $p_\theta(X | S)$. Additionally, the prior of $S$ is modeled by a PriorFlow, $p_\psi(S)$. The model is trained by maximizing the following ELBO objective: \begin{align*} &L(X_{\text{set}}; \theta, \psi, \phi) = \\ &\mathbb{E}_{q_{\phi}(S|X_{\text{set}})}\left[\log \frac{p_{\theta}(X_{\text{set}}|S, \sigma)p_{\psi}(S)}{ q_{\phi}(S|X_{\text{set}})}\right] \end{align*}

Experimental results

Figure 4: Samples from SoftFlow, Glow, and FFJORD trained on 5 different distributions. The experiments validated the proposed framework where SoftFlow was implemented within the FFJORD architecture, augmented with a noise distribution parameter. SoftFlow and FFJORD, and a 100-layer Glow model, were trained on data from five different distributions. The results indicate that Glow performed poorly across most distributions, and FFJORD also struggled, particularly with circular distributions. In contrast, SoftFlow effectively generated high-quality samples that closely followed the data distribution. Additional tests varying the noise distribution parameter demonstrated SoftFlow’s ability to accurately reflect different distributions, suggesting its potential for estimating unseen distributions or creating synthetic ones.

Figure 5: Examples of point clouds generated by PointFlow and SoftPointFlow. From left to right: reconstructed samples of seen data, reconstructed samples of unseen data, and synthetic samples.

Figure 6: Generation results on $1$-NNA (%). Lower is better The experiments on point clouds used the ShapeNet Core dataset to evaluate a framework for 3D point clouds, focusing on three categories: airplanes, chairs, and cars. SoftPointFlow, trained for 15K epochs on four 2080-Ti GPUs, was compared with PointFlow. SoftPointFlow, built on discrete normalizing flow networks, demonstrated superior performance in capturing fine details of objects, producing high-quality samples compared to the blurrier outputs of PointFlow. This was particularly evident when varying the standard deviation of the latent variables. SoftPointFlow maintained structural integrity better than PointFlow. SoftPointFlow also achieved significantly better results than GAN-based models in 1-nearest neighbor accuracy tests across different categories and was preferred by participants in a preference test for its quality and similarity to reference point clouds. The framework proved particularly effective for modeling generative flows on point clouds.


Because of the simplicity of the procedure, it is relatively easy to adopt SoftFlow training within existing pipelines. We have implemented the procedure in our VeriFlow project in the context of neuro-symbolic verification and can confirm that the method provides a significant improvement in the quality of the generated samples.


In this series