In some of our latest paper pills we have summarised recent developments in denoising generative models (DGMs), with particular emphasis on score-based techniques. In [Son21S] we saw how DGMs can be studied with the formalism of stochastic differential equations, while [Doc22S] presents the current state of the art in image generation.
In this pill we take a step back and briefly go through some of the key publications that, over the past seven years, have led to the success of DGMs.
In 2015, the seminal paper [Soh15D] showed that it is possible to generate samples (e.g. images or audio) by training a variational decoder to reverse a discrete diffusion process that gradually perturbs data with noise. Models trained with this technique came to be known as denoising diffusion probabilistic models (DDPMs). Unaware of this work, researchers were independently developing score-based generative models (SGMs), with a different motivation and a different mathematical formalism. In 2019, [Son19G] showed that the empirical performance of SGMs could rival that of other, widely acclaimed generative methods such as GANs and VAEs.
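For concreteness, here is the structure of the two processes in the standard notation (a sketch following the usual convention, with a variance schedule $\beta_t$ and a Gaussian parametrization; these are not specific details of [Soh15D]):

$$
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\ \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\ \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\right).
$$

The fixed forward chain $q$ gradually destroys structure in the data; the learned reverse chain $p_\theta$ is trained to undo the corruption one step at a time.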
At first glance, the connection between SGMs and DDPMs seemed superficial, since the former are trained by score matching and sampled with Langevin dynamics, while the latter are trained by maximizing the evidence lower bound (ELBO) and sampled with a learned decoder. However, in 2020 the paper “Denoising Diffusion Probabilistic Models” [Ho20D] (for which both the original code and a PyTorch implementation are available) showed that the ELBO used for training diffusion probabilistic models is essentially equivalent to the weighted combination of score matching objectives used in score-based generative modeling.
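To sketch the equivalence in the usual notation (with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$), the ELBO of [Ho20D] reduces, up to per-step weights, to the noise-prediction objective

$$
L_{\text{simple}}(\theta) = \mathbb{E}_{t,\, \mathbf{x}_0,\, \boldsymbol{\epsilon}} \left[ \left\lVert \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\; t \right) \right\rVert^2 \right],
$$

and since the noise predictor relates to the score as $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \approx -\sqrt{1-\bar{\alpha}_t}\,\nabla_{\mathbf{x}_t} \log q(\mathbf{x}_t)$, each term is a denoising score matching objective at noise level $t$.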
Inspired by that work, the aforementioned [Son21S] further investigated the relationship between diffusion models and score-based generative models, and showed that not only the training objectives but also the sampling methods can be unified: the ancestral sampling of DDPMs can be interleaved with the annealed Langevin dynamics of score-based models. This yields a unified and more powerful sampler: the Predictor-Corrector sampler.
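As a rough illustration of the idea, here is a minimal Predictor-Corrector sketch in PyTorch for the variance-exploding setting. The interface `score_model(x, sigma)` (returning an estimate of the score $\nabla_{\mathbf{x}} \log p_\sigma(\mathbf{x})$), the noise schedule `sigmas`, and the signal-to-noise heuristic for the Langevin step size are all assumptions made for this example; this is not the authors' reference implementation.

```python
import torch

def pc_sampler(score_model, shape, sigmas, n_corrector_steps=1, snr=0.16):
    """Predictor-Corrector sampling sketch (in the spirit of [Son21S]).

    score_model(x, sigma) is assumed to estimate the score of the
    noise-perturbed data distribution at level sigma (hypothetical interface).
    sigmas is a 1-D tensor of decreasing noise levels.
    """
    # Start from pure noise at the largest noise level.
    x = torch.randn(shape) * sigmas[0]
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        # Corrector: a few steps of Langevin dynamics at the current noise level.
        for _ in range(n_corrector_steps):
            score = score_model(x, sigma)
            noise = torch.randn_like(x)
            # Heuristic step size targeting a fixed signal-to-noise ratio.
            grad_norm = score.flatten(1).norm(dim=1).mean()
            noise_norm = noise.flatten(1).norm(dim=1).mean()
            eps = 2 * (snr * noise_norm / grad_norm) ** 2
            x = x + eps * score + torch.sqrt(2 * eps) * noise
        # Predictor: one Euler-Maruyama step of the reverse-time SDE,
        # moving from noise level sigma down to sigma_next.
        score = score_model(x, sigma)
        dvar = torch.clamp(sigma**2 - sigma_next**2, min=0.0)
        x = x + dvar * score + torch.sqrt(dvar) * torch.randn_like(x)
    return x
```

A toy usage, plugging in the exact score of a Gaussian as a stand-in for a trained network:

```python
# Score of N(0, (1 + sigma^2) I), used only to exercise the sampler.
score_of_gaussian = lambda x, sigma: -x / (1 + sigma**2)
sigmas = torch.linspace(10.0, 0.01, 100)
samples = pc_sampler(score_of_gaussian, (4, 3, 32, 32), sigmas)
```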
If you are interested in learning more about the history and development of denoising generative models, I recommend the following blog posts: the first focuses on DDPMs and goes through all the essential math; the second is centered on score-based methods and is a bit more high-level, but it was written by one of the key authors of the DGM revolution (Yang Song) and offers some unique insights.
In summary, diffusion models are an exciting new direction for generative modeling, grounded in rigorous mathematics and beautiful insights. The field has matured quickly in recent years and now looks ready to be deployed in real applications.