It is widely known that neural networks with a single hidden layer are universal approximators of continuous functions. However, a less well-known but powerful result is that neural networks can also accurately approximate function operators, that is, mappings from one space of functions to another.
In a seminal paper, Lu et al. [Lu21L] propose a neural network architecture that is capable of learning arbitrary nonlinear continuous operators with small generalization error, namely the deep operator network (DeepONet). Approximating operators with neural networks goes beyond universal function approximation and is significant because DeepONets can “learn to solve” a problem: they have the potential to dramatically speed up the solution of, for example, parametric differential equations.
As a guiding example, consider a (differential) equation of the form $$ L s = u $$ where $L$ is an arbitrary function operator that maps a function $s$ to a function $u$. If, for a given $u$, we want to find the solution $s$ of this equation, the interesting but non-trivial object is the solution operator $$ L^{-1}: u \mapsto s $$ that maps $u$ to the solution $s$ of the equation. In fact, DeepONets are able to learn this implicit operator $L^{-1}$.
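To make this concrete, take $L = \mathrm{d}/\mathrm{d}x$ together with the initial condition $s(0) = 0$. The corresponding solution operator is simply the antiderivative, $$ L^{-1}: u \mapsto s, \qquad s(y) = \int_0^y u(\tau) \, \mathrm{d}\tau, $$ an instance of the explicit integral operators considered in [Lu21L]. We will reuse this toy problem in the code sketches below.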
This is a paradigm shift in the context of physics-informed neural networks, because solving a differential equation no longer corresponds to training a network, but to evaluating the solution operator of the equation. Given a representation of the input function $u$ (for instance, its values at pre-defined positions), a forward pass through the operator network returns a representation of the solution $s$.
The DeepONet architecture is based on the universal approximation theorem for operators [Che95U], which is suggestive of the structure and potential of deep neural networks in learning continuous operators. Similar to the well-known universal function approximation theorem, the corresponding theorem for operators (Theorem 1 in [Lu21L]) states that two fully connected neural networks with a single hidden layer, combined by a vector dot product of the outputs, are able to approximate any continuous nonlinear operator with arbitrary accuracy if the layers are sufficiently large.
Inspired directly by this theoretical result, the DeepONet architecture (Figure 1) consists of two neural networks: a branch network encodes the discretized input function and a trunk network encodes the domain of the output functions. In general, DeepONets can be constructed by choosing functions $\mathbf{g}: \mathbb{R}^m \to \mathbb{R}^p$ and $\mathbf{f}: \mathbb{R}^d \to \mathbb{R}^p$ from diverse classes of neural networks that satisfy the classical universal approximation theorem for functions. A generalized version of the approximation theorem for operators (Theorem 2 in [Lu21L]) states that for any nonlinear continuous operator $G: u \mapsto G(u)$ and any $\epsilon > 0$ the inequality
$$ \begin{equation} \Bigg\lvert G(u)(y) - \big\langle \mathbf{g}\left(u(x_1),u(x_2), \dots, u(x_m)\right), \mathbf{f}(y) \big\rangle \Bigg\rvert < \epsilon \end{equation} $$
holds for all admissible input functions $u$ and all evaluation points $y \in \mathbb{R}^d$ in the output domain, where $x_1, x_2, \dots, x_m$ are sufficiently many fixed sensor points at which $u$ is evaluated and $\langle\cdot, \cdot\rangle$ denotes the dot product in $\mathbb{R}^p$. Note that the input function $u$ is represented here by its $m$ function evaluations, but it could also be projected onto a finite set of basis function coefficients.
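To make this structure tangible, here is a minimal sketch of a DeepONet in PyTorch. This is not the authors' reference implementation; the layer widths, the $\tanh$ activations, the batch conventions, and the sizes $m=100$, $d=1$, $p=32$ are assumptions made purely for illustration.

```python
import torch


class DeepONet(torch.nn.Module):
    """Minimal DeepONet sketch: a branch net encodes the discretized input
    function, a trunk net encodes the evaluation point, and both are combined
    by a dot product of their p-dimensional outputs."""

    def __init__(self, m: int, d: int, p: int, width: int = 64):
        super().__init__()
        # Branch network g: R^m -> R^p, acting on (u(x_1), ..., u(x_m))
        self.branch = torch.nn.Sequential(
            torch.nn.Linear(m, width), torch.nn.Tanh(), torch.nn.Linear(width, p)
        )
        # Trunk network f: R^d -> R^p, acting on the query point y
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(d, width), torch.nn.Tanh(), torch.nn.Linear(width, p)
        )

    def forward(self, u: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u: (batch, m) sensor values, y: (batch, d) query points
        # Returns the approximation of G(u)(y) = <g(u), f(y)>, shape (batch, 1)
        return (self.branch(u) * self.trunk(y)).sum(dim=-1, keepdim=True)


# Example forward pass, e.g. for the antiderivative operator from above:
model = DeepONet(m=100, d=1, p=32)
u = torch.rand(16, 100)  # 16 input functions, each sampled at 100 sensor points
y = torch.rand(16, 1)    # one query point per input function
s = model(u, y)          # approximate solution values G(u)(y), shape (16, 1)
```

Training then amounts to regressing these outputs against reference values of $G(u)(y)$, for instance obtained from a numerical solver, with a mean squared error loss.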
In the paper, several examples demonstrate that DeepONets can learn various explicit operators, such as integrals and fractional Laplacians, as well as implicit operators that represent deterministic and stochastic differential equations. This shows that DeepONets can be applied reliably to learning a wide range of function operators.
Moreover, DeepONets can also be trained in the spirit of physics-informed neural networks, where a partial differential equation (PDE) is incorporated into the loss function. It is reported that predicting the solution of various types of parametric PDEs is then up to three orders of magnitude faster than with conventional PDE solvers [Wan21L].
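To sketch what such a physics-informed loss could look like for the antiderivative example from above (a simplified illustration continuing the previous snippet, not the setup of [Wan21L]), one can differentiate the network output with respect to the trunk input and penalize the residual of the ODE $\mathrm{d}s/\mathrm{d}y = u(y)$ together with the initial condition:

```python
import torch


def physics_loss(
    model: "DeepONet",        # the DeepONet class from the sketch above
    u_sensors: torch.Tensor,  # (batch, m) input functions at the fixed sensors
    y: torch.Tensor,          # (batch, 1) collocation points
    u_at_y: torch.Tensor,     # (batch, 1) input functions evaluated at y
) -> torch.Tensor:
    """Residual loss for ds/dy = u(y) with s(0) = 0 (illustrative sketch)."""
    y = y.clone().detach().requires_grad_(True)
    s = model(u_sensors, y)  # G(u)(y)
    # ds/dy via automatic differentiation through the trunk input
    ds_dy = torch.autograd.grad(
        s, y, grad_outputs=torch.ones_like(s), create_graph=True
    )[0]
    residual = ds_dy - u_at_y                    # ODE residual at y
    s0 = model(u_sensors, torch.zeros_like(y))   # initial condition s(0) = 0
    return (residual**2).mean() + (s0**2).mean()
```

In practice, such a residual term is combined with a data loss and minimized over many sampled input functions and collocation points.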
Although the DeepONet architecture is, in principle, sufficient for learning any continuous operator, the universal approximation theorem does not tell us how to learn these operators efficiently. In practice, carefully constructed network architectures will likely be more efficient for specific problems, and consequently, subsequent works propose improvements to the DeepONet architecture. Other architectures include, for instance, the general class of Neural Operators [Kov23N].
If you are interested in trying out DeepONets and other neural operators in a generalized framework, check out continuiti, our Python package for learning function operators with neural networks.