It is widely known that neural networks with a single hidden layer are universal approximators of continuous functions. However, a less known but powerful result is that neural networks can also accurately approximate function operators, in other words, mappings from a space of functions to another space of functions.

In a seminal paper by Lu et al. [Lu21L], the authors propose a neural network architecture that is capable of learning arbitrary nonlinear continuous operators with small generalization error, namely the deep operator network (DeepONet). The proposal of approximating operators with neural networks goes beyond universal function approximation, and is significant as DeepONets can “learn to solve” a problem. It has the potential to dramatically speed up the solution of, for example, parameterised differential equations.

As a reference, consider the (differential) equation of the form $$ L s = u $$ where $L$ is an arbitrary function operator that maps function $s$ to function $u$. If, for given $u$, we want to find the solution $s$ of this equation, an interesting but non-trivial operator is the solution operator $$ L^{-1}: u \mapsto s $$ that maps $u$ to the solution $s$ of the equation. In fact, DeepONets are able to learn this implicit operator $L^{-1}$.

This is a paradigm shift in the context of physics-informed neural networks, because solving a differential equation does not correspond to training a network, but evaluating the solution operator of the equation instead. Given a representation of the input function $u$ (for instance, by evaluating at pre-defined positions), a forward pass through the operator network returns a representation of the solution $s$.

The DeepONet architecture is based on the universal approximation theorem for operators [Che95U], which is suggestive of the structure and potential of deep neural networks in learning continuous operators. Similar to the well-known universal function approximation theorem, the corresponding theorem for operators (Theorem 1 in [Lu21L]) states that two fully connected neural networks with a single hidden layer, combined by a vector dot product of the outputs, are able to approximate any continuous nonlinear operator with arbitrary accuracy if the layers are sufficiently large.

Inspired directly by this theoretical result, the DeepONet architecture
(Figure 1) is made of two neural networks: A *branch* network encodes
the discrete input function space and a *trunk* network encodes the domain of
the output functions. In general, DeepONets can be constructed by
choosing functions $\mathbf{g}: \mathbb{R}^m \to \mathbb{R}^p$
and $\mathbf{f}: \mathbb{R}^d \to \mathbb{R}^p$ from diverse classes of
neural networks which satisfy the classical universal approximation theorem
of functions. A generalized version of the approximation theorem for operators
(Theorem 2 in [Lu21L]) states that for any nonlinear
continuous operator $G: u \mapsto G(u)$ and any $\epsilon > 0$ the inequality

$$ \begin{equation} \Bigg\lvert G(u)(y) - \big\langle \mathbf{g}\left(u(x_1),u(x_2), \dots, u(x_m)\right), \mathbf{f}(y) \big\rangle \Bigg\rvert < \epsilon \end{equation} $$

holds for all viable input functions $u$ and $y \in \mathbb{R}^d$, where $x_1, x_2, \dots, x_m$ are sufficient evaluation points of $u$ and $\langle\cdot, \cdot\rangle$ denotes the dot product in $\mathbb{R}^p$. Note that the input function $u$ is represented by $m$ function evaluations, but it could also be projected to a finite set of basis function coefficients.

In the paper, several examples demonstrate that DeepONets can learn various explicit operators, such as integrals and fractional Laplacians, as well as implicit operators that represent deterministic and stochastic differential equations. This shows the reliable application of DeepONets to learning a wide range of function operators.

Moreover, DeepONets can also be trained in the spirit of physics-informed neural networks, where a partial differential equation (PDE) is incorporated within the loss function. It is reported that predicting the solution of various types of parametric PDEs is up to three orders of magnitude faster compared to conventional PDE solvers [Wan21L].

Although the DeepONet architecture is sufficient for learning any operator, unfortunately, the universal approximation theorem does not inform us on how to learn these operators efficiently. In practice, carefully constructed network architectures will probably be more efficient for learning specific problems, and consequently, subsequent works propose improvements to the DeepONet architecture, for instance, the generalized class of neural operators [Kov23N].