Probabilistic Models

Uncertainty permeates all aspects of real-world agency: Perception is subject to uncertainty owing to partial observability and unreliable sensors; the effects of an agent’s own actions may have non-determinstic effects; and even the tasks an agent is given may be subject to ambiguity or incomplete specification. Probability theory is a mathematical framework for the conceptualisation of uncertainty, and models built upon this framework are thus frequently viewed as indispensible components of AI systems that are to act successfully in real-world domains. This series covers recent advancements in the field of probabilistic models and, more generally, uncertainty in AI.

Graphical Models

Probabilistic models have been at the heart of many successful AI applications. From the 1980s onwards, probabilistic graphical models were very successfully applied in expert systems, decision support systems, and causal world models in general. The key principle of graphical models like Bayesian networks and Markov random fields is to compactly represent a full-joint probability distribution over a given set of random variables, exploiting marginal and/or conditional independencies between the variables, which are made explicit in a graphical structure in which every variable is represented as a node [Pea88P]. In turn, inference algorithms exploit the given structure in order to render computations tractable.

The representation of a full-joint probability distribution over a set of variables $X$ enables a wide range of reasoning patterns, supporting causal, diagnostic and mixed inference alike. If we denote by $E \subset X$ an indexed set of observed evidence variables and denote by $Q \subset X \setminus E$ a set of query variables, then with $U = X \setminus E$, canonical inference problems can be described as follows:

  • belief update: computing the posterior marginals of individual query variables after having observed (new) evidence, i.e. $$P(Q_i \mid E=e)$$
  • most probable explanation (MPE): computing the most probable assignment of all unobserved variables given the evidence, i.e. $$ \arg\max_u P(U=u \mid E=e)$$
  • maximum a posteriori inference (MAP): computing the most probable assignment of a subset of the unobserved variables, i.e. $$\arg\max_q P(Q=q \mid E=e).$$

Models representing joint distributions support the sampling of data from the model and therefore are called generative models.

The problem of learning probabilistic graphical models from data is frequently restricted to the parameter learning problem – with the graphical structure being given by a domain expert, as the corresponding structure learning problem is particularly challenging in practice.

Temporal/Sequence Models

For sequential reasoning problems, specialisations of graphical models such as dynamic Bayesian networks (DBNs) and the more restrictive hidden Markov models (HMMs) were developed [Rus10A]. HMMs link observations $E$ to (unobservable) hidden states $X$, the sequence of states being subject to a first-order Markovian assumption. Using specialised inference algorithms that exploit the temporal chain structure in these models, they were very successfully applied to problems such as state estimation, speech recognition and human activity recognition. Canonical inference problems include:

  • filtering: computing the distribution of the current state given the series of (past) observations, i.e. $P(X_i \mid E_1=e_1, \dots, E_i=e_i)$
  • smoothing: computing the distribution of all states (individually), given all $n$ observations (past and future relative to each state), i.e. $P(X_i \mid E_1=e_1, \dots, E_n=e_n)$
  • most probable explanation: computing the most probable sequence of hidden states given the sequence of observation, i.e. $\arg\max_x P(X=x \mid E=e)$.

Whereas the aforementioned approaches are specialisations of Bayesian networks and thus use directed graphical structures, undirected conditional random fields (CRFs) were introduced in order to support discriminative training [Laf01C], i.e. to enable models to represent precisely the conditional distribution that is of interest for a particular application (yielding a discriminative model) rather than a full-joint distribution.


In robotics, probabilistic methods were very successfully applied throughout the 1990s and 2000s in order to handle problems such as self-localization, state estimation, environment mapping, and control [Thr05P]. Probabilistic approaches allowed uncertainties pertaining to sensor inaccuracy, actuator behaviour and the agents’ environments to be soundly integrated for the first time.

First-Order Probabilistic Languages

In the late 1990s and 2000s, many probabilistic representation formalisms were developed that sought to lift the limitations of graphical models to support reasoning about a fixed set of random variables (and therefore a limited set of entities) only. In a field that emerged as statistical relational learning (SRL), the principles of expressive relational languages, specifically first-order logic, were transferred to probabilistic models [Get07I]. The key idea was to represent general (probabilistic and logical) principles about certain types of objects, which could then be materialised for an arbitrary set of concrete objects, yielding a ground model which was typically represented as a graphical model. The respective models could thus be thought of as meta-models that represent templates for the construction of graphical models. Canonical applications of SRL include social network modelling, link prediction, collective classification, collaborative filtering and entity resolution.

Tractable Models

Whereas SRL focussed on generality, in the 2010s, the focus of the field shifted towards tractability. Unfortunately, probabilistic inference is #P-hard and therefore intractable in the general case. However, there exist tractable subclasses of models that are still sufficiently expressive to be of high practical relevance. Probabilistic models with polynomial-time exact inference guarantees have thus been investigated, with the goal of enabling probabilistic techniques to scale to domains with thousands of interdependent variables. Specific approaches towards tractability include restricted model parametrisations, new representational frameworks like sum-product networks (SPNs) [Poo11S] or probabilistic circuits [Cho20P], and compositional architectures that achieve efficient inference through model modularity.

Probabilistic Programming

Over the past decade, probabilistic programming has rapidly grown as a breakthrough approach for easily specifying and performing inference on complex probabilistic models. The origins of probabilistic programming trace back to work in the late 2000s on Church, a LISP-based language for expressing probabilistic models through code. Today, probabilistic programming languages offer intuitive and flexible model specification syntaxes that lower the barriers for applications of probabilistic methods [Van21I]. As probabilistic techniques spread more broadly across science and industry, probabilistic programming promises to further that adoption by making modeling more accessible to analysts.

Research feed

Check all of our work