# Graphical Models

Probabilistic models have been at the heart of many successful AI applications.
From the 1980s onwards, *probabilistic graphical models* were very successfully
applied in expert systems, decision support systems, and causal world models in
general.
The key principle of graphical models like *Bayesian networks* and
*Markov random fields* is to compactly represent a full-joint probability
distribution over a given set of random variables,
exploiting marginal and/or conditional independencies between the variables,
which are made explicit in a graphical structure in which every variable
is represented as a node [Pea88P].
In turn, inference algorithms exploit the given structure in order to render
computations tractable.

The representation of a full-joint probability distribution over a set of variables $X$ enables a wide range of reasoning patterns, supporting causal, diagnostic and mixed inference alike. If we denote by $E \subset X$ an indexed set of observed evidence variables and denote by $Q \subset X \setminus E$ a set of query variables, then with $U = X \setminus E$, canonical inference problems can be described as follows:

- *belief update*: computing the posterior marginals of individual query variables after having observed (new) evidence, i.e. $P(Q_i \mid E=e)$
- *most probable explanation* (MPE): computing the most probable assignment of all unobserved variables given the evidence, i.e. $\arg\max_u P(U=u \mid E=e)$
- *maximum a posteriori* inference (MAP): computing the most probable assignment of a subset of the unobserved variables, i.e. $\arg\max_q P(Q=q \mid E=e)$.
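For concreteness, these inference problems can be solved by brute-force enumeration over the full joint. The sketch below uses a tiny Bayesian network over three Boolean variables (the familiar rain/sprinkler/wet-grass example; all probability values are illustrative assumptions), factorising the joint as $P(R, S, W) = P(R)\,P(S \mid R)\,P(W \mid R, S)$:

```python
# Conditional probability tables of a hypothetical three-node network.
P_R = {True: 0.2, False: 0.8}
P_S_given_R = {True: {True: 0.01, False: 0.99},
               False: {True: 0.4, False: 0.6}}
P_W_given_RS = {(True, True): {True: 0.99, False: 0.01},
                (True, False): {True: 0.8, False: 0.2},
                (False, True): {True: 0.9, False: 0.1},
                (False, False): {True: 0.0, False: 1.0}}

def joint(r, s, w):
    """Full-joint probability recovered from the compact factorisation."""
    return P_R[r] * P_S_given_R[r][s] * P_W_given_RS[(r, s)][w]

def belief_update(w):
    """Belief update: posterior P(R | W=w), summing out the sprinkler S."""
    scores = {r: sum(joint(r, s, w) for s in (True, False))
              for r in (True, False)}
    z = sum(scores.values())
    return {r: p / z for r, p in scores.items()}

def mpe(w):
    """Most probable explanation: argmax over all unobserved variables."""
    return max(((r, s) for r in (True, False) for s in (True, False)),
               key=lambda rs: joint(rs[0], rs[1], w))
```

Enumeration is exponential in the number of unobserved variables; practical inference algorithms instead exploit the graph structure (e.g. via variable elimination) to avoid materialising the full joint.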

Models that represent joint distributions support sampling data from the
model and are therefore called *generative models*.

The problem of learning probabilistic graphical models from data is frequently
restricted to the *parameter learning* problem – with the graphical
structure being given by a domain expert, as the corresponding
*structure learning* problem is particularly challenging in practice.
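With the structure fixed, parameter learning from fully observed data reduces to maximum-likelihood estimation of the conditional probability tables, i.e. relative frequencies. A hypothetical sketch for a two-node structure $R \rightarrow W$ (variable names and data invented for illustration):

```python
from collections import Counter

# Fully observed (rain, wet_grass) samples; values are illustrative.
data = [(True, True), (True, True), (False, False),
        (False, True), (False, False), (True, False)]

def learn_parameters(samples):
    """Maximum-likelihood CPT estimation for the structure R -> W."""
    n = len(samples)
    p_r = sum(r for r, _ in samples) / n          # P(R=True)
    counts = Counter(samples)
    p_w_given_r = {}
    for r in (True, False):
        n_r = sum(1 for rr, _ in samples if rr == r)
        p_w_given_r[r] = counts[(r, True)] / n_r  # P(W=True | R=r)
    return p_r, p_w_given_r
```

In practice, smoothed (e.g. Bayesian) estimates are preferred to plain relative frequencies, and latent variables require iterative schemes such as expectation-maximisation.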

# Temporal/Sequence Models

For sequential reasoning problems, specialisations of graphical models such
as *dynamic Bayesian networks* (DBNs) and the more restrictive *hidden Markov
models* (HMMs) were developed [Rus10A].
HMMs link observations $E$ to (unobservable) hidden states $X$,
the sequence of states being subject to a first-order Markovian assumption.
Using specialised inference algorithms that exploit the temporal chain
structure in these models, they were very successfully applied to problems such
as state estimation, speech recognition and human activity recognition.
Canonical inference problems include:

- *filtering*: computing the distribution of the current state given the series of (past) observations, i.e. $P(X_i \mid E_1=e_1, \dots, E_i=e_i)$
- *smoothing*: computing the distribution of all states (individually), given all $n$ observations (past and future relative to each state), i.e. $P(X_i \mid E_1=e_1, \dots, E_n=e_n)$
- *most probable explanation*: computing the most probable sequence of hidden states given the sequence of observations, i.e. $\arg\max_x P(X=x \mid E=e)$.
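Filtering, for instance, is computed by the forward algorithm, which alternates a prediction step (pushing the belief through the transition model) with an evidence update. A minimal sketch for a two-state HMM in the spirit of the classic umbrella-world example (all parameter values are illustrative assumptions):

```python
# States: 0 = rainy, 1 = sunny; observations: 0 = umbrella, 1 = no umbrella.
prior = [0.5, 0.5]
trans = [[0.7, 0.3],   # P(X_t | X_{t-1} = rainy)
         [0.3, 0.7]]   # P(X_t | X_{t-1} = sunny)
emit = [[0.9, 0.1],    # P(E_t | X_t = rainy)
        [0.2, 0.8]]    # P(E_t | X_t = sunny)

def filter_states(observations):
    """Return P(X_i | e_1..e_i) for each step i (normalised forward messages)."""
    belief = prior[:]
    history = []
    for e in observations:
        # Prediction step: push the belief through the transition model,
        # exploiting the first-order Markov assumption.
        predicted = [sum(trans[j][i] * belief[j] for j in range(2))
                     for i in range(2)]
        # Evidence update: weight by the observation likelihood, renormalise.
        updated = [emit[i][e] * predicted[i] for i in range(2)]
        z = sum(updated)
        belief = [p / z for p in updated]
        history.append(belief)
    return history
```

Each step costs time quadratic in the number of states and the whole pass is linear in the sequence length, which is exactly the tractability gained from the chain structure.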

Whereas the aforementioned approaches are specialisations of Bayesian networks
and thus use directed graphical structures, undirected
*conditional random fields* (CRFs) were introduced in order to
support *discriminative training* [Laf01C],
i.e. to enable models to represent precisely the conditional distribution that
is of interest for a particular application (yielding a *discriminative model*)
rather than a full-joint distribution.

# Robotics

In robotics, probabilistic methods were very successfully applied throughout the 1990s and 2000s in order to handle problems such as self-localization, state estimation, environment mapping, and control [Thr05P]. Probabilistic approaches allowed uncertainties pertaining to sensor inaccuracy, actuator behaviour and the agents’ environments to be soundly integrated for the first time.

# First-Order Probabilistic Languages

In the late 1990s and 2000s, many probabilistic representation formalisms were
developed that sought to lift a key limitation of graphical models: that they
support reasoning about only a fixed set of random variables (and therefore a
limited set of entities).
In a field that emerged as *statistical relational learning* (SRL),
the principles of expressive relational languages, specifically first-order
logic, were transferred to probabilistic models
[Get07I].
The key idea was to represent general (probabilistic and logical) principles
about certain types of objects, which could then be materialised for an
arbitrary set of concrete objects, yielding a ground model which was typically
represented as a graphical model.
The respective models could thus be thought of as meta-models that represent
templates for the construction of graphical models.
Canonical applications of SRL include social network modelling, link prediction,
collective classification, collaborative filtering and entity resolution.
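The template-grounding idea can be sketched in the style of Markov logic networks, a prominent SRL formalism; the predicates, weight, and domain below are invented for illustration:

```python
import math
from itertools import product

# Relational template: whenever friends(x, y) holds, couple the ground
# atoms smokes(x) and smokes(y) via a weighted implication. Grounding the
# template over a concrete domain yields one factor per friends-pair,
# i.e. a ground (undirected) graphical model.
people = ["anna", "bob", "carol"]
friends = {("anna", "bob"), ("bob", "carol")}

def ground_factors(weight=1.5):
    """Materialise the template into propositional factors."""
    factors = []
    for x, y in sorted(friends):
        scope = (f"smokes({x})", f"smokes({y})")
        # Factor value exp(weight) when the ground implication
        # smokes(x) => smokes(y) is satisfied, 1.0 otherwise.
        table = {(sx, sy): math.exp(weight) if (not sx or sy) else 1.0
                 for sx, sy in product([False, True], repeat=2)}
        factors.append((scope, table))
    return factors
```

The same template thus materialises into differently sized ground models depending on the set of concrete objects, which is precisely the meta-model character described above.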

# Tractable Models

Whereas SRL focussed on generality, in the 2010s, the focus of the field
shifted towards tractability.
Unfortunately, probabilistic inference is #P-hard and therefore intractable in
the general case.
However, there exist tractable subclasses of models that are still sufficiently
expressive to be of high practical relevance.
Probabilistic models with polynomial-time exact inference guarantees have
thus been investigated, with the goal of enabling probabilistic techniques to
scale to domains with thousands of interdependent variables.
Specific approaches towards tractability include restricted model
parametrisations, new representational frameworks like *sum-product networks*
(SPNs) [Poo11S] or *probabilistic circuits*
[Cho20P], and compositional architectures that
achieve efficient inference through model modularity.
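As a hypothetical sketch of the sum-product network idea, the following hand-built SPN over two Boolean variables answers any marginal query in a single bottom-up pass; the structure and weights are invented for illustration:

```python
# Leaves are variable indicators; internal nodes are weighted sums
# (mixtures) or products (factorisations).
def leaf(var, val):
    return ("leaf", var, val)

def product_node(*children):
    return ("prod", children)

def sum_node(weights, children):
    return ("sum", weights, children)

def evaluate(node, assignment):
    """Evaluate the SPN bottom-up; one linear pass gives exact inference."""
    kind = node[0]
    if kind == "leaf":
        _, var, val = node
        # A variable absent from the assignment is marginalised out by
        # setting both of its indicators to 1.
        return 1.0 if assignment.get(var) in (val, None) else 0.0
    if kind == "prod":
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, assignment)
        return result
    weights, children = node[1], node[2]
    return sum(w * evaluate(c, assignment) for w, c in zip(weights, children))

# P(A, B) as a mixture of two fully factorised components.
spn = sum_node(
    [0.6, 0.4],
    [product_node(sum_node([0.9, 0.1], [leaf("A", True), leaf("A", False)]),
                  sum_node([0.2, 0.8], [leaf("B", True), leaf("B", False)])),
     product_node(sum_node([0.1, 0.9], [leaf("A", True), leaf("A", False)]),
                  sum_node([0.7, 0.3], [leaf("B", True), leaf("B", False)]))])
```

Because every query costs one traversal of the network, inference time is linear in the size of the SPN, which is the tractability guarantee these representations trade expressiveness for.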

# Probabilistic Programming

Over the past decade, probabilistic programming has rapidly grown into a prominent approach for specifying complex probabilistic models and performing inference on them. The origins of probabilistic programming trace back to work in the late 2000s on Church, a LISP-based language for expressing probabilistic models through code. Today, probabilistic programming languages offer intuitive and flexible model specification syntaxes that lower the barrier to applying probabilistic methods [Van21I]. As probabilistic techniques spread more broadly across science and industry, probabilistic programming promises to further their adoption by making modeling more accessible to analysts.
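The core idea — the model is an ordinary program that makes random choices, and a generic inference algorithm is applied to the program as a whole — can be sketched in plain Python using rejection sampling (the program and numbers are illustrative and do not reflect the API of any particular language):

```python
import random

def model():
    """Generative program: draw a coin bias, then flip the coin three times."""
    bias = random.random()                      # uniform prior on the bias
    flips = tuple(random.random() < bias for _ in range(3))
    return bias, flips

def infer_bias(observed=(True, True, True), n=100_000, seed=0):
    """Generic inference by rejection sampling: rerun the program and keep
    only the executions whose flips match the observation."""
    random.seed(seed)
    accepted = [bias for bias, flips in (model() for _ in range(n))
                if flips == observed]
    return sum(accepted) / len(accepted)
```

Under a uniform prior, the exact posterior mean of the bias after observing three heads is $4/5$ (the mean of a Beta(4, 1) distribution), which the sampler approximates; real probabilistic programming systems replace rejection sampling with far more efficient inference engines.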