Formal verification under uncertainty

Being able to ensure that AI systems behave as intended is essential for their adoption in many areas. This series covers formal methods for the verification of safety, fairness, and other relevant properties of (probabilistic) ML systems. We document the progress in research domains such as neuro-symbolic verification, probabilistic model-checking, and weighted model integration.

Formal Verification

Formal verification within the domain of artificial intelligence is an essential discipline focusing on the reliable, safe, and equitable operation of AI systems. This interdisciplinary field intersects with domains such as probabilistic modeling, neural network analysis, and algorithmic safety mechanisms. A concise overview of its primary aspects is provided below:

Probabilistic Model Checking: This method involves calculating the likelihood of specific outcomes within AI models [Kat16P], including Markov chains [Bai03M], Bayesian networks [Jan22P], probabilistic programs [Bat23P], or Markov decision processes [Har23P]. Through the assessment of these probabilities, researchers are able to predict and verify the behavior of AI systems under conditions of uncertainty. Fault tree analysis is commonly employed in this context to identify potential system failures [Vol18F, Jun16U].

Neural Network Verification: This procedure verifies whether a neural network model conforms to specified logical conditions. Neuro-symbolic approaches are at the forefront in this area, utilizing reference networks that encapsulate high-level concepts to enhance specification compliance [Xie22N].

Shielding: This strategy involves training an AI agent without constraints followed by the implementation of a safe fallback policy. This policy is activated when the agent’s proposed action is identified as potentially harmful, thus ensuring operational safety without compromising the learning process [Xia21B].

Methods and Applications

From a methodological perspective, formal verification in AI utilizes tools such as Satisfiability Modulo Theories (SMT) [Ehl17F], semi-definite programming [Faz20S, Kat17R], and weighted model integration [Mor21H] to analyze and ensure adherence to complex specifications. Abstract interpretation assists in understanding and verifying the properties of AI systems by approximating their behaviors.

In terms of applications, formal verification aims to enhance the robustness, safety, and fairness of AI systems. These objectives are crucial for the deployment of AI across various sectors, requiring dependable methods to ensure that AI behavior aligns with ethical and operational standards. A significant ongoing challenge in this field is the expansion of verification capabilities to encompass more complex systems and nuanced properties, as well as to increase the scalability of verification methods. This evolution underscores the need for innovative approaches to ensure that AI systems perform as intended, even in complex and unpredictable scenarios.


The state of software and competition in formal verification for AI is characterized by rapid evolution, demonstrating the community’s dedication to advancing the field. Competitions such as the Verification of Neural Networks Competition (VNNComp) [Mul22T] and QComp [Bud21C] serve as critical platforms for researchers and practitioners to test and benchmark their methods against both real-world and synthetic challenges. These competitions not only highlight progress but also promote innovation by presenting complex scenarios requiring novel solutions.

Among the software tools having significant impact, $\alpha$-$\beta$-CROWN is notable for its efficacy in neural network verification, providing tight and scalable bounds for neural network properties through abstract interpretation. Marabou offers another robust platform for neural network verification, based on SMT [Kat19M]. Caisar provides a unified interface that facilitates the use of the aforementioned frameworks and others, offering a comprehensive platform for neural network verification [Gir22C]. On the probabilistic side, Storm is distinguished for its thorough support for probabilistic model checking, capable of managing various models such as Markov chains and Markov decision processes with precision and efficiency [Hen22P].

These tools and competitions illustrate the dynamic interplay between theory and practice in formal verification. They not only highlight the current capabilities and achievements in ensuring AI system reliability and safety but also emphasize the ongoing need for innovative approaches to address the increasingly complex landscapes of AI applications. As the field expands, the continuous development and refinement of these tools, along with the insights gained from competitive benchmarking, are crucial for pushing the boundaries of verifiable properties, ensuring that AI systems are both powerful and trustworthy.

Check all of our work