Explainable AI

Large opaque models like neural networks require dedicated methods to study and interpret their behavior. In this series we review recent developments, analysing their relevance for business applications.

This series is dedicated to understanding the complexities of AI models and to making them more transparent and comprehensible to a wide range of audiences. Explainable AI (XAI) is a field focused on making the decisions and processes of AI systems understandable to humans. This is increasingly important due to the growing use of AI in critical domains and the need for accountability and trust in AI systems.

Overview of Explainable AI

Explainable AI refers to techniques and methods in the field of artificial intelligence that make the outputs of AI systems understandable and interpretable by humans. This is in contrast to the ‘black box’ nature of many AI models, where the decision-making process is opaque and difficult to interpret.

Government Regulations

The rise of AI applications in critical sectors has led to increased government interest and regulation. Many regions are now implementing guidelines and laws requiring AI systems to be explainable and transparent, especially when decisions impact human lives. This series will explore how XAI is shaping up to meet these regulatory requirements, offering both challenges and opportunities for AI developers and users.

Applications of Explainable AI

XAI plays a crucial role in enhancing trust, reliability, fairness, robustness, and resilience in various domains. By providing clear and understandable explanations of AI model decisions, XAI addresses several critical application areas [Kam21E, Lei23E]:

Building Trust: XAI fosters trust in AI systems among users and stakeholders by making the decision-making process transparent. In sectors like healthcare and finance, where decisions have significant impacts, understanding the rationale behind AI predictions is essential for user acceptance.
Ensuring Reliability: In critical applications such as autonomous driving or aerospace, XAI helps in verifying that AI models function as intended. By providing insights into model decisions, XAI allows engineers and developers to validate and improve the reliability of these systems.
Promoting Fairness: XAI is instrumental in identifying and mitigating biases in AI models. In areas like recruitment or loan approval, explainability ensures that decisions are made fairly, without unjust discrimination, by revealing how different factors contribute to the outcome.
Enhancing Robustness: XAI aids in detecting vulnerabilities or weaknesses in AI models. By understanding how different inputs affect predictions, developers can fortify models against adversarial attacks or unexpected input variations, enhancing their robustness.
Improving Resilience: In dynamic environments, XAI contributes to the resilience of AI systems by facilitating rapid adaptation and troubleshooting. For instance, in changing market conditions, XAI can help financial models adjust to new data patterns.
Regulatory Compliance: With increasing regulation around AI, XAI assists in meeting legal and ethical standards by providing auditable explanations of model decisions, essential for compliance in regulated industries.
Personalization in Services: In consumer-facing industries like retail or entertainment, XAI enables personalized services by explaining recommendations or choices, thereby enhancing user experience and engagement.
Research and Development: In scientific research, XAI helps in hypothesis generation and validation by uncovering new patterns or relationships within data, accelerating innovation and discovery.

By addressing these application areas, XAI not only improves the functionality and acceptance of AI systems but also ensures they align with ethical standards and societal values, making them more beneficial and acceptable to a wider audience.

Qualities of Explanations

In Explainable AI, the effectiveness of an explanation is gauged by several key qualities, each contributing to how well it meets audience needs [Mol22I]:

Accuracy: Measures the precision of predictions made by an explanation. High accuracy is crucial when the explanation is used for predictions. However, lower accuracy may be acceptable if it aligns with the model’s accuracy and aims to clarify the model’s behavior.
Fidelity: Indicates how closely an explanation reflects the actual behavior of the model it represents. High fidelity is vital for truly understanding and trusting the model, particularly in critical applications.
Comprehensibility: Concerns the ease with which the target audience can grasp the explanation. Influenced by the complexity of the explanation and the audience’s background knowledge, comprehensibility is crucial for user acceptance, trust, and collaborative interactions.
Certainty: Assesses how well the explanation conveys the model’s confidence in its predictions. This aspect is key for informed decision-making and risk assessment in dynamic or uncertain situations.

Balancing these qualities is essential for crafting explanations that are accurate, trustworthy, clear, and practically valuable in XAI.
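
The difference between accuracy and fidelity can be made concrete with a small sketch. The black-box model, the surrogate explanation, and the data below are all hypothetical and chosen only for illustration: accuracy compares the explanation's predictions with the true labels, while fidelity compares them with the model's predictions.

```python
# Toy illustration of accuracy vs. fidelity of an explanation.
# The model, the surrogate and the data are hypothetical.

def black_box(x):
    """Stand-in for an opaque model: predicts 1 if x1 + x2 exceeds 1."""
    return int(x[0] + x[1] > 1.0)

def surrogate(x):
    """A simple explanation of the model: one threshold rule on x1."""
    return int(x[0] > 0.5)

def agreement(a, b):
    """Fraction of positions at which two label lists agree."""
    return sum(u == v for u, v in zip(a, b)) / len(a)

inputs = [(0.2, 0.1), (0.9, 0.8), (0.6, 0.2), (0.4, 0.9), (0.7, 0.6), (0.1, 0.3)]
labels = [0, 1, 1, 1, 1, 0]  # ground truth (the model itself errs on the third row)

model_preds = [black_box(x) for x in inputs]
surrogate_preds = [surrogate(x) for x in inputs]

accuracy = agreement(surrogate_preds, labels)       # explanation vs. ground truth
fidelity = agreement(surrogate_preds, model_preds)  # explanation vs. model
```

In this toy setup the surrogate agrees with the labels more often than it agrees with the model, which shows that an accurate explanation is not necessarily a faithful one.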

Types of Explanations

In Explainable AI, various explanation types address different aspects of a model’s decision-making process [Mol22I]:

Feature Importance Explanations: Identify and rank features by their impact on the model’s predictions, crucial in areas like finance and healthcare for understanding feature relevance.
Example-Based Explanations: Use specific instances to demonstrate the model’s behavior, effective in fields where concrete examples are more illustrative, such as image or speech recognition.
Counterfactual Explanations: Explain what changes in inputs could lead to different outcomes, offering actionable insights for scenarios requiring understanding of outcome alterations.
Local Explanations: Focus on individual predictions to explain why the model made a certain decision, using tools like LIME and SHAP. They are key in applications needing in-depth insights into singular decisions, like patient diagnosis in healthcare.
Global Explanations: Provide an overarching view of the model’s behavior, elucidating general patterns and rules, essential for contexts requiring comprehensive understanding and transparency, such as policy-making.
Causal Explanations: Delve into cause-and-effect relationships within the decision process, vital for fields where understanding these dynamics is crucial, like scientific research and economics.

Each explanation type offers a distinct lens on the AI model’s decision-making, chosen based on the specific needs of the context, audience, and task.
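
As an illustration of the counterfactual type described above, the following minimal sketch searches for the smallest change in a single feature that flips a decision. The loan model, the feature names, and the step size are assumptions made for this example; practical counterfactual methods search over several features and add plausibility constraints.

```python
# Toy counterfactual search. The model and feature names are hypothetical.

def approve(applicant):
    """Stand-in for an opaque loan model: approve if the score is non-negative."""
    score = 4 * applicant["income"] - 10 * applicant["debt"] - 100
    return score >= 0

def counterfactual(applicant, feature, step, max_steps=1000):
    """Change one feature in fixed steps until the decision flips.

    Returns the modified applicant, or None if no flip is found.
    """
    original = approve(applicant)
    candidate = dict(applicant)
    for _ in range(max_steps):
        candidate[feature] += step
        if approve(candidate) != original:
            return candidate
    return None

applicant = {"income": 40, "debt": 10}  # in thousands; rejected by the model
cf = counterfactual(applicant, "income", step=1)
```

The returned `cf` can be read as a statement of the form "the application would have been approved had the income been this high", which is the actionable insight counterfactuals aim for.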

Interpretability by Design vs Post-Hoc Methods

Within explainable AI, two broad categories of methods are used to interpret AI models: intrinsically interpretable models and post-hoc blackbox methods. Intrinsically interpretable models derive their predictions via a transparent process that is naturally understandable to humans, while post-hoc blackbox methods interpret the predictions of opaque models after they have been trained. A blackbox model makes its predictions without revealing its decision-making process, so post-hoc explanations are often themselves learned from the predictions of the blackbox model and might not be accurate [Rud19S]. From the point of view of explainability, intrinsically interpretable models therefore have a conceptual advantage: their explanations are guaranteed to reflect the true decision-making process of the model. However, one might need to design the model more specifically for the task at hand [Bel22I].

Intrinsically Interpretable Models

Intrinsically interpretable models are designed to be naturally understandable, sometimes sacrificing some level of expressivity for transparency (although whether this sacrifice is truly necessary is a contentious point which has been disputed [Rud19S]). These models include decision trees, linear models, and rule-based systems, which provide insights into the decision-making process directly through their structure and the way they process data.

Key Features

Transparency: The model’s internal workings are understandable by human intuition.
Direct Interpretability: The decision-making process is clear without additional analysis or tools.

Classical Interpretable Models

Classical interpretable models in machine learning, known for their simplicity and transparency, include decision trees, linear models, and rule-based systems [Mol22I, Kam21E]:

Decision Trees: Tree-structured for classification and regression, splitting data based on criteria to form a prediction path. Advantages include ease of understanding, visualization, and handling diverse data types without needing data scaling. However, they can overfit and be unstable with data changes.
Linear Models: These models (like linear and logistic regression) predict outcomes based on linear combinations of input features, characterized by simplicity and ease of interpretation. Effective for linear relationships and computationally efficient, but limited in handling complex, non-linear relationships, outliers, or multicollinearity.
Rule-Based Systems: Employ human-readable ‘if-then’ rules for decision-making, with each rule specifying a condition leading to a conclusion. Highly interpretable and easy to update, these systems depend on the quality of the rules and can become complex with many rules, potentially struggling with generalization.

These models are vital in areas requiring clear insight into decision processes, such as healthcare and finance, offering a balance between predictive accuracy and interpretability.
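
To show what direct interpretability means in practice, here is a minimal sketch of a rule-based system. The rules, thresholds, and feature names are hypothetical; the point is only that the decision and the reason for it are the same object, so no additional explanation tool is needed.

```python
# Toy rule-based classifier. Rules and thresholds are hypothetical.
# Rules are checked in order; the first rule whose condition holds decides.

RULES = [
    ("debt_ratio > 0.6", lambda a: a["debt_ratio"] > 0.6, "reject"),
    ("income >= 50 and defaults == 0",
     lambda a: a["income"] >= 50 and a["defaults"] == 0, "approve"),
]
DEFAULT_DECISION = "manual review"

def decide(applicant):
    """Return the decision together with the text of the rule that fired."""
    for text, condition, decision in RULES:
        if condition(applicant):
            return decision, text
    return DEFAULT_DECISION, "no rule matched"

decision, reason = decide({"income": 65, "debt_ratio": 0.3, "defaults": 0})
```

As the text above notes, the quality of such a system depends entirely on the rules, and the ordering of rules becomes harder to reason about as their number grows.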

Probabilistic Models

Probabilistic models in machine learning, known for their transparent approach and inherent interpretability, include graphical models like Bayesian networks and Markov models, as well as time series models such as SARIMA and Prophet:

Graphical Models [Win20M]: Utilize graph-based representations to depict conditional dependencies between variables, aiding in understanding complex relationships and data structures. Their visual nature enhances comprehensibility, and they effectively handle uncertainty and incomplete data. However, they can become complex with more variables and require solid domain knowledge for correct setup.
Time Series Models: Clarify temporal dynamics and forecast future events based on historical data. However, they demand substantial domain knowledge and may struggle with noisy or non-stationary data.
- SARIMA: Excels in forecasting time series data, accommodating both non-seasonal and seasonal components.
- Prophet [Tay17F]: Optimized for daily observations with patterns across different time scales, effective for data with strong seasonal effects.

These probabilistic models are valuable for their deep insights into data structures and decision-making processes, especially in tasks requiring high interpretability.

Interpretable Deep Learning Models

Interpretable deep learning models effectively merge the predictive capabilities of neural networks with transparency, vital for applications where understanding decision-making is key. Notable models include:

Symbolic-Based Models: These models incorporate symbolic expressions within neural networks. This process entails developing a model aligned with the data, training it, and then integrating symbolic expressions to replace internal functions. They are highly interpretable, offering analytical parallels to the model’s predictions, but require accurate symbolic fitting and can be complex.
Interpretable Attention Mechanisms [Lim21T]: Attention mechanisms in models like Temporal Fusion Transformers (TFT) enhance interpretability in time series forecasting and other applications. They focus on important features or time steps in the data, providing insights into how different elements influence the model’s predictions. While offering clearer understanding of model decisions, they may introduce higher computational demands during training.
Prototype-Based Models (ProtoPNet and ProtoTree) [Nau21T, Nau21N]: These models use representative features or ‘prototypes’ for decision-making, as seen in ProtoPNet for image classification, and organize these prototypes in a decision tree structure in ProtoTree. They offer transparency by allowing comparisons between inputs and learned prototypes, combining the interpretability of decision trees with deep learning’s power. However, the complexity of these models can sometimes obscure understanding, particularly in more intricate tree structures. Additionally, the presented prototypes might be misleading, as they are further processed by non-interpretable mechanisms downstream [Hof21T, Nau21T].

Overall, these models demonstrate significant progress in making deep learning more interpretable and user-friendly, crucial for areas requiring clear and transparent decision-making processes.

Post-Hoc Blackbox Methods

Post-hoc methods are used to interpret models that are inherently complex and opaque (‘blackbox’), like deep neural networks or ensemble methods. These techniques are applied after the model has been trained and include feature importance scores, partial dependence plots, and LIME (Local Interpretable Model-agnostic Explanations).

Key Features

Model Agnosticism: Applicable to any machine learning model.
Insightful Analysis: Provides a deeper understanding of model behavior, often through visualizations.
Reliance on proxies: Interpretations are often based on approximations of the model’s decision-making process, simpler representations, templates, or other proxies, making many of their claims to interpretability questionable [Rud19S].

Statistical Methods

Statistical post-hoc methods offer valuable insights into the relationship between features and a model’s output in AI, encompassing various techniques [Mol22I, Kam21E]:

Partial Dependence (PD) Plots: Show the average effect of a feature on the model’s prediction, giving a global view of feature importance and its impact on the model’s output. They are easy to understand and widely applicable, but assume that the feature of interest is independent of the other features. They can therefore be misleading when features are correlated, and the averaging can hide strong feature interactions.
Individual Conditional Expectation (ICE) Plots: Extend PD plots by detailing the relationship between a feature and the outcome for each instance. ICE plots offer a nuanced, instance-level view of model behavior, highlighting variations missed by PD plots. However, they can become cluttered with many instances, making interpretation challenging.
Accumulated Local Effects (ALE) Plots: Concentrate on local prediction changes, aggregating feature effects over small intervals. Unlike PD plots, ALE plots remain unbiased when features are correlated, and they are less computationally demanding. Yet their interpretation remains difficult when features are strongly correlated, and they can be complex for non-experts to read.

These methods collectively enhance the understanding of complex AI models by illuminating how changes in features influence predictions, each with its unique advantages and limitations.
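
The relation between PD and ICE plots can be sketched in a few lines. The model and data below are hypothetical and deliberately contain an interaction, so that the averaging step of the PD curve hides what the individual ICE curves reveal.

```python
# Partial dependence (PD) and ICE curves for a toy model with an interaction.
# The model and the data are hypothetical.

def model(x1, x2):
    """Toy model: the effect of x1 changes sign depending on x2."""
    return x1 * x2

data = [(0.0, -1.0), (0.5, -1.0), (1.0, 1.0), (0.2, 1.0)]  # rows of (x1, x2)
grid = [0.0, 0.5, 1.0]                                      # values of x1 to probe

# ICE: for each row, vary x1 over the grid while keeping that row's x2 fixed.
ice = [[model(g, x2) for g in grid] for (_, x2) in data]

# PD: the pointwise average of the ICE curves.
pd_curve = [sum(curve[k] for curve in ice) / len(ice) for k in range(len(grid))]
```

Here every ICE curve has slope +1 or -1, while the PD curve is flat at zero. Read on its own, the PD plot would suggest that x1 has no effect, which is exactly the kind of heterogeneity ICE plots are meant to expose.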

Concept Based Methods

These methods build on the idea that deep learning models recognize high-level concepts while processing an input. Concept based methods try to identify the recognized concepts and their influence on the model’s output.

Concept Activation Vectors (CAVs) [Kim18I]: These vectors are used to identify the concepts recognized by a deep learning model. A linear classifier is fitted to the activations of one of the model’s hidden layers to separate examples of a concept from other examples, and the resulting direction in activation space represents the concept. However, their effectiveness depends on the amount and quality of concept-labeled data available at training time. Since curating concepts is a manual process, it can be time-consuming and subjective. Therefore, concept activation vectors are often trained on a small subset of the data, which can lead to inaccurate explanations.
Concept Bottleneck Models [Koh20C]: Explicitly structure the model to first predict human-understandable concepts from the input data, and then use these concepts to make the final prediction. This approach divides the prediction process into two stages, with the first stage focusing on identifying interpretable concepts that are directly used in the second stage to make predictions. The interpretability comes from the model’s architecture, which is designed to map inputs to concepts and then concepts to outputs.
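
The two-stage structure of a concept bottleneck model can be sketched as follows. Both stages are hand-written stand-ins for learned networks, and the concept names are hypothetical; the sketch only illustrates that the label depends on the input through the concepts, so a user can intervene on a concept and observe the effect.

```python
# Toy concept bottleneck: input -> concepts -> label.
# Both stages are hypothetical stand-ins for learned models.

def predict_concepts(x):
    """Stage 1: map raw input scores to human-readable concepts."""
    return {
        "has_wings": x["wing_score"] > 0.5,
        "has_beak": x["beak_score"] > 0.5,
    }

def predict_label(concepts):
    """Stage 2: the label depends on the input only through the concepts."""
    return "bird" if concepts["has_wings"] and concepts["has_beak"] else "not bird"

def predict(x, intervene=None):
    """Full prediction; `intervene` overrides selected concepts before stage 2."""
    concepts = predict_concepts(x)
    if intervene:
        concepts.update(intervene)
    return predict_label(concepts), concepts

x = {"wing_score": 0.9, "beak_score": 0.3}
label, concepts = predict(x)
label_fixed, _ = predict(x, intervene={"has_beak": True})
```

Correcting the mispredicted concept changes the final label, which mirrors the kind of question posed in [Koh20C]: would the model still make the same prediction if it believed a given concept were present or absent?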

Game Theoretic Methods

Game theoretic approaches in AI interpretability treat input features as players in a cooperative game, offering unique insights:

Shapley Values [Lun17U, Cov20U, Mer19E]: Derived from cooperative game theory, Shapley values distribute a model’s output among its features based on their contribution. This allocation provides a fair understanding of each feature’s impact on the model’s decision. However, they can be computationally heavy for models with many features and often assume feature independence, which may not always be accurate in complex real-world data, leading to possible misinterpretations [Kum20P, Tau23M].
Least Core [Yan21I]: This concept, also from cooperative game theory, focuses on the stability of the allocation: it distributes the model’s output among the features so that the largest shortfall of any coalition of features, relative to what that coalition achieves on its own, is as small as possible. Applying the least core can be complex and computationally intensive in high-dimensional models, since the number of coalitions grows exponentially with the number of features. Like Shapley values, the least core may not fully account for feature interactions, potentially oversimplifying the interpretation of models with interconnected features.

These game theoretic methods provide a framework for understanding and interpreting the contributions and sensitivities of features within AI models, each with its own set of challenges and computational considerations.
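
For a handful of features, Shapley values can be computed exactly by enumerating all coalitions. The sketch below assumes a toy model and defines the value of a coalition by setting absent features to a baseline input; this is only one of several possible choices, and it implicitly treats features as independent, which is the assumption the text above warns about.

```python
# Exact Shapley values for a toy model.
# The model, the input and the baseline are hypothetical.
from itertools import combinations
from math import factorial

def model(x):
    """Toy model with an interaction between features 0 and 2."""
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]

def value(coalition, x, baseline):
    """Model output when only the features in `coalition` take their actual values."""
    z = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return model(z)

def shapley_values(x, baseline):
    """Weighted average of each feature's marginal contribution over all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}, x, baseline) - value(s, x, baseline))
    return phi

x = [1.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(x, baseline)
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split equally between features 0 and 2. The enumeration visits every subset of features, so its cost grows exponentially with their number; practical implementations rely on sampling or model-specific shortcuts.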

Our library for data valuation implements these and many other techniques for the valuation of training points, but it can also be used for feature attribution.

Gradient Based Methods

Saliency maps and Integrated Gradients are vital tools in AI for visualizing and understanding influential features in predictions, especially in image processing and deep learning:

Saliency Maps [Sim13D]: These are visual tools highlighting influential areas in inputs like images, indicating each pixel’s contribution to the final decision by computing the gradient of the output relative to the input. While saliency maps offer intuitive and direct visual interpretation of complex models such as deep neural networks, they can produce noisy or challenging-to-interpret results, particularly in highly abstract models or with complex inputs.
Integrated Gradients [Sun17A]: This technique enhances saliency maps by accumulating gradients from a baseline input to the actual input, providing a more comprehensive and detailed view of feature importance. Integrated Gradients offer consistent and detailed feature attribution, especially suited for deep learning models with non-linear behaviors. However, they are computationally intensive, require careful baseline selection for meaningful interpretations, and are most applicable to models with definable and informative gradients.

Both methods significantly contribute to the interpretability of complex AI models by providing a clearer understanding of feature influence in predictions.
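
The accumulation of gradients along the path from baseline to input can be sketched without any deep learning framework. The model below is a hypothetical one-layer function with an analytic gradient, and the path integral is approximated with a midpoint Riemann sum; the all-zeros baseline is an assumption of this example, not a general recommendation.

```python
# Integrated Gradients via a Riemann sum for a toy differentiable model.
# The weights, the input and the baseline are hypothetical.
import math

W = [1.5, -2.0, 0.5]

def model(x):
    """Toy model: tanh of a weighted sum of the inputs."""
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))

def gradient(x):
    """Analytic gradient of the model with respect to its input."""
    s = sum(w * xi for w, xi in zip(W, x))
    return [w * (1.0 - math.tanh(s) ** 2) for w in W]

def integrated_gradients(x, baseline, steps=200):
    """Average the gradient along the straight path, then scale by (x - baseline)."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
saliency = gradient(x)  # plain gradient at the input, for comparison
```

Up to discretization error, the attributions sum to `model(x) - model(baseline)`, a completeness property that the plain gradient in `saliency` does not have. A different baseline would yield different attributions, which is why baseline selection matters.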

Interpretable Surrogate Models

(Local) surrogate models, particularly LIME (Local Interpretable Model-agnostic Explanations [Rib16W]), are key in machine learning for interpreting complex models by simplifying their predictions:

LIME: Generates individual prediction explanations by forming a simpler model for each instance. It alters input data, observes prediction changes, and fits a simple model (like linear regression) to approximate the complex model’s specific behavior. Advantages include its model-agnostic nature, applicable to any machine learning model, and providing intuitive, feature-focused explanations. However, its local explanations may not always align with the complex model’s global behavior, and its effectiveness depends on the choice of the simpler model and perturbation strategy.
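
The perturb, weight, and fit steps described above can be sketched as follows. This is a simplified illustration in the spirit of LIME, not the reference implementation: the black-box function, the Gaussian perturbations, the kernel width, and the small linear solver are all assumptions of this example, and only the standard library is used.

```python
# Simplified local surrogate in the spirit of LIME (not the reference implementation).
# The black box, perturbation scale and kernel width are hypothetical.
import math
import random

def black_box(x):
    """Stand-in for an opaque model."""
    return x[0] ** 2 + 3.0 * x[1]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def explain_locally(x0, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model to the black box around x0.

    Returns [intercept, slope_0, slope_1, ...] with respect to offsets from x0.
    """
    rng = random.Random(seed)
    p = len(x0) + 1
    A = [[0.0] * p for _ in range(p)]  # accumulates X^T W X
    b = [0.0] * p                      # accumulates X^T W y
    for _ in range(n_samples):
        d = [rng.gauss(0.0, scale) for _ in x0]               # perturbation
        w = math.exp(-sum(di * di for di in d) / kernel_width ** 2)  # proximity weight
        y = black_box([xi + di for xi, di in zip(x0, d)])
        row = [1.0] + d
        for i in range(p):
            b[i] += w * row[i] * y
            for j in range(p):
                A[i][j] += w * row[i] * row[j]
    return solve(A, b)

intercept, slope0, slope1 = explain_locally([2.0, 1.0])
```

Around the point (2, 1) the fitted slopes approximate the local sensitivities of the black box to each feature. Repeating the fit at a different point, or with a different perturbation scale or kernel width, can change the explanation, which illustrates the dependence on the perturbation strategy noted above.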

Training Data Attribution

Training Data Attribution in ML is the analysis of the impact of individual data points on model predictions. It can be useful for model debugging, fairness assessment, evaluating copyright protection claims, or anomaly detection.

You can check our library for data valuation for efficient implementations of many of these methods.

Influence Functions [Koh17U, Bas20I, Fel20W, Fis23I, Fis23S]: These estimate how changes in the training data, such as removing or altering a data point, affect a model’s output, using an approximation instead of retraining the model for every change, which saves computational resources. The advantages include efficiency in estimating training data impact, insights into model behavior by identifying influential data points, enhancement of model robustness and fairness, and increased transparency by linking training data to model output. However, they also present challenges: the underlying mathematics is involved, interpretation requires a strong grasp of the model and data, and the approximation may not fully capture the influence of a data point in complex models. Despite these challenges, influence functions are valuable for assessing and improving the reliability and fairness of machine learning models.
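
The idea of approximating the effect of removing a training point without retraining can be illustrated on a model small enough to retrain exactly. The sketch below uses one-parameter least squares on hypothetical data with one injected outlier, and compares the first-order influence estimate with the change obtained by actually refitting.

```python
# Influence-function estimate vs. exact leave-one-out retraining
# for one-parameter least squares y ~ theta * x. The data are hypothetical.

xs = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.05, 0.95]
ys = [2.0 * x for x in xs]
ys[3] += 3.0  # inject one outlier

def fit(xs, ys):
    """Closed-form minimiser of the mean of 0.5 * (theta * x - y)^2."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

n = len(xs)
theta = fit(xs, ys)
hessian = sum(x * x for x in xs) / n  # second derivative of the mean loss

def influence_of_removal(i):
    """First-order estimate of the change in theta when point i is removed."""
    grad_i = (theta * xs[i] - ys[i]) * xs[i]  # gradient of point i's loss at theta
    return grad_i / (n * hessian)

def exact_change(i):
    """Change in theta obtained by actually refitting without point i."""
    return fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) - theta

estimates = [influence_of_removal(i) for i in range(n)]
exact = [exact_change(i) for i in range(n)]
most_influential = max(range(n), key=lambda i: abs(estimates[i]))
```

In this toy case the estimate ranks the injected outlier as the most influential point and has the same sign as the exact change for every point, while somewhat underestimating its magnitude. For deep networks the Hessian is neither this cheap to compute nor guaranteed to be well behaved, which is one source of the approximation limits mentioned above.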

References

[Mol22I]

Interpretable Machine Learning, Christoph Molnar.

2022

Machine learning has great potential for improving products, processes and research. But computers usually do not explain their predictions which is a barrier to the adoption of machine learning. This book is about making machine learning models and their decisions interpretable. After exploring the concepts of interpretability, you will learn about simple, interpretable models such as decision …

[Kam21E]

Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning, Uday Kamath, John Liu.

2021

In recent years, we have seen gains in adoption of machine learning and artificial intelligence applications. However, continued adoption is being constrained by several limitations. The field of Explainable AI addresses one of the largest shortcomings of machine learning and deep learning algorithms today: the interpretability and explainability of models. As algorithms become more powerful and …

[Bas20I]

Influence Functions in Deep Learning Are Fragile, Samyadeep Basu, Phil Pope, Soheil Feizi.

Sep 2020

Influence functions approximate the effect of training samples in test-time predictions and have a wide variety of applications in machine learning interpretability and uncertainty estimation. A...

[Bel22I]

It’s Just Not That Simple: An Empirical Study of the Accuracy-Explainability Trade-off in Machine Learning for Public Policy, Andrew Bell, Ian Solano-Kamaiko, Oded Nov, Julia Stoyanovich.

Jun 2022

To achieve high accuracy in machine learning (ML) systems, practitioners often use complex “black-box” models that are not easily understood by humans. The opacity of such models has resulted in public concerns about their use in high-stakes contexts and given rise to two conflicting arguments about the nature — and even the existence — of the accuracy-explainability trade-off. One side postulates …

[Cov20U]

Understanding Global Feature Contributions With Additive Importance Measures, Ian Covert, Scott Lundberg, Su-In Lee.

Oct 2020

Understanding the inner workings of complex machine learning models is a long-standing problem and most recent research has focused on local interpretability. To assess the role of individual input features in a global sense, we explore the perspective of defining feature importance through the predictive power associated with each feature. We introduce two notions of predictive power (model-based …

[Fel20W]

What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation, Vitaly Feldman, Chiyuan Zhang.

2020

Deep learning algorithms are well-known to have a propensity for fitting the training data very well and often fit even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical …

[Fis23I]

Influence Diagnostics under Self-concordance, Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui.

Apr 2023

Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and …

[Fis23S]

Statistical and Computational Guarantees for Influence Diagnostics, Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui.

Sep 2023

[Hof21T]

This Looks Like That... Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks, Adrian Hoffmann, Claudio Fanconi, Rahul Rade, Jonas Kohler.

Jun 2021

Deep neural networks that yield human interpretable decisions by architectural design have lately become an increasingly popular alternative to post hoc interpretation of traditional black-box models. Among these networks, the arguably most widespread approach is so-called prototype learning, where similarities to learned latent prototypes serve as the basis of classifying an unseen data point. In …

[Kim18I]

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres.

Jun 2018

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of …

[Koh20C]

Concept Bottleneck Models, Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang.

Nov 2020

We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., …

[Koh17U]

Understanding Black-box Predictions via Influence Functions, Pang Wei Koh, Percy Liang.

Jul 2017

How can we explain the predictions of a black-box model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a …

[Kum20P]

Problems with Shapley-value-based explanations as feature importance measures, I. Elizabeth Kumar, Suresh Venkatasubramanian, Carlos Scheidegger, Sorelle Friedler.

Nov 2020

Game-theoretic formulations of feature importance have become popular as a way to

[Lei23E]

Effects of Explainable Artificial Intelligence on trust and human behavior in a high-risk decision task, Benedikt Leichtmann, Christina Humer, Andreas Hinterreiter, Marc Streit, Martina Mara.

Feb 2023

Understanding the recommendations of an artificial intelligence (AI) based assistant for decision-making is especially important in high-risk tasks, such as deciding whether a mushroom is edible or poisonous. To foster user understanding and appropriate trust in such systems, we assessed the effects of explainable artificial intelligence (XAI) methods and an educational intervention on AI-assisted …

[Lim21T]

Temporal Fusion Transformers for interpretable multi-horizon time series forecasting, Bryan Lim, Sercan Ö. Arık, Nicolas Loeff, Tomas Pfister.

Oct 2021

Multi-horizon forecasting often contains a complex mix of inputs – including static (i.e. time-invariant) covariates, known future inputs, and other exogenous time series that are only observed in the past – without any prior information on how they interact with the target. Several deep learning methods have been proposed, but they are typically ‘black-box’ models that do not shed light on how …

[Lun17U]

A Unified Approach to Interpreting Model Predictions, Scott M Lundberg, Su-In Lee.

2017

Understanding why a model makes a certain prediction can be as crucial as the prediction’s accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been …

[Mer19E]

The Explanation Game: Explaining Machine Learning Models Using Shapley Values, Luke Merrick, Ankur Taly.

Sep 2019

A number of techniques have been proposed to explain a machine learning model's prediction by attributing it to the corresponding input features. Popular among these are techniques that apply the Shapley value method from cooperative game theory. While existing papers focus on the axiomatic motivation of Shapley values, and efficient techniques for computing them, they offer little justification …

[Nau21N]

Neural Prototype Trees for Interpretable Fine-Grained Image Recognition, Meike Nauta, Ron van Bree, Christin Seifert.

2021

Prototype-based methods use interpretable representations to address the black-box nature of deep learning models, in contrast to post-hoc explanation methods that only approximate such models. We propose the Neural Prototype Tree (ProtoTree), an intrinsically interpretable deep learning method for fine-grained image recognition. ProtoTree combines prototype learning with decision trees, and thus …

[Nau21T]

This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition, Meike Nauta, Annemarie Jutte, Jesper Provoost, Christin Seifert.

2021

Image recognition with prototypes is considered an interpretable alternative for black box deep learning models. Classification depends on the extent to which a test image “looks like” a prototype. However, perceptual similarity for humans can be different from the similarity learned by the classification model. Hence, only visualising prototypes can be insufficient for a user to understand what a …

[Rib16W]

"Why Should I Trust You?": Explaining the Predictions of Any Classifier, Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin.

Feb 2016

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy …

[Rud19S]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Cynthia Rudin.

May 2019

Black box machine learning models are currently being used for high-stakes decision making throughout society, causing problems in healthcare, criminal justice and other domains. Some people hope that creating methods for explaining these black box models will alleviate some of the problems, but trying to explain black box models, rather than creating models that are interpretable in the first …

[Sim13D]

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, K. Simonyan, A. Vedaldi, Andrew Zisserman.

Dec 2013

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a …

[Sun17A]

Axiomatic Attribution for Deep Networks, Mukund Sundararajan, Ankur Taly, Qiqi Yan.

Jul 2017

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. …

[Tau23M]

Manifold Restricted Interventional Shapley Values, Muhammad Faaiz Taufiq, Patrick Blöbaum, Lenon Minorics.

Apr 2023

Shapley values are model-agnostic methods for explaining model predictions. Many commonly used methods of computing Shapley values, known as off-manifold methods, rely on model evaluations on out-of-distribution input samples. Consequently, explanations obtained are sensitive to model behaviour outside the data distribution, which may be irrelevant for all practical purposes. While on-manifold …

[Tay17F]

Forecasting at scale, Sean J. Taylor, Benjamin Letham.

Sep 2017

Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable and high quality forecasts — especially when there are a variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we …

[Win20M]

Model based machine learning, John Winn, Christopher M. Bishop, Thomas Diethe, John Guiver, Yordan Zaykov.

2020

Today, machine learning is being applied to a growing variety of problems in a bewildering variety of domains. When doing machine learning, a fundamental challenge is connecting the abstract mathematics of a particular machine learning technique to a concrete, real-world problem. This book tackles this challenge through model-based machine learning. Model-based machine learning is an approach …

[Yan21I]

If You Like Shapley Then You’ll Love the Core, Tom Yan, Ariel D. Procaccia.

May 2021

The prevalent approach to problems of credit assignment in machine learning, such as feature and data valuation, is to model the problem at hand as a cooperative game and apply the Shapley value. But cooperative game theory offers a rich menu of alternative solution concepts, which famously includes the core and its variants. Our goal is to challenge the machine learning community’s current …

Research feed

Interpreting CLIP's Image Representation via Text-based Decomposition

Interpreting the output of neural networks is often challenging because it entails putting into words patterns that may not be easily …

Scientific Inference With Interpretable Machine Learning

Timo will introduce a framework for designing interpretable machine learning methods for science, termed “property descriptors”.

Concept Activation Vectors

In the last seminar of our XAI series, Iván Rodríguez from appliedAI talks about Concept Activation Vectors (CAVs). CAVs go beyond feature …

An information-theoretic perspective on model interpretation

In the ninth seminar of our XAI series, Kristof Schröder, Senior Research Engineer at appliedAI, will discuss how maximizing mutual …

Effects of XAI on perception, trust and acceptance

This talk delves into the influence of Explainable Artificial Intelligence (XAI) on human cognition, trust, and acceptance of AI-driven …

Latent space prototype interpretability: Strengths and shortcomings

Prototype-based approaches aim at training intrinsically interpretable models that nevertheless are as powerful as typical black-box neural …

Influence Diagnostics Under Self-Concordance

In our sixth seminar, we have the pleasure of receiving Jillian Fisher from the statistics department of the University of Washington, who …

Manifold Restricted Interventional Shapley Values

For our fifth installment in this series we are happy to host Muhammad Faaiz Taufiq, from Oxford University. Faaiz will introduce his recent …

Influence functions and Data Pruning: from theory to non-convergence

Today’s session brings Influence Functions under the spotlight - the theory, non-convergence issues, and uses for data pruning. Fabio will …

Post-Hoc Concept Bottleneck Models

Concept Bottleneck models are a set of intrinsically interpretable models that rely on high level concepts to make predictions. They often …

Shapley values for XAI: the good, the bad and the ugly

In this talk Anes will ask questions like: What is the true significance of Shapley values as feature importance measures? How can Shapley …

The debate on the accuracy-interpretability tradeoff

In this talk we look into the debate on the alleged trade-off between accuracy and interpretability. We will discuss the literature on the …

Other series in Trustworthy and interpretable ML

Classifier calibration

For many applications of probabilistic classifiers it is important that the predicted confidence vectors reflect true probabilities (one …

Simulation-Based Inference

Simulation-based inference (SBI) offers a powerful framework for Bayesian parameter estimation in intricate scientific simulations where …

Uncertainty Quantification

Uncertainty quantification (UQ) in machine learning is the practice of measuring or estimating uncertainty in models. It is a set of tools …

Probabilistic Models

Uncertainty permeates all aspects of real-world agency: Perception is subject to uncertainty owing to partial observability and unreliable …
