XAI Preliminaries

Kacper Sokol

Brief History of Explainability

Interest in ML explainability

Expert Systems (1970s & 1980s)

Depiction of expert systems

Transparent Machine Learning Models

Decision tree

Rule list
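Transparent models such as rule lists can be read directly as their own explanation. A minimal, hand-coded sketch of a hypothetical rule list for a loan decision (all feature names and thresholds are invented for illustration):

```python
# A hypothetical rule list for a loan decision. Every prediction can be
# traced to the first rule that fires, which is what makes such a model
# ante-hoc transparent.
def rule_list(income, debt, years_employed):
    if debt > 0.5 * income:       # rule 1: debt exceeds half the income
        return 'reject'
    if years_employed < 1:        # rule 2: short employment history
        return 'review'
    return 'accept'               # default rule
```

Reading the decision path for an instance, e.g. `rule_list(100, 60, 5)`, immediately tells us which rule produced the outcome.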

Rise of the Dark Side (Deep Neural Networks)

Deep neural network

  • No need to engineer features (by hand)
  • High predictive power
  • Black-box modelling

DARPA’s XAI Concept

DARPA's XAI concept

Why We Need Explainability

Expectations Mismatch

We tend to request explanations mostly when a phenomenon disagrees with our expectations.

For example, an ML model behaves differently than we envisaged and outputs an unexpected prediction.


XAI stakeholders

Purpose or Role

  • Fairness
  • Privacy
  • Reliability and Robustness
  • Causality
  • Trust
  • Trustworthiness / Reliability / Robustness / Causality

    No silly mistakes & socially acceptable

  • Fairness

    Does not discriminate & is not biased


  • New knowledge

    Aids in scientific discovery

  • Legislation

    Does not break the law

    • EU’s General Data Protection Regulation
    • California Consumer Privacy Act
  • Debugging / Auditing

    Identify modelling errors and mistakes

  • Human–AI co-operation

    Help humans complete tasks


  • Safety / Security

    Abuse transparency to steal a (proprietary) model

  • Manipulation

    Use transparency to game a system, e.g., credit scoring


The copy machine study by Langer, Blank, and Chanowitz (1978):

The copy machine study

Explanation Types

Explainability Source

  • ante-hoc – intrinsically transparent predictive models (transparency by design)
  • post-hoc – derived from a pre-existing predictive model that may itself be unintelligible (usually requires an additional explanatory modelling step)
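The additional modelling step behind a post-hoc explanation can be sketched as fitting a simple surrogate around the instance being explained. A minimal example, assuming an invented black-box scorer and sampling scale (this is an illustration of the idea, not any particular explainer's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

# A stand-in "black box": an opaque nonlinear scorer of two features.
def black_box(A):
    return np.tanh(2 * A[:, 0]) + 0.3 * A[:, 1] ** 2

x0 = np.array([0.5, -1.0])  # the instance being explained

# Post-hoc step: sample perturbations around x0 and fit a linear
# surrogate to the black box's responses in that neighbourhood.
Z = x0 + rng.normal(scale=0.1, size=(300, 2))
Zb = np.column_stack([np.ones(len(Z)), Z])        # add an intercept column
coef, *_ = np.linalg.lstsq(Zb, black_box(Z), rcond=None)

# coef[1] and coef[2] approximate the black box's local sensitivity
# to each feature around x0.
```

The surrogate's coefficients act as the explanation; their fidelity only extends to the neighbourhood that was sampled.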

Explanation Provenance

    Ante-hoc does not necessarily entail an explainable or human-understandable model

  • endogenous explanation – based on human-comprehensible concepts operated on by a transparent model
  • exogenous explanation – based on human-comprehensible concepts constructed outside of the predictive model (by the additional modelling step)

Explanation Domain

Original domain

Transformed domain

Explanation Types

  • model-based – derived from model internals
  • feature-based – regarding importance or influence of data features
  • instance-based – carried by a real or fictitious data point

  • meta-explainers – one of the above, but not extracted directly from the predictive model being explained (using an additional explainability modelling step, e.g., surrogate)

Explanation Family

  • associations between antecedent and consequent
  • feature importance
  • feature attribution / influence
  • rules
  • exemplars (prototypes & criticisms)

Explanation Family    

  • contrasts and differences

    • (non-causal) counterfactuals
      i.e., contrastive statements
    • prototypes & criticisms
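A (non-causal) counterfactual can be generated even for a black box by searching for a small input change that flips the prediction. A toy sketch, with an invented classifier, features and step size:

```python
# A stand-in black-box classifier over two invented features.
def classify(income, debt):
    return 'accept' if income - 2 * debt > 0 else 'reject'

def counterfactual(income, debt, step=1.0, max_steps=100):
    # Greedily decrease debt until the decision flips; the result reads as
    # a contrastive statement: "had your debt been X, you'd be accepted".
    original = classify(income, debt)
    for k in range(1, max_steps + 1):
        candidate = debt - k * step
        if classify(income, candidate) != original:
            return {'debt': candidate}
    return None  # no flip found within the search budget
```

For a rejected applicant, e.g. `counterfactual(50, 30)`, the returned feature value is the contrast that would have changed the outcome.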

Explanation Family    

  • causal mechanisms

    • causal counterfactuals
    • causal chains
    • full causal model

Explanatory Medium

  • (statistical / numerical) summarisation
  • visualisation
  • textualisation
  • formal argumentation

Communication of Explanations

  • Static artefact

  • Interactive (explanatory) protocol

    • interactive interface
    • interactive explanation

Explainability Scope

               global                  cohort                    local
  data         a set of data           a subset of data          an instance
  model        model space             model subspace            a point in model space
  prediction   a set of predictions    a subset of predictions   an individual prediction

  • algorithmic explanation – the learning algorithm, not the resulting model; e.g., modelling assumptions, caveats, compatible data types, etc.

Explainability Target

  • Focused on a single class (technically limited)

    • implicit context

      Why \(A\)? (…and not anything else, i.e., \(B \cup C \cup \ldots\))

    • explicit context

      Why \(A\) and not \(B\)?

  • Multi-class explainability (Sokol and Flach 2020b)

    If 🌧️, then \(A\); else if ☀️ & 🥶, then \(B\); else if ☀️ & 🥵, then \(C\).

Important Developments

Where Is the Human? (circa 2017)

Insights from social sciences

Humans and Explanations

  • Human-centred perspective on explainability
  • Infusion of explainability insights from social sciences
    • Interactive dialogue (bi-directional explanatory process)
    • Contrastive statements (e.g., counterfactual explanations)

Exploding Complexity (2019)

Ante-hoc explainability

Ante-hoc vs. Post-hoc

Ante-hoc vs. post-hoc explainability

Black Box + Post-hoc Explainer

  1. Choose a well-performing black-box model
  2. Use an explainer that is
    • post-hoc (can be retrofitted into pre-existing predictors)
    • and possibly model-agnostic (works with any black box)

Silver bullet

Caveat: The No Free Lunch Theorem


Post-hoc explainers have poor fidelity

  • Explainability needs a process similar to KDD, CRISP-DM or BigData
    Data process
  • Focus on engineering informative features and inherently transparent models

It requires effort

XAI process

A generic eXplainable Artificial Intelligence process is beyond our reach at the moment

  • XAI Taxonomy spanning social and technical desiderata:
    • Functional
    • Operational
    • Usability
    • Safety
    • Validation
    (Sokol and Flach, 2020. Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches)

  • Framework for black-box explainers
    (Henin and Le Métayer, 2019. Towards a generic framework for black-box explanations of algorithmic decision systems)

Examples of Explanations

Permutation Feature Importance
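Permutation feature importance scores a feature by how much the model's error grows when that feature's column is shuffled, severing its link to the target. A self-contained sketch on synthetic data (the linear "black box" below is just a stand-in for any predictor):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on x0, weakly on x1, not at all on x2.
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Stand-in "black box": an ordinary least-squares fit.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(A):
    return A @ w

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

baseline = mse(y, predict(X))

# Importance of feature j = error increase after permuting column j.
importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
    importances.append(mse(y, predict(Xp)) - baseline)
```

On this data the importances recover the intended ordering: x0 dominates, x1 matters a little, x2 is close to zero.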


Individual Conditional Expectation & Partial Dependence
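An ICE curve traces one instance's prediction as a single feature is swept over a grid while the remaining features are held fixed; partial dependence is the average of those curves. A numpy sketch with an assumed toy model:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))

# Stand-in "black box": a fixed nonlinear function of two features.
def predict(A):
    return A[:, 0] ** 2 + A[:, 1]

grid = np.linspace(-2, 2, 21)  # values scanned for feature 0

# ICE: one curve per instance -- vary feature 0, keep the rest fixed.
ice = np.empty((len(X), len(grid)))
for i, v in enumerate(grid):
    Xv = X.copy()
    Xv[:, 0] = v               # set feature 0 to the grid value everywhere
    ice[:, i] = predict(Xv)

# PD: the average of the ICE curves.
pd_curve = ice.mean(axis=0)
```

Here the partial dependence curve recovers the quadratic shape of feature 0's effect, with its minimum at zero.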


FACE Counterfactuals
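FACE (Poyiadzi et al. 2020) seeks counterfactuals that are *feasible*: reachable from the explained instance through dense regions of the data, phrased as a shortest-path problem over a graph of training points. A heavily simplified one-dimensional sketch of that idea, where an ε-neighbourhood graph stands in for FACE's density-weighted graph (the data and ε are invented for illustration):

```python
import heapq
import math

# Toy 1-D dataset: (feature value, class label). The goal is to reach a
# target-class point by hopping between nearby data points rather than
# jumping straight across empty feature space.
points = [(0.0, 'A'), (0.5, 'A'), (1.0, 'A'), (1.6, 'B'), (2.1, 'B')]

def neighbours(i, eps=0.7):
    # Connect points closer than eps -- a crude stand-in for FACE's
    # density-aware graph construction.
    return [j for j in range(len(points))
            if j != i and abs(points[i][0] - points[j][0]) <= eps]

def face_path(start, target_class):
    # Dijkstra from the explained instance to the nearest target-class
    # point; the path is the feasible route to the counterfactual.
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, i = heapq.heappop(queue)
        if points[i][1] == target_class:
            path = [i]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return list(reversed(path))
        for j in neighbours(i):
            nd = d + abs(points[i][0] - points[j][0])
            if nd < dist.get(j, math.inf):
                dist[j] = nd
                prev[j] = i
                heapq.heappush(queue, (nd, j))
    return None  # target class unreachable through the data
```

The returned path ends at the counterfactual instance, and every intermediate hop stays close to observed data, which is the feasibility property FACE is after.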




Useful Resources

📖   Books

📝   Papers

💽   Software

Wrap Up


  • The landscape of explainability is fast-paced and complex
  • Don’t expect a universal solution
  • The involvement of humans – as explainees – makes it all the more complicated


Belle, Vaishak, and Ioannis Papantonis. 2021. “Principles and Practice of Explainable Machine Learning.” Frontiers in Big Data, 39.
Doshi-Velez, Finale, and Been Kim. 2017. “Towards a Rigorous Science of Interpretable Machine Learning.” arXiv Preprint arXiv:1702.08608.
Guidotti, Riccardo, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. “A Survey of Methods for Explaining Black Box Models.” ACM Computing Surveys (CSUR) 51 (5): 1–42.
Langer, Ellen J, Arthur Blank, and Benzion Chanowitz. 1978. “The Mindlessness of Ostensibly Thoughtful Action: The Role of ‘Placebic’ Information in Interpersonal Interaction.” Journal of Personality and Social Psychology 36 (6): 635.
Miller, Tim. 2019. “Explanation in Artificial Intelligence: Insights from the Social Sciences.” Artificial Intelligence 267: 1–38.
Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.
Sokol, Kacper, and Peter Flach. 2020a. “Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 56–67.
———. 2020b. “LIMEtree: Consistent and Faithful Surrogate Explanations of Multiple Classes.” arXiv Preprint arXiv:2005.01427.
———. 2021. “Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence.” arXiv Preprint arXiv:2112.14466.