Machine Learning Explainability

Exploring Automated Decision-Making Through Transparent Modelling and Peeking Inside Black Boxes


Kacper Sokol


View Slides

See the sidebar for an index of slides and demos.

Course Schedule

The course will be held over two weeks:

  • week 1 commencing on 6 February 2023 and
  • week 2 commencing on 13 February 2023.

What          When             Where (week 1)   Where (week 2)
lecture       9.30–10.15am     D0.03            D1.14
discussion    10.15–10.30am    D0.03            D1.14
lab           10.30–11.15am    D0.03            D1.14
open office   11.30am–12pm     D0.03            D1.14

Course Summary

Machine learning models require care, attention and a fair amount of tuning to offer accurate, consistent and robust predictive modelling of data. Why should their transparency and explainability be any different? While it is possible to easily generate explanatory insights with methods that are post-hoc and model-agnostic – LIME and SHAP, for example – these can be misleading when output by generic tools and viewed out of (technical or domain) context. Explanations should not be taken at face value; instead, their understanding ought to come from interpreting explanatory insights in view of the implicit caveats and limitations under which they were generated. After all, explainability algorithms are complex entities often built from multiple components that are subject to parameterisation choices and operational assumptions, all of which must be accounted for and configured to yield a truthful and useful explainer. Additionally, since any particular method may only provide partial information about the functioning of a predictive model, embracing diverse insights and appreciating their complementarity – as well as their disagreements – can further enhance understanding of an algorithmic decision-making process.

This course takes an adversarial perspective on artificial intelligence explainability and machine learning interpretability. Instead of reviewing popular approaches used to these ends, it breaks them up into core functional blocks, studies the role and configuration thereof, and reassembles them to create bespoke, well-understood explainers suitable for the problem at hand. The course focuses predominantly on tabular data, with some excursions into image and text explainability whenever a method is agnostic of the data type. The tuition is complemented by a series of hands-on materials for self-study, which allow you to experiment with these techniques and appreciate their inherent complexity, capabilities and limitations. The assignment, on the other hand, requires you to develop a tailor-made explainability suite for a data set and predictive model of your choice, or alternatively analyse an explainability algorithm to identify its core algorithmic building blocks and explore how they affect the resulting explanation. (Note that there is scope for a bespoke project if you have a well-defined idea in mind.)



Introduction to explainability
  • History of explainability
  • Types of explanations
  • Taxonomy and classification of explainability approaches
  • A human-centred perspective
  • Ante-hoc vs. post-hoc discussion
  • Multi-class explainability
  • Defining explainability
  • Evaluation of explainability techniques
A brief overview of data explainability
  • Data as an (implicit) model
  • Data summarisation and description
  • Dimensionality reduction
  • Exemplars, prototypes and criticisms
Transparent modelling
  • The ante-hoc vs. post-hoc distinction in view of information lineage (i.e., endogenous and exogenous sources of information that form the explanations)
  • Rule lists and sets
  • Linear models (and generalised additive models)
  • Decision trees
  • \(k\)-nearest neighbours and \(k\)-means
Feature importance
  • Permutation Importance
  • Feature Interaction
Feature influence
  • Individual Conditional Expectation
  • Partial Dependence
  • LIME
  • SHAP
  • Accumulated Local Effects
  • Exemplar explanations
  • Counterfactuals
  • Prototypes and criticisms
  • Scoped rules
  • RuleFit
  • Local, cohort and global surrogates
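As a taste of the feature importance theme above, permutation importance measures how much a model's predictive performance drops when a single feature's values are shuffled, severing its link to the target. The sketch below is a minimal hand-rolled version, assuming scikit-learn and a synthetic data set (the model and data choices are purely illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data: 2 informative features plus noise
X, y = make_classification(n_samples=500, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

rng = np.random.default_rng(42)
for feature in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, feature])  # break the feature–target link
    # Importance = drop in accuracy after shuffling this feature
    print(f'feature {feature}: {baseline - model.score(X_perm, y_test):.3f}')
```

Shuffling within the test set keeps the feature's marginal distribution intact while destroying its relationship with the label – one of the operational assumptions (and caveats) that the course examines in detail.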


Two types of (possibly group-based) assignment are envisaged. (However, if you have a well-defined project in mind, you may be allowed to pursue it – in this case, talk to the course instructors.)

  1. Develop a bespoke explainability suite for a predictive model of your choice. If you are working on a machine learning project that could benefit from explainability, this project presents an opportunity to use the course as a platform to this end. Alternatively, you can explore explainability of a pre-existing model available to download or accessible through a web API.

  2. Choose an explainability method and identify its core algorithmic building blocks to explore how they affect the final explanations. You are free to explore explainability of inherently transparent models, develop model-specific approaches for an AI or ML technique that interests you, or pursue a model-agnostic technique.

For students who would like to learn more about explainable artificial intelligence and interpretable machine learning but cannot dedicate the time necessary to complete the assignment due to other commitments, there is a possibility of a lightweight project. In this case you can choose an explainability method and articulate its assumptions as well as any discrepancies from its (most popular) implementation – possibly based on some of the (interactive) course materials – as long as you present your findings at the end of the course.

The projects will culminate in presentations and/or demos delivered in front of the entire cohort. The project delivery should focus on the reproducibility of the results and the quality of the investigation into the explainability aspects of the chosen system; the journey is therefore more important than the outcome. Under this purview, all of the assumptions and choices – theoretical, algorithmic, implementational and otherwise – should be made explicit and justified. You are strongly encouraged to prepare and present your findings via one of the dashboarding or interactive reporting/presentation tools (see the list of options included below); however, this aspect of the project is optional.


Identify the sources of explanation (dis)agreements for a given predictive modelling task

For a given data set – e.g., MNIST – one can train a collection of transparent and black-box models; for example, linear classifiers, decision trees, random forests, support vector machines (with different kernels), logistic regressions, perceptrons, neural networks. If the chosen data set lends itself to natural interpretability, i.e., instances (and their features) are understandable to humans, these models can be explained with an array of suitable techniques and their explanations compared and contrasted. Such experiments can help to better understand the capabilities and limitations of individual explainability techniques, especially when their composition, configuration and parameterisation are considered. This can lead to practical guidelines on using these explainers and interpreting their results.
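The kind of disagreement this project probes can surface even between two transparent models trained on the same data. A minimal sketch, assuming scikit-learn and the Iris data set (the models and the ranking heuristics are illustrative choices, not a prescribed methodology):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

# Two transparent models whose native importance scores can disagree
tree = DecisionTreeClassifier(random_state=42).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

# Rank features by impurity-based importance (tree) ...
tree_rank = np.argsort(tree.feature_importances_)[::-1]
# ... and by aggregated absolute coefficients across classes (linear model)
logreg_rank = np.argsort(np.abs(logreg.coef_).sum(axis=0))[::-1]

print('tree ranking:  ', [feature_names[i] for i in tree_rank])
print('logreg ranking:', [feature_names[i] for i in logreg_rank])
```

Note that the two rankings are not directly commensurable – impurity-based importance and (unscaled) coefficient magnitude answer subtly different questions – which is exactly the sort of caveat such a comparison should surface.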

New composition of an existing explainability technique

When functionally independent building blocks of an explainability approach can be isolated, we can tweak or replace them to compose a more robust and accountable technique. Similarly, a well-known explainer can be expanded with a new explanatory artefact or modality, e.g., a counterfactual statement instead of feature importance/influence. Additionally, comparing the explanations output by the default and bespoke methods can help to uncover discrepancies that may be abused in order to generate misleading explanatory insights; for example, explainees can be deceived by presenting them with an explanation based on a specifically crafted sample of data (used with post-hoc methods).
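As a toy illustration of such modularity, consider a LIME-like local surrogate whose neighbourhood sampler is an interchangeable building block – swapping gaussian_sampler for a different sampling strategy can change the resulting explanation. All function names here are illustrative, not part of any real library:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def gaussian_sampler(instance, n=500, scale=0.5, seed=0):
    # One replaceable building block: how the local neighbourhood is sampled
    rng = np.random.default_rng(seed)
    return instance + rng.normal(scale=scale, size=(n, instance.size))

def local_surrogate(instance, sampler):
    # Fit an interpretable (linear) model to the black box's behaviour
    # in the neighbourhood produced by the chosen sampler
    neighbourhood = sampler(instance)
    targets = black_box.predict_proba(neighbourhood)[:, 1]
    surrogate = Ridge().fit(neighbourhood, targets)
    return surrogate.coef_  # local feature influence

coefs = local_surrogate(X[0], gaussian_sampler)
print('local influence:', np.round(coefs, 3))
```

Because the sampler is passed in as an argument, a project of this type could, for example, compare Gaussian perturbation against sampling from the training data distribution and study how the surrogate's coefficients shift.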

New explainability technique from existing building blocks

Instead of improving a pre-existing explainability technique, algorithmic components from across the explainability spectrum can become an inspiration to build an entirely new explainer or explainability pipeline.

Explore the behaviour of a pre-existing model with explainability techniques

Given the success of deep learning in predictive modelling, opaque systems based on ML algorithms often end up in production. While it may be difficult to identify any of their undesirable properties from the outset, these are often discovered (and corrected) throughout the lifespan of such systems. In this space, explainability techniques may help to uncover these characteristics and pinpoint their sources, potentially leading to observations that reveal biases or aid in scientific discoveries. Either of these applications can have significant social impact and benefit, leading to such models being corrected or decommissioned. Sometimes, however, their idiosyncrasies can be observed, but their origin remains unaccounted for. For example, consider machine learning models dealing with chest X-rays, which can additionally detect a patient's race – something that doctors are incapable of discerning (see here and here for more details). While the reason for this behaviour remains a mystery, a thorough investigation of this, and similar, models with an array of well-understood (post-hoc) explainability techniques may be able to offer important clues.


The course will span two weeks, offering the following tuition each day (ten days total):

  • 1-hour lecture;
  • 1-hour supervised lab session (system design and coding); and
  • half-hour open office session (general questions and project discussions).

The lectures will roughly follow the curriculum outlined above. The envisaged self-study time is around 20 hours, which largely involves completing a project of choice (possibly in small groups).

Learning Objectives


  • Understand the landscape of AI and ML explainability techniques.
  • Identify explainability needs of data-driven machine learning systems.
  • Recognise the capabilities and limitations of explainability approaches, both in general and in view of specific use cases.
  • Apply these skills to real-life AI and ML problems.
  • Communicate explainability findings through interactive reports and dashboards.

Specific to explainability approaches

  • Identify self-contained algorithmic components of explainers and understand their functions.
  • Connect these building blocks to the explainability requirements unique to the investigated predictive system.
  • Select appropriate algorithmic components and tune them to the problem at hand.
  • Evaluate these building blocks (in this specific context) independently and when joined together to form the final explainer.
  • Interpret the resulting explanations in view of the uncovered properties and limitations of the bespoke explainability algorithm.

Prerequisites

  • Python programming.
  • Familiarity with basic mathematical concepts (relevant to machine learning).
  • Knowledge of machine learning techniques for tabular data.
  • Prior experience with machine learning approaches for images and text (e.g., deep learning) or other forms of data modelling (e.g., time series forecasting, reinforcement learning) if you decide to pursue a project in this direction.

Useful Resources


Kacper Sokol

Kacper is a Research Fellow at the ARC Centre of Excellence for Automated Decision-Making and Society (ADM+S), affiliated with the School of Computing Technologies at RMIT University, Australia, and an Honorary Research Fellow at the Intelligent Systems Laboratory, University of Bristol, United Kingdom.

His main research focus is transparency – interpretability and explainability – of data-driven predictive systems based on artificial intelligence and machine learning algorithms. In particular, he has done work on enhancing transparency of predictive models with feasible and actionable counterfactual explanations and robust modular surrogate explainers. He has also introduced Explainability Fact Sheets – a comprehensive taxonomy of AI and ML explainers – and prototyped dialogue-driven interactive explainability systems.

Kacper is the designer and lead developer of FAT Forensics – an open source fairness, accountability and transparency Python toolkit. Additionally, he is the main author of a collection of online interactive training materials about machine learning explainability, created in collaboration with the Alan Turing Institute – the UK’s national institute for data science and artificial intelligence.

Kacper holds a Master’s degree in Mathematics and Computer Science, and a doctorate in Computer Science from the University of Bristol, United Kingdom. Prior to joining ADM+S he held numerous research posts at the University of Bristol, working on projects such as REFrAMe, SPHERE and TAILOR – the European Union’s AI Research Excellence Centre. Additionally, he was a visiting researcher at the University of Tartu (Estonia); Simons Institute for the Theory of Computing, UC Berkeley (California, USA); and USI – Università della Svizzera italiana (Lugano, Switzerland). In his research, Kacper has collaborated with numerous industry partners, such as THALES, and provided consulting services on explainable artificial intelligence and transparent machine learning.

Citing the Slides

If you happen to use these slides, please cite them as follows.

  author={Sokol, Kacper},
  title={{eXplainable} {Machine} {Learning} -- {USI} {Course}},


The creation of these educational materials was supported by the ARC Centre of Excellence for Automated Decision-Making and Society (project number CE200100005), and funded in part by the Australian Government through the Australian Research Council.