IBM releases AI model toolkit to help developers measure uncertainty

At its Digital Developer Conference today, IBM open-sourced Uncertainty Quantification 360 (UQ360), a new toolkit focused on enabling AI to understand and communicate its uncertainty. Following in the footsteps of IBM's AI Fairness 360 and AI Explainability 360, the goal of UQ360 is to foster community practices across researchers, data scientists, developers, and others that might lead to better understanding and communication around the limitations of AI.

It's commonly understood that deep learning models are overconfident -- even when they make mistakes. Epistemic uncertainty describes what a model doesn't know because the training data wasn't appropriate. On the other hand, aleatoric uncertainty is the uncertainty arising from the natural randomness of observations. Given enough training samples, epistemic uncertainty will decrease, but aleatoric uncertainty can't be reduced even when more data is provided.
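
To make the distinction concrete, here is a minimal sketch in plain NumPy. It is not part of UQ360, and the synthetic data, the `NOISE_STD` constant, and the `estimate_uncertainties` helper are all assumptions made for illustration. It approximates epistemic uncertainty as the disagreement among models fit to bootstrap resamples of the data, and aleatoric uncertainty as the spread of residuals around a single fit.

```python
import numpy as np

NOISE_STD = 0.5  # assumed aleatoric noise level of the synthetic data


def estimate_uncertainties(n_samples, n_models=200, seed=0):
    """Return (epistemic_std, aleatoric_std) at x = 0.5 for a bootstrap ensemble."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_samples)
    # Synthetic ground truth: a line plus irreducible observation noise.
    y = 2.0 * x + 1.0 + rng.normal(0.0, NOISE_STD, n_samples)

    predictions = []
    for _ in range(n_models):
        idx = rng.integers(0, n_samples, n_samples)  # resample with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        predictions.append(slope * 0.5 + intercept)

    # Epistemic: disagreement between models trained on different resamples.
    epistemic_std = float(np.std(predictions))

    # Aleatoric: spread of the residuals around a fit on all of the data.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    aleatoric_std = float(np.std(residuals, ddof=2))
    return epistemic_std, aleatoric_std


small = estimate_uncertainties(n_samples=30)
large = estimate_uncertainties(n_samples=3000)
print("30 samples:   epistemic=%.3f aleatoric=%.3f" % small)
print("3000 samples: epistemic=%.3f aleatoric=%.3f" % large)
```

With more samples, the ensemble members should agree more closely, so the epistemic estimate shrinks, while the aleatoric estimate settles near the noise level built into the data rather than falling toward zero.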

UQ360 offers a set of algorithms and a taxonomy to quantify uncertainty, as well as capabilities to measure and improve uncertainty quantification (UQ). For every UQ algorithm provided in the UQ360 Python package, a user can choose an appropriate style of communication by following IBM's guidance on communicating UQ estimates, from descriptions to visualizations. UQ360 also includes an interactive experience that introduces ways to produce UQ and to use it in a house price prediction application. In addition, UQ360 includes a number of in-depth tutorials that demonstrate how to use UQ across the AI lifecycle.

The importance of uncertainty

Uncertainty is a major barrier standing in the way of self-supervised learning's success, Facebook chief AI scientist Yann LeCun said at the International Conference on Learning Representations (ICLR) last year. Distributions are tables of values that link every possible value of a variable to the probability the value could occur. They represent uncertainty perfectly well where the variables are discrete, which is why architectures like Google's BERT are so successful. But researchers haven't yet discovered a way to usefully represent distributions where the variables are continuous -- i.e., where they can be obtained only by measuring.
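
As a rough illustration of the discrete case, the sketch below turns a handful of raw scores into a probability table over a toy vocabulary and uses entropy to summarize how uncertain that table is. The vocabulary, the scores, and the helper names are made up for this example; it is not BERT's implementation, only the general softmax-over-a-vocabulary idea.

```python
import numpy as np


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp / exp.sum()


def entropy(probs):
    """Shannon entropy in nats: higher means a more uncertain distribution."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    return float(-np.sum(nonzero * np.log(nonzero)))


vocabulary = ["cat", "dog", "car", "tree"]  # toy vocabulary, assumed for illustration
confident = softmax([6.0, 1.0, 0.5, 0.2])   # one candidate clearly dominates
unsure = softmax([1.0, 1.0, 1.0, 1.0])      # every candidate is equally likely

# The "table" linking each possible value to its probability.
table = dict(zip(vocabulary, confident))
print(table)
print("entropy (confident):", entropy(confident))
print("entropy (unsure):   ", entropy(unsure))
```

Because the variable can take only a finite set of values, the whole distribution fits in one table. A continuous variable has no such finite list of outcomes, which is the difficulty the paragraph above describes.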

As IBM research staff members Prasanna Sattigeri and Q. Vera Liao note in a blog post, the choice of UQ method depends on a number of factors, including the underlying model, the type of machine learning task, characteristics of the data, and the user's goal. Sometimes a chosen UQ method might not produce high-quality uncertainty estimates and could mislead users, so it's crucial for developers to evaluate the quality of UQ and improve the quantification quality if necessary before deploying an AI system.
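
One common way to check the quality of interval-style uncertainty estimates is to measure how often the true value falls inside the predicted interval and how wide the intervals are. The sketch below computes both in plain NumPy. The function names `picp` and `mpiw` and the house price figures are assumptions for illustration and should not be read as UQ360's own API.

```python
import numpy as np


def picp(y_true, y_lower, y_upper):
    """Prediction interval coverage probability: share of targets inside their interval."""
    y_true, y_lower, y_upper = (np.asarray(a, dtype=float) for a in (y_true, y_lower, y_upper))
    return float(np.mean((y_true >= y_lower) & (y_true <= y_upper)))


def mpiw(y_lower, y_upper):
    """Mean prediction interval width: average distance between the bounds."""
    return float(np.mean(np.asarray(y_upper, dtype=float) - np.asarray(y_lower, dtype=float)))


# Made-up house prices (thousands of dollars) and predicted intervals.
y_true = np.array([310.0, 455.0, 280.0, 520.0, 390.0])
y_lower = np.array([290.0, 400.0, 285.0, 480.0, 350.0])
y_upper = np.array([330.0, 450.0, 325.0, 560.0, 410.0])

coverage = picp(y_true, y_lower, y_upper)
width = mpiw(y_lower, y_upper)
print("coverage:", coverage)
print("mean width:", width)
```

If these intervals were meant to capture 95% of outcomes, coverage well below that level would suggest the model is overconfident. Widening the intervals raises coverage but makes each estimate less informative, so the two numbers are best read together.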

In a recent study conducted by Himabindu Lakkaraju, an assistant professor at Harvard University, showing uncertainty metrics to both people with a background in machine learning and non-experts had an equalizing effect on their resilience to AI predictions. While fostering trust in AI may never be as simple as providing metrics, awareness of the pitfalls could go some way toward protecting people from machine learning's limitations.

"Common explainability techniques shed light on how AI works, but UQ exposes limits and potential failure points," Sattigeri and Liao wrote. "Users of a house price prediction model would like to know the margin of error of the model predictions to estimate their gains or losses. Similarly, a product manager may notice that an AI model predicts a new feature A will perform better than a new feature B on average, but to see its worst-case effects on KPIs, the manager would also need to know the margin of error in the predictions."
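
The product manager scenario can be sketched with a few lines of arithmetic. The example below assumes each prediction comes with a mean and a standard deviation and that errors are roughly normal, so a 95% interval is the mean plus or minus 1.96 standard deviations. The `Prediction` class and the KPI figures are hypothetical.

```python
from dataclasses import dataclass

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


@dataclass
class Prediction:
    mean: float  # predicted KPI lift, in percent
    std: float   # predictive standard deviation, in percent

    def interval(self, z=Z_95):
        """Return (lower, upper) bounds assuming normally distributed error."""
        margin = z * self.std
        return self.mean - margin, self.mean + margin


# Hypothetical numbers: A looks better on average but is far less certain.
feature_a = Prediction(mean=5.0, std=4.0)
feature_b = Prediction(mean=4.0, std=1.0)

a_low, a_high = feature_a.interval()
b_low, b_high = feature_b.interval()
print("feature A: %.2f to %.2f" % (a_low, a_high))
print("feature B: %.2f to %.2f" % (b_low, b_high))
```

Under these assumed numbers, feature A has the higher average but its lower bound is negative, while feature B's lower bound stays positive. That is the kind of worst-case comparison a point prediction alone cannot support.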