Google's MinDiff aims to mitigate unfair biases in classifiers

Google today released MinDiff, a new framework for mitigating (but not eliminating) unfair biases when training AI and machine learning models. The company says MinDiff is the culmination of years of work and has already been incorporated into various Google products, including models that moderate content quality.

The task of classification, which involves sorting data into labeled categories, is prone to biases against groups that are underrepresented in model training datasets. One of the most common metrics used to measure this bias is equality of opportunity, which in this setting means minimizing differences in false positive rates across groups. But it's often difficult to achieve balance because of sparse data about demographics, the unintuitive nature of debiasing tools, and unacceptable accuracy tradeoffs.
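To make the metric concrete, here is a minimal Python sketch of a false positive rate gap between two groups. The function names and the toy data are illustrative assumptions, not part of MinDiff or any Google library.

```python
def false_positive_rate(labels, predictions):
    """Share of truly negative examples (label 0) that were flagged positive (prediction 1)."""
    flagged = [p for y, p in zip(labels, predictions) if y == 0]
    if not flagged:
        raise ValueError("need at least one negative example")
    return sum(flagged) / len(flagged)


def fpr_gap(labels, predictions, groups, group_a, group_b):
    """Absolute difference in false positive rate between two groups."""
    def rate(group):
        ys = [y for y, g in zip(labels, groups) if g == group]
        ps = [p for p, g in zip(predictions, groups) if g == group]
        return false_positive_rate(ys, ps)

    return abs(rate(group_a) - rate(group_b))


# Toy data (hypothetical): every example is truly negative.
labels = [0, 0, 0, 0, 0, 0, 0, 0]
predictions = [1, 0, 0, 0, 1, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = fpr_gap(labels, predictions, groups, "a", "b")
print(gap)
```

In this toy data the classifier wrongly flags one of four examples in group "a" and three of four in group "b", so a gap-closing method would aim to shrink that difference.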

MinDiff leverages in-processing approaches, in which a model's training objective is augmented with an objective focused on removing biases. This new objective is then optimized over a small sample of data with known demographic information. Given two slices of data, MinDiff penalizes the model for differences in the distributions of scores between the two slices, so that as the model trains, it tries to minimize the penalty by bringing the distributions closer together.
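The idea of an augmented objective can be sketched in a few lines of Python. This is not Google's implementation: the function names and the `weight` parameter are hypothetical, and the penalty here is a deliberately simple stand-in (the squared gap between mean scores) rather than the losses MinDiff actually uses.

```python
import math


def binary_cross_entropy(labels, scores, eps=1e-7):
    """Standard classification loss averaged over the main training batch."""
    total = 0.0
    for y, s in zip(labels, scores):
        s = min(max(s, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(s) + (1 - y) * math.log(1 - s))
    return total / len(labels)


def mean_gap_penalty(scores_a, scores_b):
    """Stand-in penalty (assumption): squared difference between mean scores of two slices."""
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    return (mean_a - mean_b) ** 2


def augmented_loss(labels, scores, scores_a, scores_b, weight=1.0, penalty=mean_gap_penalty):
    """Task loss plus a weighted penalty computed on two demographic slices."""
    return binary_cross_entropy(labels, scores) + weight * penalty(scores_a, scores_b)


# Hypothetical scores: the main batch, plus two small slices with known group membership.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4]
slice_a = [0.1, 0.2, 0.3]
slice_b = [0.5, 0.6, 0.7]

print(augmented_loss(labels, scores, slice_a, slice_b, weight=1.5))
```

Raising `weight` trades task accuracy for a smaller gap between the slices, which is the tradeoff the article describes.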

To improve ease of use, researchers at Google switched from adversarial training to a regularization framework that penalizes statistical dependency between a model's predictions and demographic information for non-harmful examples. This encourages models to equalize error rates across groups.

MinDiff can minimize the correlation between a model's predictions and the demographic group, which pushes the average and variance of predictions to be equal across groups even if the full distributions still differ afterward. It can also use the maximum mean discrepancy (MMD) loss, which Google claims is better able to both remove biases and maintain model accuracy.
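A small Python sketch shows why a distribution-level measure such as maximum mean discrepancy can catch differences that matching the average and variance would miss. The Gaussian kernel, the `bandwidth` value, and the toy scores are assumptions chosen for illustration; the article does not specify which kernel MinDiff uses.

```python
import math
import statistics


def gaussian_kernel(x, y, bandwidth=0.1):
    """Similarity between two scores; equals 1.0 when they are identical."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(scores_a, scores_b, bandwidth=0.1):
    """Biased estimate of squared maximum mean discrepancy between two score samples."""
    return (
        mean_kernel(scores_a, scores_a, bandwidth)
        + mean_kernel(scores_b, scores_b, bandwidth)
        - 2 * mean_kernel(scores_a, scores_b, bandwidth)
    )


# Two hypothetical score samples with the same mean and variance but different shapes.
d = math.sqrt(0.08)
scores_a = [0.3, 0.7, 0.3, 0.7]
scores_b = [0.5 - d, 0.5, 0.5, 0.5 + d]

print(statistics.mean(scores_a), statistics.mean(scores_b))
print(statistics.pvariance(scores_a), statistics.pvariance(scores_b))
print(mmd_squared(scores_a, scores_b))
```

Both samples share a mean of 0.5 and a variance of 0.04, so a penalty based only on those two moments would see nothing left to fix, while the discrepancy estimate stays above zero because the shapes differ.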

Google says MinDiff is the first in what will be a larger "model remediation library" of techniques suitable for different use cases. "Gaps in error rates of classifiers is an important set of unfair biases to address, but not the only one that arises in machine learning applications," Google senior software engineer Flavien Prost and staff research scientist Alex Beutel wrote in a blog post. "For machine learning researchers and practitioners, we hope this work can further advance research toward addressing even broader classes of unfair biases and the development of approaches that can be used in practical applications."

Google previously open-sourced ML-fairness-gym, a set of components for evaluating algorithmic fairness in simulated social environments. Other model debiasing and fairness tools in the company's suite include the What-If Tool, a bias-detecting feature of the TensorBoard web dashboard for its TensorFlow machine learning framework, and an accountability framework intended to add a layer of quality assurance for businesses deploying AI models.
