AI safety tools can help mitigate bias in algorithms

As AI proliferates, researchers are beginning to call for technologies that might foster trust in AI-powered systems. According to a survey conducted by KPMG, across five countries -- the U.S., the U.K., Germany, Canada, and Australia -- over a third of the general public says that they're unwilling to place trust in AI systems in general. And in a report published by Pega, only 25% of consumers said they'd trust a decision made by an AI system regarding a qualification for a bank loan, for example.

The concern has yielded a breed of software that attempts to impose constraints on AI systems charged with risky decision-making. Some focus on reinforcement learning, or AI that's progressively spurred toward goals via rewards, which forms the foundation of self-driving cars and drug discovery systems. Others focus more broadly on fairness, which can be an elusive quality in AI systems -- mostly owing to biases in algorithms and datasets.

Among others, OpenAI and Alphabet-owned DeepMind have released environments to train "safe" AI systems for different types of applications. More make their way into open source on a regular cadence, ensuring that the study of constrained or safe AI has legs -- and a lasting impact.

Safety tools

Safety tools for AI training are designed to prevent systems from engaging in dangerous behaviors that might lead to errors. They typically make use of techniques like constrained reinforcement learning, which implements "cost functions" that the AI must learn to keep within limits over time. Constrained systems figure out tradeoffs that achieve certain defined outcomes. For example, a "constrained" driverless car might learn to avoid collisions outright, rather than tolerating collisions as long as it completes trips.
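One common way to express this kind of tradeoff is a Lagrangian penalty: the system maximizes reward minus a multiplier times the amount by which cost exceeds a budget, and the multiplier grows whenever the budget is violated. The Python sketch below illustrates the idea on a toy choice between two driving policies. It is not taken from any particular safety toolkit; the policy names, reward and cost numbers, budget, and step size are all hypothetical.

```python
# Hypothetical per-trip averages for two driving policies (illustrative numbers only).
POLICIES = {
    "aggressive": {"reward": 1.0, "cost": 0.30},  # completes more trips, collides more
    "cautious": {"reward": 0.8, "cost": 0.02},    # completes fewer trips, rarely collides
}
COST_BUDGET = 0.05  # assumed tolerance for expected collision cost per trip


def penalized_value(policy, multiplier):
    """Lagrangian objective: reward minus the multiplier-weighted budget violation."""
    stats = POLICIES[policy]
    return stats["reward"] - multiplier * (stats["cost"] - COST_BUDGET)


def best_policy(multiplier):
    """Pick the policy with the highest penalized value for a given multiplier."""
    return max(POLICIES, key=lambda name: penalized_value(name, multiplier))


def update_multiplier(multiplier, observed_cost, step_size=0.5):
    """Dual ascent: raise the penalty when cost exceeds the budget, never below zero."""
    return max(0.0, multiplier + step_size * (observed_cost - COST_BUDGET))


def train(steps=10):
    """Alternate between choosing a policy and adjusting the multiplier."""
    multiplier, history = 0.0, []
    for _ in range(steps):
        choice = best_policy(multiplier)
        history.append((choice, multiplier))
        multiplier = update_multiplier(multiplier, POLICIES[choice]["cost"])
    return history


if __name__ == "__main__":
    for choice, multiplier in train():
        print(f"multiplier={multiplier:.3f} -> {choice}")
```

With a multiplier of zero the sketch simply chases reward and picks the aggressive policy. As the multiplier rises in response to the budget violations, the cautious policy eventually scores higher, which is the collision-avoiding behavior described above.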

Safety tools also encourage AI to explore a range of states through different hypothetical behaviors. For example, they might use a generative system to predict behaviors informed by data like random trajectories or safe expert demonstrations. A human supervisor can label the behaviors with rewards, so that the AI interactively learns the safest behaviors to maximize its total reward.

Beyond reinforcement learning, safety tools encompass frameworks for mitigating biases while training AI models. For example, Google offers MinDiff, which aims to inject fairness into classification, or the process of sorting data into categories. Classification is prone to biases against groups underrepresented in model training datasets, and it can be difficult to achieve balance because of sparse demographic data and potential accuracy tradeoffs.
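To make the idea of training-time mitigation concrete, here is a minimal Python sketch of a loss that adds a fairness penalty to an ordinary classification loss. It is not MinDiff's API: the function names, the group labels "a" and "b", and the penalty itself (a squared gap in average predicted scores) are assumptions chosen for illustration, and the penalty is a deliberately simple stand-in for a more careful comparison of score distributions.

```python
import math


def log_loss(labels, scores):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for label, score in zip(labels, scores):
        score = min(max(score, eps), 1.0 - eps)
        total -= label * math.log(score) + (1 - label) * math.log(1.0 - score)
    return total / len(labels)


def mean_score_gap(scores, groups, group_a="a", group_b="b"):
    """Squared difference between the two groups' average predicted scores."""
    scores_a = [s for s, g in zip(scores, groups) if g == group_a]
    scores_b = [s for s, g in zip(scores, groups) if g == group_b]
    gap = sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)
    return gap * gap


def combined_loss(labels, scores, groups, fairness_weight=1.0):
    """Task loss plus a weighted penalty for scoring the two groups differently."""
    return log_loss(labels, scores) + fairness_weight * mean_score_gap(scores, groups)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    groups = ["a", "a", "b", "b"]
    skewed = [0.9, 0.9, 0.1, 0.1]  # made-up scores that favor group "a"
    print(combined_loss(labels, skewed, groups))
```

Raising the hypothetical `fairness_weight` pushes training toward smaller gaps between groups at the expense of the task loss, which is one way to picture the accuracy tradeoff mentioned above.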

Google has also open-sourced ML-fairness-gym, a set of components for evaluating algorithmic fairness in simulated social environments. Other model debiasing and fairness tools in the company's suite include the What-If Tool, a bias-detecting feature of the TensorBoard web dashboard for its TensorFlow machine learning framework; and SMACTR (Scoping, Mapping, Artifact Collection, Testing, and Reflection), an accountability framework intended to add a layer of quality assurance for businesses deploying AI models.

Not to be outdone, Microsoft provides Fairlearn, which addresses two kinds of harms: allocation harms and quality-of-service harms. Allocation harms occur when AI systems extend or withhold opportunities, resources, or information -- for example, in hiring, school admissions, and lending. Quality-of-service harms refer to whether a system works as well for one person as it does for another, even if no opportunities, resources, or information are extended or withheld.

According to Microsoft, professional services firm Ernst & Young used Fairlearn to evaluate the fairness of model outputs with respect to sex. The toolkit revealed a 15.3% difference between positive loan decisions for males versus females. Ernst & Young's modeling team then developed and trained multiple remediated models and visualized the common trade-off between fairness and model accuracy.
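Gaps like the one above come down to simple per-group statistics. The sketch below, written in plain Python rather than with Fairlearn's own API, computes a selection rate (relevant to allocation harms) and an accuracy (relevant to quality-of-service harms) for each group, then reports the largest gap between groups. The data is made up and the function names are hypothetical.

```python
from collections import defaultdict


def group_metrics(y_true, y_pred, groups):
    """Return the selection rate and accuracy for each group."""
    counts = defaultdict(lambda: {"n": 0, "selected": 0, "correct": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        tally = counts[group]
        tally["n"] += 1
        tally["selected"] += int(pred == 1)     # positive decisions (e.g. loan approved)
        tally["correct"] += int(pred == truth)  # predictions that match the true label
    return {
        group: {
            "selection_rate": tally["selected"] / tally["n"],
            "accuracy": tally["correct"] / tally["n"],
        }
        for group, tally in counts.items()
    }


def largest_gap(metrics, key):
    """Difference between the best-off and worst-off group for one metric."""
    values = [per_group[key] for per_group in metrics.values()]
    return max(values) - min(values)


if __name__ == "__main__":
    # Made-up labels, predictions, and group memberships for eight applicants.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

    metrics = group_metrics(y_true, y_pred, groups)
    print(metrics)
    print("selection-rate gap:", largest_gap(metrics, "selection_rate"))
    print("accuracy gap:", largest_gap(metrics, "accuracy"))
```

A large selection-rate gap points to a possible allocation harm, while a large accuracy gap suggests the model works less well for one group than another.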

LinkedIn recently released the LinkedIn Fairness Toolkit (LiFT), a software library aimed at enabling the measurement of fairness in AI and machine learning workflows. The company says LiFT can be deployed during training and scoring to measure biases in training datasets, and to evaluate notions of fairness for models while detecting differences in their performance across subgroups.

To date, LinkedIn says it has applied LiFT internally to measure fairness metrics on training datasets before models are trained. In the future, the company plans to increase the number of pipelines where it's measuring and mitigating bias on an ongoing basis through deeper integration of LiFT.

Rounding out the list of high-profile safety tools is IBM's AI Fairness 360 toolkit, which contains a library of algorithms, code, and tutorials that demonstrate ways to implement bias detection in models. The toolkit recommends adjustments -- such as algorithmic tweaks or counterbalancing data -- that might lessen the impact of bias, and it explains which factors influenced a given machine learning model's decision as well as its overall accuracy, performance, fairness, and lineage.

A more recent addition to the scene is a dataset and tool for detecting demographic bias in voice and speech recognition apps. The Artie Bias Corpus (ABC), which consists of audio files along with their transcriptions, aims to diagnose and mitigate the impact of factors like age, gender, and accent in voice recognition systems. AI startup Pymetrics' Audit AI, which was also recently launched, is designed to determine whether a specific statistic or trait fed into an algorithm is being favored or disadvantaged at a statistically significant, systematic rate that leads to adverse impact on people underrepresented in a dataset.
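One common way to check for adverse impact of this kind is to compare selection rates between two groups, both as a ratio and with a significance test. The following Python sketch is an assumption about how such a check might look, not Audit AI's interface: it computes an impact ratio (often compared against a four-fifths threshold) and a two-proportion z-test, using hypothetical function names and made-up counts.

```python
import math


def impact_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of the lower selection rate to the higher one (1.0 means parity)."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    # Assumes at least one group has a nonzero selection rate.
    return min(rate_a, rate_b) / max(rate_a, rate_b)


def two_proportion_z_test(selected_a, total_a, selected_b, total_b):
    """Return (z, two-sided p-value) for the difference in selection rates."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    pooled = (selected_a + selected_b) / (total_a + total_b)
    # Assumes the pooled rate is strictly between 0 and 1, so the error is nonzero.
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: 60 of 100 selected in group A, 40 of 100 in group B.
    ratio = impact_ratio(60, 100, 40, 100)
    z, p_value = two_proportion_z_test(60, 100, 40, 100)
    print(f"impact ratio={ratio:.2f}, z={z:.2f}, p={p_value:.4f}")
```

A ratio well below 0.8 combined with a small p-value would flag the trait for closer review; neither number on its own establishes that a system is unfair.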

Steps in the right direction

Not all safety tools are created equal. Some aren't being maintained or lack documentation, and there's a limit to the degree to which they can remediate potential harm. Still, adopting these tools in the enterprise can instill a sense of trust among both external and internal stakeholders.

A study by Capgemini found that customers and employees will reward organizations that practice ethical AI with greater loyalty, more business, and even a willingness to advocate for them -- and in turn, punish those that don't. The study suggests that there's both reputational risk and a direct impact on the bottom line for companies that don't approach the issue thoughtfully.

Moreover, an overwhelming majority of Americans -- 82% -- believe that AI should be carefully managed, a figure comparable to survey results from European Union respondents, according to a 2019 report from the Center for the Governance of AI. This suggests a clear mandate for businesses to deploy AI responsibly and fairly, using whatever tools are necessary to achieve this objective.