How to mitigate bias in AI

As the common proverb goes, to err is human. One day, machines may offer workforce solutions that are free from human decision-making mistakes; however, those machines learn through algorithms and systems built by programmers, developers, product managers, and software teams with inherent biases (like all other humans). In other words, to err is also machine.

Artificial intelligence has the potential to improve our lives in countless ways. However, since algorithms often are created by a few people and distributed to many, it’s incumbent upon the creators to build them in a way that benefits populations and communities equitably. This is much easier said than done -- no programmer can be expected to hold the full knowledge and awareness necessary to build a bias-free AI model, and further, the data gathered can be biased as a result of the way they are collected and the cultural assumptions behind those empirical methods. Fortunately, when building continuously learning AI systems of the future, there are ways to reduce that bias within models and systems. The first step is about recognition.

It’s important to recognize that bias exists in the real world, in all industries and among all humans. The question to ask is not how to make bias go away but how to detect and mitigate such bias. Understanding this helps teams take accountability to ensure that models, systems, and data are incorporating inputs from a diverse set of stakeholders and samples.

With countless ways for bias to seep into algorithms and their applications, the decisions that impact models should not be made in isolation. Purposefully cultivating a workgroup of individuals from diversified backgrounds and ideologies can help inform decisions and designs that foster optimal and equitable outcomes.

Recently, the University of Cambridge conducted an evaluation of over 400 models attempting to detect COVID-19 faster via chest X-rays. The analysis found many algorithms had both severe shortcomings and a high risk of bias. In one instance, a model trained on X-ray images of adult chests was tested on a data set of X-rays from pediatric patients with pneumonia. Although adults experience COVID-19 at a higher rate than children, the model positively identified cases disproportionately. This is likely because the model weighted rib sizes in its analysis, when in fact the most important diagnostic approach is to examine the diseased area of the lung and rule out other issues like a collapsed lung.

One of the bigger problems in model development is that the data sets rarely are made available due to the sensitive nature of the data, so it’s often hard to determine how a model is making a decision. This illustrates the importance of transparency and explainability in both how a model is created and its intended use. Having key stakeholders (e.g., clinicians, actuaries, data engineers, data scientists, care managers, ethicists, and advocates) developing a model in a single data view can remove several human biases that have persisted due to the siloed nature of healthcare.

It’s also worth noting that diversity extends much further than the people creating algorithms. Fair algorithms test for bias in the underlying data in their models. In the case of the COVID-19 X-ray models, this was the Achilles’ heel. The data sampled and collected to build models can underrepresent certain groups whose outcomes we want to predict. Efforts must be made to build more complete samples with contributions from underrepresented groups to better represent populations.
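One simple way to test the underlying data is to compare each group's share of the training sample with its share of the population the model is meant to serve. The sketch below is only an illustration: the function name, the group labels, the reference shares, and the 5-point tolerance are all hypothetical choices, not figures from the Cambridge evaluation.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares, tolerance=0.05):
    """Return groups whose share of the sample falls short of the
    reference share by more than `tolerance` (all values are fractions)."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps = {}
    for group, reference_share in reference_shares.items():
        sample_share = counts.get(group, 0) / total
        shortfall = reference_share - sample_share
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 3)
    return gaps


# Hypothetical training sample: 90 adult images, 10 pediatric images.
sample = ["adult"] * 90 + ["pediatric"] * 10

# Hypothetical share of each group in the population to be served.
reference = {"adult": 0.78, "pediatric": 0.22}

gaps = representation_gaps(sample, reference)
print(gaps)
```

A check like this only flags gaps in group counts. It says nothing about label quality or how the images were collected, so it is a starting point for review rather than a certificate of fairness.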

Without developing more robust data sets and processes around how data is recorded and ingested, algorithms may amplify psychological or statistical bias from how the data was collected. This will negatively impact each step of the model-building process, such as the training, evaluation, and generalization phases. However, by including more people from different walks of life, the AI models built will have a broader understanding of the world, which will go a long way toward reducing the inherent biases of a single individual or homogeneous group.

It may surprise some engineers and data scientists, but lines of code can create unfairness in many ways. For example, Twitter automatically crops uploaded images to improve user experience, but its engineers received feedback that the platform was incorrectly missing or misidentifying certain faces. After multiple attempts to improve the algorithm, the team ultimately realized that image trimming was a decision best made by people. Seemingly routine modeling choices can have a similar effect: choosing the “argmax” (the class with the largest predicted probability) as the final prediction can amplify disparate impact, because a modest difference in scores between groups becomes an all-or-nothing difference in outcomes. An enormous number of test data sets, as well as scenario-based testing, are needed to neutralize these concerns.
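To see how an argmax can amplify a gap, consider a minimal sketch with made-up scores for two groups. The group names, the scores, and the helper functions are hypothetical, and the ratio of lowest to highest selection rate is just one common way to summarize disparate impact.

```python
def argmax_decisions(probabilities):
    """Binary argmax: predict positive when P(positive) > P(negative)."""
    return [1 if p > 0.5 else 0 for p in probabilities]


def mean(values):
    return sum(values) / len(values)


def disparate_impact_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical predicted probabilities of a positive outcome.
scores = {
    "group_a": [0.55, 0.60, 0.65, 0.60],
    "group_b": [0.45, 0.40, 0.35, 0.40],
}

# Rates implied by the scores themselves (about 0.6 vs. 0.4).
soft_rates = {group: mean(probs) for group, probs in scores.items()}

# Rates after taking the argmax: every group_a case is selected
# and no group_b case is.
hard_rates = {group: mean(argmax_decisions(probs)) for group, probs in scores.items()}

print(disparate_impact_ratio(soft_rates))
print(disparate_impact_ratio(hard_rates))
```

In this toy case the scores differ by roughly 20 percentage points between groups, yet the hard decisions differ by 100. Real models are rarely this extreme, but the same mechanism is why selection rates are worth measuring after thresholding and not only on the raw scores.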

There will always be gaps in AI models, yet it’s important to maintain accountability for them and correct them. Fortunately, when teams detect potential biases in a base model that otherwise performs sufficiently well, existing methods can be used to de-bias the data. Ideally, models shouldn’t run without a proper continuous feedback loop in which predicted outputs are reused to train new versions. When working with diverse teams, data, and algorithms, building feedback-aware AI can reduce the innate gaps where bias can sneak in, yet without that diversity of inputs, AI models will just re-learn their own bias.

If individuals and teams are cognizant of the existence of bias, then they have the necessary tools at the data, algorithm, and human levels to build a more responsible AI. The best solution is to be aware that these biases exist and maintain safety nets to address them for each project and model deployment. What tools or approaches do you use to promote algorithmic fairness in your industry? And most importantly, how do you define the purpose behind each model?

Akshay Sharma is executive vice president of artificial intelligence at digital health company Sharecare.
