We’re at a pivotal moment in the path to mass adoption of artificial intelligence (AI). Google subsidiary DeepMind is leveraging AI to determine how to refer optometry patients. Haven Life is using AI to extend life insurance policies to people who wouldn’t traditionally be eligible, such as people with chronic illnesses and non-U.S. citizens. And Google self-driving car spinoff Waymo is tapping it to provide mobility to elderly and disabled people. But despite the good AI is clearly capable of doing, doubts abound over its safety, transparency, and bias.

IBM thinks part of the problem is a lack of standard practices.

There’s no consistent, agreed-upon way AI services should be “created, tested, trained, deployed, and evaluated,” Aleksandra Mojsilovic, head of AI foundations at IBM Research and codirector of the AI Science for Social Good program, said in a blog post today. Just as unclear is how those systems should operate and how they should (or shouldn’t) be used.

To clear up the ambiguity surrounding AI, Mojsilovic and colleagues propose voluntary factsheets, formally called a “Supplier’s Declaration of Conformity” (DoC), to be completed and published by companies that develop and provide AI, with the goal of “increas[ing] the transparency” of their services and “engender[ing] trust” in them.

Mojsilovic thinks such factsheets could give companies a competitive advantage in the marketplace, much as an Energy Star rating for power efficiency does for appliance makers.

“Like nutrition labels for foods or information sheets for appliances, factsheets for AI services would provide information about the product’s important characteristics,” Mojsilovic wrote. “The issue of trust in AI is top of mind for IBM and many other technology developers and providers. AI-powered systems hold enormous potential to transform the way we live and work but also exhibit some vulnerabilities, such as exposure to bias, lack of explainability, and susceptibility to adversarial attacks. These issues must be addressed in order for AI services to be trusted.”

Mojsilovic explained that trust in AI systems rests on four core pillars, the first three being fairness, robustness, and explainability. Fair AI systems can credibly be said to be free of biased algorithms and datasets and not to contribute to the unfair treatment of certain groups. Robust AI systems are presumed safe from adversarial attacks and manipulation. And explainable AI systems aren’t a “black box”: their decisions are understandable to both researchers and developers.

“Just like a physical structure, trust can’t be built on one pillar alone. If an AI system is fair but can’t resist attack, it won’t be trusted. If it’s secure but we can’t understand its output, it won’t be trusted. To build AI systems that are truly trusted, we need to strengthen all the pillars together. Our comprehensive research and product strategy is designed to do just that, advancing on all fronts to lift the mantle of trust into place.”

The fourth pillar, lineage, concerns an AI system’s history. Documentation should shed light on an algorithm’s “development, deployment, and maintenance” so that it can be audited throughout its lifecycle, Mojsilovic said.

That’s where the factsheets come in — they would answer questions ranging from system operation and training data to underlying algorithms, test setups and results, performance benchmarks, fairness and robustness checks, intended uses, maintenance, and retraining. More granular topics might include governance strategies used to track the AI service’s data workflow, the methodologies used in testing, and bias mitigations performed on the dataset.

For natural language processing algorithms specifically, the researchers propose “data statements” that would show how an algorithm might be generalized, how it might be deployed, and what biases it might contain.

Natural language processing systems aren’t as fraught with controversy as, say, facial recognition, but they’ve come under fire for their susceptibility to bias. A recent study commissioned by The Washington Post found that smart speakers made by Google and Amazon were 30 percent less likely to understand non-American accents than those of native-born users.

Mojsilovic and the team at IBM certainly have their work cut out for them. Well-publicized incidents like racially biased recidivism algorithms, highly inaccurate facial detection systems, and crash-prone autonomous cars haven’t done AI any favors. A survey by InsideSales.com in September found that 41.5 percent of respondents “couldn’t cite a single example of AI that they trust.”

IBM, Microsoft, Accenture, Facebook, and others are actively working on automated tools that detect and minimize bias, and companies like Speechmatics and Nuance have developed solutions aimed specifically at narrowing the so-called “accent gap,” the tendency of voice recognition models to skew toward speakers from certain regions. But in Mojsilovic’s view, documents detailing the ins and outs of these systems would go a long way toward restoring the public’s faith in AI.

“Fairness, safety, reliability, explainability, robustness, accountability — we all agree that they are critical. Yet, to achieve trust in AI, making progress on these issues will not be enough; it must be accompanied with the ability to measure and communicate the performance levels of a system on each of these dimensions,” she wrote. “Understanding and evaluating AI systems is an issue of utmost importance for the AI community, an issue we believe the industry, academia, and AI practitioners should be working on together.”