Google researchers release audit framework to close AI accountability gap

Researchers associated with Google and the Partnership on AI have created a framework to help companies and their engineering teams audit AI systems before deploying them. The framework, intended to add a layer of quality assurance to businesses launching AI, translates into practice values often espoused in AI ethics principles and tackles an accountability gap authors say exists in AI today.

The work, titled "Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing," is one of a handful of outstanding AI ethics research papers accepted for publication as part of the Fairness, Accountability, and Transparency (FAT) conference, which takes place this week in Barcelona, Spain.

"The proposed auditing framework is intended to contribute to closing the development and deployment accountability gap of large-scale artificial intelligence systems by embedding a robust process to ensure audit integrity," the paper reads. "At a minimum, the internal audit process should enable critical reflections on the potential impact of a system, serving as internal education and training on ethical awareness in addition to leaving what we refer to as a 'transparency trail' of documentation at each step of the development cycle."

The framework is also intended to identify risks and reduce them to the lowest degree possible, as well as to map out what can be done differently in the future and how to respond to a failure after launch. The method is also meant to go beyond the risk-based assessments some companies perform today, which ask "What if?" but often fail to incorporate social or ethical challenges.

Named Scoping, Mapping, Artifact Collection, Testing, and Reflection (SMACTR), the framework aims to encourage companies to perform audits before an AI model is deployed for customer use. In the case of companies like Google, a model can impact the lives of billions of users.

SMACTR audits produce a number of documents, including checklists that go beyond yes-or-no answers; design history files to document design inputs and outputs; model cards to make sure AI is deployed for its intended purpose; and failure modes and effects analysis (FMEA) to incorporate known issues and experiences from engineers and product designers.

Each letter in SMACTR is meant to act as a stage in the audit process:

The Scoping stage is when the risk assessment process begins and auditors produce assessments of social impact and an ethical review of system use cases.
The Mapping stage is for creating a map of internal stakeholders and identifying key collaborators for the execution of the audit.
The Artifact Collection stage is for the creation of an audit checklist as well as datasheets or model cards that document how a model was built, assumptions made during development, and its intended use.
The Testing stage assesses performance using methods like adversarial testing and creates an ethical risk analysis chart that identifies the likelihood and severity of a failure or level of risk.
The Reflection stage is for the auditing and engineering teams to evaluate internal design recommendations or create a mitigation plan.
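
To make the Testing stage's risk analysis chart more concrete, here is a minimal Python sketch of how likelihood and severity might be combined into a level of risk. It is not taken from the paper: the three-level scale, the scoring thresholds, and the names FailureMode, risk_level, and risk_chart are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical three-level scale; the article does not specify one.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class FailureMode:
    description: str
    likelihood: str  # one of the keys in LEVELS
    severity: str    # one of the keys in LEVELS


def risk_level(mode: FailureMode) -> str:
    """Combine likelihood and severity into a coarse risk level.

    The product thresholds are illustrative assumptions, not the paper's.
    """
    score = LEVELS[mode.likelihood] * LEVELS[mode.severity]
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def risk_chart(modes):
    """Return (description, likelihood, severity, risk) rows, riskiest first."""
    order = {"high": 0, "medium": 1, "low": 2}
    rows = [(m.description, m.likelihood, m.severity, risk_level(m)) for m in modes]
    return sorted(rows, key=lambda row: order[row[3]])


if __name__ == "__main__":
    # Invented failure modes, used only to exercise the sketch.
    modes = [
        FailureMode("Typo in an output label", "medium", "low"),
        FailureMode("Lower accuracy for one user group", "high", "high"),
        FailureMode("Model used outside its intended purpose", "low", "high"),
    ]
    for description, likelihood, severity, risk in risk_chart(modes):
        print(f"{description}: likelihood={likelihood}, severity={severity}, risk={risk}")
```

A real audit would likely define its own scales and rely on the judgment of auditors and engineers rather than a fixed formula; the sketch only shows the general shape of such a chart.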

The algorithm audit framework borrows from a number of other fields where safety is critical to protect human life, such as aerospace and health care, which now carry out audits as part of the design process.

Those industries made audits standard practice either in response to a series of failures and scandals or to raise standards to meet government regulations. The framework adopts tools found in other industries, like FMEAs, as well as lessons such as "complex systems tend to drift toward unsafe conditions unless constant vigilance is maintained." However, it also acknowledges that AI can encounter unique problems.

Nine researchers collaborated on the SMACTR framework, including Google employees Andrew Smart, Margaret Mitchell, and Timnit Gebru, as well as former Partnership on AI fellow and current AI Now Institute fellow Deborah Raji.

Mitchell and Gebru collaborated on model cards, an approach Google Cloud now uses for some of its AI models. Gebru also worked on datasheets for datasets, and collaborated separately with Algorithmic Justice League's Joy Buolamwini on audits of major facial recognition software services sold by companies like Microsoft and Amazon that found poor performance for people with dark skin and particularly women of color.

The SMACTR paper also acknowledges shortcomings, such as the fact that portions of the audit rely on, or are vulnerable to, human judgment. It urges auditors to be mindful of their own biases and their company's viewpoints to avoid making the auditing process simply an act of reputation management. The authors argue such an approach may be increasingly valuable as AI models grow in size and distribution across multiple devices.

"AI has the potential to benefit the whole of society," the paper reads. "[H]owever there is currently an inequitable risk distribution such that those who already face patterns of structural vulnerability or bias disproportionately bear the costs and harms of many of these systems. Fairness, justice and ethics require that those bearing these risks are given due attention and that organizations that build and deploy artificial intelligence systems internalize and proactively address these social risks as well, being seriously held to account for system compliance to declared ethical principles."
