AI systems’ failures have become a recurring theme in technology news. Credit scoring algorithms that discriminate against women. Computer vision systems that misclassify people with darker skin. Recommendation systems that promote violent content. Trending algorithms that amplify fake news.
Most complex software systems fail at some point, and they need to be updated regularly. We have procedures and tools that help us find and fix these errors. But current AI systems, mostly dominated by machine learning algorithms, are different from traditional software. We are still exploring the implications of applying them to different applications, and protecting them against failure requires new ideas and approaches.
This is the idea behind the AI Incident Database (AIID), a repository of documented failures of AI systems in the real world. The database aims to make it easier to see past failures and avoid repeating them.
The AIID is sponsored by the Partnership on AI (PAI), an organization that seeks to develop best practices for AI, improve public understanding of the technology, and reduce potential harms. PAI was founded in 2016 by AI researchers at Apple, Amazon, Google, Facebook, IBM, and Microsoft, but it has since expanded to include more than 50 member organizations, many of which are nonprofit.
Experience documenting failures
In 2018, members of PAI were discussing research on an “AI failure taxonomy,” a consistent way to classify AI failures. But there was no collection of AI failures that could be used to develop the taxonomy. This led to the idea of developing the AI Incident Database.
“I knew about aviation incident and accident databases and committed to building AI’s version of the aviation database during a Partnership on AI meeting,” Sean McGregor, lead technical consultant for the IBM Watson AI XPRIZE, said in written comments to TechTalks. Since then, McGregor has been overseeing the AIID effort and helping develop the database.
The structure and format of AIID was partly inspired by incident databases in the aviation and computer security industries. The commercial air travel industry has managed to increase flight safety by systematically analyzing and archiving past accidents and incidents within a shared database. Likewise, a shared database of AI incidents can help disseminate knowledge and improve the safety of AI systems deployed in the real world.
Meanwhile, the Common Vulnerabilities and Exposures (CVE) list maintained by MITRE Corp is a good example of a database covering software failures across various industries. It has helped shape the vision for AIID as a system that documents failures from AI applications in different fields.
“The goal of the AIID is to prevent intelligent systems from causing harm, or at least reduce their likelihood and severity,” McGregor says.
McGregor points out that the behavior of traditional software is usually well understood, but modern machine learning systems cannot be completely described or exhaustively tested. Machine learning derives its behavior from its training data, so its behavior has the capacity to change in unintended ways as the underlying data changes over time.
“These factors, combined with deep learning systems’ capability to enter into the unstructured world we inhabit, means malfunctions are more likely, more complicated, and more dangerous,” McGregor says.
Today, we have deep learning systems that can recognize objects and people in images, process audio data, and extract information from millions of text documents in ways that were impossible with traditional, rules-based software, which expects data to be neatly structured in a tabular format. This shift has made it possible to apply AI to the physical world, through applications for self-driving cars, security cameras, hospitals, and voice-enabled assistants. But all of these new areas create new vectors for failure.
Documenting AI incidents
Since its founding, AIID has gathered information about more than 1,000 AI incidents from the media and publicly available sources. Fairness issues are the most common AI incidents submitted to AIID, particularly in cases where governments are using intelligent systems, like facial recognition programs. “We are also increasingly seeing incidents involving robotics,” McGregor says.
Hundreds of other incidents are in the process of being reviewed and added to the AI Incident Database, according to McGregor. “Unfortunately, I don’t believe we will have a shortage of new incidents,” he says.
Visitors can query the database for incidents based on the source, author, submitter, incident ID, or keywords. For instance, searching for “translation” shows there are 42 reports of AI incidents involving machine translation. You can then filter the results down based on other criteria.
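To make the search-then-filter flow concrete, here is a minimal sketch in Python. It does not use the AIID's actual API or schema. The Report record, the search function, and the sample data are all hypothetical, with field names chosen only to mirror the search criteria mentioned above.

```python
from dataclasses import dataclass


# Hypothetical record shape: the field names mirror the search criteria
# described in the article and are NOT the AIID's actual schema.
@dataclass(frozen=True)
class Report:
    incident_id: int
    source: str
    author: str
    submitter: str
    text: str


def search(reports, keyword=None, **criteria):
    """Return reports whose text contains `keyword` (case-insensitive)
    and whose fields equal every value given in `criteria`."""
    results = []
    for report in reports:
        if keyword is not None and keyword.lower() not in report.text.lower():
            continue
        if all(getattr(report, name) == value for name, value in criteria.items()):
            results.append(report)
    return results


# Made-up sample data, for illustration only.
REPORTS = [
    Report(1, "example-news.test", "A. Writer", "submitter-1",
           "A translation app rendered a greeting as a threat."),
    Report(1, "another-outlet.test", "B. Reporter", "submitter-2",
           "Machine translation error led to an arrest."),
    Report(2, "example-news.test", "A. Writer", "submitter-1",
           "A recommendation system promoted violent videos."),
]

# Keyword search first, then narrow the results by another criterion.
translation_hits = search(REPORTS, keyword="translation")
narrowed = search(REPORTS, keyword="translation", source="example-news.test")
```

In this sketch, the keyword search plays the role of the “translation” query, and the extra source argument stands in for the additional filters a visitor might apply afterward.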
Putting the AI Incident Database to use
A consolidated database of incidents involving AI systems can serve various purposes in the research, development, and deployment of AI systems.
For instance, if a product manager is evaluating the addition of an AI-powered recommendation system to an application, they can check 13 reports and 10 incidents in which such systems have caused harm to people. This will help the product manager set the right requirements for the feature their team is developing.
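The report and incident counts differ because several reports can describe the same incident. A rough sketch of that relationship, using made-up data and a hypothetical group_by_incident helper rather than anything from the AIID codebase, might look like this:

```python
from collections import defaultdict

# Hypothetical (incident_id, report_title) pairs: made-up data, not AIID records.
sample_reports = [
    (101, "Video app recommends violent clips to children"),
    (101, "Follow-up: platform responds to recommendation complaints"),
    (102, "Shopping site suggests dangerous product combinations"),
    (103, "News feed recommendations amplify a hoax"),
]


def group_by_incident(reports):
    """Group report titles under the incident ID they describe."""
    grouped = defaultdict(list)
    for incident_id, title in reports:
        grouped[incident_id].append(title)
    return dict(grouped)


incidents = group_by_incident(sample_reports)
report_count = len(sample_reports)
incident_count = len(incidents)
```

Here four reports collapse into three incidents, which is the same kind of gap as the one between the report and incident totals above.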
Other executives can use the AI Incident Database to make better decisions. For example, risk officers can query the database for the damage that employing machine translation systems might cause and develop the right risk mitigation measures.
Engineers can use the database to find out what harms their AI systems could cause when deployed in the real world. And researchers can cite it in papers on the fairness and safety of AI systems.
Finally, the growing database of incidents can serve to caution companies implementing AI algorithms in their applications. “Technology companies are famous for their penchant to move quickly without evaluating all potential bad outcomes. When bad outcomes are enumerated and shared, it becomes impossible to proceed in ignorance of harms,” McGregor says.
The AI Incident Database is built on a flexible architecture that will allow the development of various applications for querying the database and obtaining other insights, such as key terminology and contributors. McGregor discusses the full details of the architecture in a paper that will be presented at the Thirty-Third Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-21). AIID is also an open source project on GitHub, where the community can help improve and expand its capabilities.
With a solid database in place, McGregor is now working with Partnership on AI to develop a flexible taxonomy for AI incident classification. In the future, the AIID team hopes to expand the system to automate the monitoring of AI incidents.
“The AI community has begun sharing incident records with each other to motivate changes to their products, control procedures, and research programs,” McGregor says. “The site was publicly released in November, so we are just starting to realize the benefits of the system.”
Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.
This story originally appeared on Bdtechtalks.com. Copyright 2021