What algorithm auditing startups need to succeed

To provide clarity and avert potential harms, algorithms that impact human lives would ideally be reviewed by an independent body before they're deployed, just as environmental impact reports must be approved before a construction project can begin. While no such legal requirement for AI exists in the U.S., a number of startups have been created to fill the void in algorithm auditing and risk assessment.

A third party that is trusted by the public and potential clientele could increase trust in AI systems overall. As AI startups in aviation and autonomous driving have argued, regulation could enable innovation and help businesses, governments, and individuals safely adopt AI.

In recent years, we have seen proposals for numerous laws that support algorithm audits by an external company, and last year dozens of influential members of the AI community from academia, industry, and civil society recommended external algorithm audits as one way to put AI principles into action.

Like consulting firms that help businesses scale AI deployments, offer data monitoring services, and sort unstructured data, algorithm auditing startups fill a niche in the growing AI industry. But recent events surrounding HireVue seem to illustrate how these companies differ from other AI startups.

HireVue is currently used by more than 700 companies, including Delta, Hilton, and Unilever, for prebuilt and custom assessment of job applicants based on a resume, video interview, or their performance when playing psychometric games.

Two weeks ago, HireVue announced that it would no longer use facial analysis to determine whether a person is fit for a job. You may ask yourself: How could recognizing characteristics in a person's face have ever been considered a scientifically verifiable way to conclude that they're qualified for a job? Well, HireVue never really proved out those results, but the claim raised a lot of questions.

A HireVue executive said in 2019 that 10% to 30% of competency scores could be tied to facial analysis. But reporting at that time called the company's claim "profoundly disturbing." Before the Utah-based company decided to ditch facial analysis, ethics leader Suresh Venkatasubramanian resigned from a HireVue advisory board. And the Electronic Privacy Information Center filed a complaint with the Federal Trade Commission (FTC) alleging HireVue engaged in unfair and deceptive trade practices in violation of the FTC Act. The complaint specifically cites studies that have found facial recognition systems may identify emotion differently based on a person's race. The complaint also pointed to a documented history of facial recognition systems misidentifying women with dark skin, people who do not conform to a binary gender identity, and Asian Americans.

Facial analysis may not identify individuals -- like facial recognition technology would -- but as Partnership on AI put it, facial analysis can classify characteristics with "more complex cultural, social, and political implications," like age, race, or gender.

Despite these concerns, in a press release announcing the results of the audit, HireVue states: "The audit concluded that '[HireVue] assessments work as advertised with regard to fairness and bias issues.'" The audit was carried out by O'Neil Risk Consulting and Algorithmic Auditing (ORCAA), which was created by data scientist Cathy O'Neil. O'Neil is also author of the book Weapons of Math Destruction, which takes a critical look at algorithms' impact on society.

The audit report contains no analysis of AI system training data or code. Instead, it documents conversations about the kinds of harm HireVue's AI could cause in conducting prebuilt assessments of early career job applicants across eight measurements of competency.

The ORCAA audit posed questions to teams within the company and external stakeholders, including people asked to take a test using HireVue software and businesses that pay for the company's services.

After you sign a legal agreement, you can read the eight-page audit document for yourself. It states that by the time ORCAA conducted the audit, HireVue had already decided to begin phasing out facial analysis.

The audit also conveys a concern among stakeholders that visual analysis makes people generally uncomfortable. And a stakeholder interview participant voiced concern that HireVue facial analysis may work differently for people wearing head or face coverings and disproportionately flag their application for human review. Last fall, VentureBeat reported that people with dark skin taking the state bar exam with remote proctoring software expressed similar concerns.

Brookings Institution fellow Alex Engler's work focuses on issues of AI governance. In an op-ed at Fast Company this week, Engler wrote that he believes HireVue mischaracterized the audit results to engage in a form of ethics washing and described the company as more interested in "favorable press than legitimate introspection." He also characterized algorithm auditing startups as a "burgeoning but troubled industry" and called for governmental oversight or regulation to keep audits honest.

HireVue CEO Kevin Parker told VentureBeat the company began to phase out facial analysis use about a year ago. He said HireVue arrived at that decision following negative news coverage and an internal assessment that concluded "the benefit of including it wasn't enough to justify the concern it was causing."

Parker disputes Engler's assertion that HireVue mischaracterized audit results and said he's proud of the outcome. But one thing Engler, HireVue, and ORCAA agree on is the need for industrywide changes.

"Having a standard that says 'Here's what we mean when we say algorithmic audit' and what it covers and what it says intent is would be very helpful, and we're eager to participate in that and see those standards come out. Whether it's regulatory or industry, I think it's all going to be helpful," Parker said.

So what kind of government regulation, industry standards, or internal business policy is needed for algorithm auditing startups to succeed? And how can they maintain independence and avoid becoming co-opted like some AI ethics research and diversity in tech initiatives have in recent years?

To find out, VentureBeat spoke with representatives from bnh.ai, Parity, and ORCAA, startups offering algorithm audits to business and government clients.

Require businesses to carry out algorithm audits

One solution endorsed by people working at each of the three companies was to enact regulation requiring algorithm audits, particularly for algorithms informing decisions that significantly impact people's lives.

"I think the final answer is federal regulation, and we've seen this in the banking industry," bnh.ai chief scientist and George Washington University visiting professor Patrick Hall said. The Federal Reserve's SR 11-7 guidance on model risk management currently mandates audits of statistical and machine learning models, which Hall sees as a step in the right direction. The National Institute of Standards and Technology (NIST) tests facial recognition systems trained by private companies, but that is a voluntary process.

ORCAA chief strategist Jacob Appel said an algorithm audit is currently defined as whatever a selected algorithm auditor is offering. He suggests companies be required to disclose algorithm audit reports the same way publicly traded businesses are obligated to share financial statements. For businesses to undertake a rigorous audit when there is no legal obligation for them to do so is commendable, but Appel said this voluntary practice reflects a lack of oversight in the current regulatory environment.

"If there are complaints or criticisms about how HireVue's audit results were released, I think it's helpful to see connection with the lack of legal standards and regulatory requirements as contributing to those outcomes," he said. "These early examples may help highlight or underline the need for an environment where there are legal and regulatory requirements that give some more momentum to the auditors."

There are growing signs that external algorithm audits may become a standard. Lawmakers in some parts of the United States have proposed legislation that would effectively create markets for algorithm auditing startups. In New York City, lawmakers have proposed mandating an annual test for hiring software that uses AI. Last fall, California voters rejected Prop 25, which would have required counties to replace cash bail systems with an algorithmic assessment. The related Senate Bill 36 requires external review of pretrial risk assessment algorithms by an independent third party. In 2019, federal lawmakers introduced the Algorithmic Accountability Act to require companies to survey and fix algorithms that result in discriminatory or unfair treatment.

However, any regulatory requirement will have to consider how to measure fairness and how to account for the influence of AI provided by a third party, since few AI systems are built entirely in-house.

Rumman Chowdhury is CEO of Parity, a company she created a few months ago after leaving her position as a global lead for responsible AI at Accenture. She believes such regulation should take into consideration the fact that use cases can vary greatly from industry to industry. She also believes legislation should address intellectual property claims from AI startups that do not want to share training data or code, a concern such startups often raise in legal proceedings.

"I think the challenge here is balancing transparency with the very real and tangible need for companies to protect their IP and what they're building," she said. "It's unfair to say companies should have to share all their data and their models because they do have IP that they're building, and you could be auditing a startup."

Maintain independence and grow public trust

To keep the algorithm auditing startup space from being co-opted, Chowdhury said it will be essential to establish common professional standards through groups like the IEEE or government regulation. Any enforcement or standards could also include a government mandate that auditors receive some form of training or certification, she said.

Appel suggested that another way to enhance public trust and broaden the community of stakeholders impacted by technology is to mandate a public comment period for algorithms. Such periods are commonly invoked ahead of law or policy proposals or civic efforts like proposed building projects.

Other governments have begun implementing measures to increase public trust in algorithms. The cities of Amsterdam and Helsinki created algorithm registries in late 2020 to give local residents the name of the person and city department in charge of deploying a particular algorithm, as well as a way to provide feedback.

Define audits and algorithms

A language model with billions of parameters is different from a simpler algorithmic decision-making system made with no qualitative model. Definitions of algorithms may be necessary to help define what an audit should contain and to help companies understand what an audit should accomplish.

"I do think regulation and standards do need to be quite clear on what is expected of an audit, what it should accomplish so that companies can say 'This is what an audit cannot do and this is what it can do.' It helps to manage expectations I think," Chowdhury said.

A culture change for humans working with machines

Last month, a cadre of AI researchers called for a culture change in computer vision and NLP communities. A paper they published considers the implications of a culture shift for data scientists within companies. The researchers' suggestions include improvements in data documentation practices and audit trails through documentation, procedures, and processes.

Chowdhury also suggested people in the AI industry seek to learn from structural problems other industries have already faced.

Examples of this include the recently launched AI Incidents database, which borrows an approach used in aviation and computer security. Created by the Partnership on AI, the database is a collaborative effort to document instances in which AI systems fail. Others have suggested that the AI industry incentivize finding bias in networks the way the security industry does with bug bounties.

"I think it's really interesting to look at things like bug bounties and incident reporting databases because it enables companies to be very public about the flaws in their systems in a way where we're all working on fixing them instead of pointing fingers at them because it has been wrong," she said. "I think the way to make that successful is an audit that can't happen after the fact -- it would have to happen before something is released."

Don't consider an audit a cure-all

As ORCAA's audit of a HireVue use case shows, an audit's disclosure can be limited and does not necessarily ensure AI systems are free from bias.

Chowdhury said a disconnect she commonly encounters with clients is an expectation that an audit will only consider code or data analysis. She said audits can also focus on specific use cases, like collecting input from marginalized communities, risk management, or critical examination of company culture.

"I do think there is an idealistic idea of what an audit is going to accomplish. An audit's just a report. It's not going to fix everything, and it's not going to even identify all the problems," she said.

Bnh.ai managing director Andrew Burt said clients tend to view audits as a panacea rather than part of a continuing process to monitor how algorithms perform in practice.

"One-time audits are helpful but only to a point, due to the way that AI is implemented in practice. The underlying data changes, the models themselves can change, and the same models are frequently used for secondary purposes, all of which require periodic review," Burt said.

Consider risk beyond what's legal

Audits to ensure compliance with government regulation may not be sufficient to catch potentially costly risks. An audit might keep a company out of court, but that's not always the same thing as keeping up with evolving ethical standards or managing the risk unethical or irresponsible actions pose to a company's bottom line.

"I think there should be some aspect of algorithmic audit that is not just about compliance, and it's about ethical and responsible use, which by the way is an aspect of risk management, like reputational risk is a consideration. You can absolutely do something legal that everyone thinks is terrible," Chowdhury said. "There's an aspect of algorithmic audit that should include what is the impact on society as it relates to the reputational impact on your company, and that has nothing to do with the law actually. It's actually what else above and beyond the law?"

Final thoughts

In today's environment for algorithm auditing startups, Chowdhury said she worries companies savvy enough to understand the policy implications of inaction may attempt to co-opt the auditing process and steal the narrative. She's also concerned that startups pressured to grow revenue may cosign less-than-robust audits.

"As much as I would love to believe everyone is a good actor, everyone is not a good actor, and there's certainly grift to be done by essentially offering ethics washing to companies under the guise of algorithmic auditing," she said. "Because it's a bit of a Wild West territory when it comes to what it means to do an audit, it's anyone's game. And unfortunately, when it's anyone's game and the other actor is not incentivized to perform to the highest standard, we're going to go down to the lowest denominator is my fear."

Top Biden administration officials from the FTC, Department of Justice, and White House Office of Science and Technology Policy have all signaled plans to increase regulation of AI, and a Democratic Congress could tackle a range of tech policy issues. Internal audit frameworks and risk assessments are also options. The OECD and Data & Society are currently developing risk assessment classification tools businesses can use to identify whether an algorithm should be considered high or low risk.

But algorithm auditing startups are different from other AI startups in that they need to seek approval from an independent arbiter and to some degree the general public. To ensure their success, people behind algorithm auditing startups, like those I spoke with, increasingly suggest stronger industrywide regulation and standards.