Why the AI we rely on can't get privacy right (yet)

While technologies powered by artificial intelligence (AI) now appear in many of the digital services we use every day, an often-neglected truth is that few companies actually build the underlying AI technology.

A good example of this is facial recognition technology, which is exceptionally complex to build and requires millions upon millions of facial images to train the machine learning models.

Consider the facial recognition based authentication and verification components in the different services you use. These services did not reinvent the wheel to offer facial recognition; instead, they integrated with an AI technology provider. An obvious case is iOS apps that have integrated Face ID, for example to let you quickly log in to your bank account. Less obvious cases are those where you are asked to verify your identity by uploading images of your face and your identity document to a cloud service, for example when renting a car or opening a new online bank account.

We are also hearing more and more about governments using facial recognition in public forums to identify individuals in a crowd, but it is not as though each government is building its own facial recognition technology. They are purchasing the technology from an AI technology vendor.

Why is this significant? It surely makes sense for a company to rely on the expertise of an AI technology vendor rather than trying to build complicated AI models themselves, which will very likely not reach the necessary performance levels.

The significance is that, because these AI services are built by one company and deployed by many others, the chain of responsibility for meeting privacy requirements often collapses.

If a person has no direct relationship with the company that built the AI technology that is processing their personal data, then what hope does that person have to understand how their personal data is being used, how that data usage affects them, and how they can control that data usage?

What happens in practice is that the AI technology vendor seeks to tell its clients (that is, the companies licensing the technology) how the technology works, and then contractually requires those clients to provide all required notices and to obtain all required consents from the people who are exposed to the AI technology.

Perhaps this model makes sense as it is a commonly established legal practice in the AI industry.

But how likely is it that the companies licensing the AI technology:

Understand how the AI technology is built, how it is provided, and how it performs?
Have managed to sufficiently explain the AI technology, and how it uses personal data, to their users?
Have built a means for their users to control how the AI vendor uses their personal data?

Take facial recognition technology as an example again. While most people have used or been exposed to facial recognition technology in one way or another, most people likely do not know whether an image of their face is being used to build that AI technology or how to find out the answer to that question -- to the extent it is even possible.

These problems caused by the complexity of the AI supply chain need to be fixed.

AI technology vendors must seek out innovative solutions to empower their clients, who can then empower their users. This can include providing robust privacy notices, building privacy reminders into their client integration documentation, and introducing technical methods for their clients to control data usage on an individual basis.
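To make the idea of controlling data usage on an individual basis more concrete, here is a minimal sketch of what such a control might look like on the vendor side. Everything in it is an assumption made for illustration: the `Purpose` values, the `DataUsageRegistry` class, and the client and user identifiers are hypothetical and do not describe any real vendor's API. The sketch only shows the principle that each use of a person's data is checked against what that person has allowed, and is denied by default.

```python
from enum import Enum


class Purpose(Enum):
    """Hypothetical purposes a vendor might process a face image for."""
    VERIFICATION = "verification"      # run the identity check the person asked for
    MODEL_TRAINING = "model_training"  # reuse the image to build the vendor's models


class DataUsageRegistry:
    """Illustrative vendor-side record of per-person permissions (not a real API)."""

    def __init__(self):
        # Maps (client_id, user_id) -> set of purposes that person has allowed.
        self._allowed = {}

    def grant(self, client_id, user_id, purpose):
        self._allowed.setdefault((client_id, user_id), set()).add(purpose)

    def revoke(self, client_id, user_id, purpose):
        self._allowed.get((client_id, user_id), set()).discard(purpose)

    def is_allowed(self, client_id, user_id, purpose):
        # Deny by default: unknown people and unlisted purposes are not allowed.
        return purpose in self._allowed.get((client_id, user_id), set())


registry = DataUsageRegistry()
registry.grant("car-rental-co", "user-123", Purpose.VERIFICATION)
registry.grant("car-rental-co", "user-123", Purpose.MODEL_TRAINING)
registry.revoke("car-rental-co", "user-123", Purpose.MODEL_TRAINING)

print(registry.is_allowed("car-rental-co", "user-123", Purpose.VERIFICATION))    # True
print(registry.is_allowed("car-rental-co", "user-123", Purpose.MODEL_TRAINING))  # False
print(registry.is_allowed("car-rental-co", "user-999", Purpose.VERIFICATION))    # False
```

Keying the record on both the client and the individual is a design choice in this sketch: it lets the same person allow verification through one service while refusing the reuse of their image for model training, which is the kind of individual-level control a client could then surface to its users.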

While those steps may empower a company to offer better notices and controls to its users, the AI technology vendor should also look for ways to interact with users directly. This means not only publishing a privacy policy explaining the AI technology but also, and more importantly, developing a means for a person to come to the AI technology vendor directly to learn about how their data is being used and how to control it.

Unfortunately, the white labeling of these services presents a barrier to transparency. White labeling is the practice of making a technology appear as though it was built and is operated by the company making the service available. It is commonly used to give consumers a more uniform experience, but it causes significant problems when applied to AI technology.

Humans exposed to this technology have no chance of controlling their data and their privacy if there's no transparency regarding the AI supply chain. Both the technology vendors and the companies licensing that technology must make efforts to address this problem. This means working together to bring in transparency, and it means giving humans a clear way to control their data with each company. Only a concerted effort from all parties can bring about the paradigm shift we need to see in AI, where people control their digital world rather than the other way around.

[The opinions expressed in this article are the author's alone and do not necessarily reflect the views of any of the organizations he is associated with.]

Neal Cohen is Director of Privacy for Onfido, a machine learning powered remote biometric identity provider. He is also a technology and human rights fellow at Harvard’s Carr Center for Human Rights Policy and a non-residential fellow at Stanford’s Center for Internet and Society.
