How 'adversarial' attacks reveal machine learning's weakness

The use of computer vision technologies to boost machine learning continues to accelerate, driven by optimism that classifying huge volumes of images will unleash all sorts of new applications and forms of autonomy.

But there's a darker side to this transformation: These learning systems remain remarkably easy to fool using so-called "adversarial attacks." Even worse is that leading researchers acknowledge they don't really have a solution for stopping mischief makers from wreaking havoc on these systems.

"Can we defend against these attacks?" said Nicolas Papernot, a research scientist at Google Brain, the company's deep learning artificial intelligence research team. "Unfortunately, the answer is no."

Papernot, who is also an assistant professor at the University of Toronto, was speaking recently in Paris at the annual France is AI conference hosted by France Digitale. He was followed later in the morning by Jamal Atif, a professor at the Université Paris-Dauphine, who also addressed the growing threat of adversarial attacks to disrupt machine learning.

At its most basic, an adversarial attack means feeding a machine learning model an input that has been deliberately crafted to make the model misidentify something.

During Papernot's presentation, he cited this example from a recent research paper:

On the left, the machine learning model sees the picture of the panda and correctly identifies it with a moderately high degree of confidence. In the middle is a pixelated pattern, not necessarily visible to the human eye, that someone has overlaid onto the panda image. The result, on the right, is that the computer is now almost certain that it is a gibbon.
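To make the mechanics concrete, here is a minimal sketch of one well-known way to build such an overlay, the fast gradient sign method. It is not necessarily the exact setup behind the panda example: the "image" is a hypothetical list of 100 pixel values, the classifier is a toy logistic regression with made-up weights, and names such as `fgsm_perturb` are illustrative only.

```python
import math

def predict(weights, bias, x):
    """Probability the toy logistic-regression model assigns to class 1 ("panda")."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, bias, x, true_label, epsilon):
    """Nudge every pixel by at most epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    # For logistic regression, the gradient of the cross-entropy loss
    # with respect to the input is (p - y) * w.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and image (assumptions for illustration only).
weights = [1.0, -1.0] * 50                  # 100 "pixels"
bias = 0.0
clean = [0.5 + 0.02 * w for w in weights]   # an input the model labels as class 1

adversarial = fgsm_perturb(weights, bias, clean, true_label=1, epsilon=0.05)

clean_score = predict(weights, bias, clean)              # roughly 0.88
adversarial_score = predict(weights, bias, adversarial)  # roughly 0.05
largest_change = max(abs(a - c) for a, c in zip(adversarial, clean))
```

No pixel moves by more than 0.05, yet the model's confidence in the correct class collapses, because many tiny changes all push the score in the same direction.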

The simplicity of this deception highlights a couple of weaknesses. First, image recognition for machine learning, while it may have greatly advanced, still remains rudimentary. Papernot noted that to "teach" machines to recognize various images of cats and dogs, one needs to keep the parameters and the images fairly basic, introducing quite a bit of bias into the sample set.

Unfortunately, that makes the job of hackers much easier. Papernot pointed out that to disrupt these systems, which often use publicly available images to learn, one doesn't need to hack into the actual machine learning system. An external party can detect that such a system is searching for such images to learn from, and from there it's fairly easy to reverse-engineer the questions it's asking and the parameters it has set.

"You can choose the question the model is asking, and you find a way to make the model make the wrong prediction," he said. "You don't even need to have internal access. You can send the input, and see what prediction it's making, and extract the model. You can use that process to replicate the process locally."
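As a rough illustration of how query access alone can leak a model, the sketch below recovers the parameters of a toy classifier just by sending inputs and reading back predictions. It rests on strong assumptions that are not from the talk: the hidden model is a logistic regression, and the hypothetical prediction API returns a confidence score rather than only a label. Under those assumptions each query yields one linear equation in the unknown parameters, so a handful of queries is enough. Attacks on real systems typically train a substitute model from many labeled queries instead.

```python
import math

def make_black_box(weights, bias):
    """Build a prediction API that hides its parameters and returns only a confidence."""
    def query(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return query

def logit(p):
    """Inverse of the sigmoid: turns a confidence back into the raw linear score."""
    return math.log(p / (1.0 - p))

def extract_linear_model(query, n_features):
    """Recover weights and bias using n_features + 1 queries."""
    # The all-zero input reveals the bias.
    bias = logit(query([0.0] * n_features))
    weights = []
    for i in range(n_features):
        probe = [0.0] * n_features
        probe[i] = 1.0
        # A one-hot input reveals weight i plus the bias.
        weights.append(logit(query(probe)) - bias)
    return weights, bias

# The "victim" parameters are made up; the attacker never reads them directly.
secret_weights = [1.5, -2.0, 0.25]
secret_bias = -0.4
api = make_black_box(secret_weights, secret_bias)

stolen_weights, stolen_bias = extract_linear_model(api, n_features=3)
```

With a local copy in hand, the attacker can craft adversarial inputs offline and only then send them to the real system.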

From there, it's relatively straightforward to introduce some kind of deception that tricks the machine learning system into learning all the wrong things.

"What this means is that an adversary really doesn't need to know anything about your model to attack," he said. "They just need to know what problem it is trying to solve. They don't need very many resources to steal your model and attack it."

Indeed, he said his own experiments with such extraction attacks found that they were successful up to 96% of the time. Of course, it's one thing if an automated system is mistaking a cat for a dog. It's another if it's the basis of a self-driving car algorithm that thinks a stop sign is a yield sign.

Such attacks are already being conducted in the physical world, with people placing marks on signs to trick self-driving cars. Recently, scientists at Northeastern University and the MIT-IBM Watson AI Lab created an "adversarial t-shirt" printed with images that let the wearer fool human detection systems.

While AI and ethics tend to get the most public attention, researchers are increasingly concerned about the issue of adversarial attacks. Atif said during his presentation that while the issue was first identified over a decade ago, the number of research papers dedicated to the topic has "exploded" since 2014. For the upcoming International Conference on Learning Representations, more than 120 papers on the topic have been submitted.

Atif said this growing interest is driven by a desire to find some kind of solution, which so far has remained elusive. Part of the problem is that while a machine learning system has to maintain a defined set of parameters, the variety of adversarial attacks is so extensive that there is no way to guess all the possible combinations and teach the system to defend itself.

Researchers have tried experiments such as separating a machine learning system into several buckets that perform the same task and then comparing the results, or interpreting additional user behaviors, such as which images get clicked on, to determine whether an image has been read correctly. Atif said researchers are also exploring greater use of randomization and game theory in the hopes of finding more robust ways to defend the integrity of these systems.

So far, the most effective strategy is to augment a group of photos with examples of adversarial images to at least give the machine learning system some basic defense. At its best, such a strategy has gotten accuracy back up to only 45%.
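The augmentation strategy described above is usually called adversarial training. The following is a minimal sketch of the mechanics only, under illustrative assumptions: a toy logistic regression, a six-point made-up dataset, and the fast gradient sign method as the attack. It makes no claim about how much robustness results, and all function names are hypothetical.

```python
import math

def predict(w, b, x):
    """Probability the toy logistic-regression model assigns to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Perturb x by at most eps per feature in the direction that raises the loss."""
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def sgd_step(w, b, x, y, lr):
    """One gradient-descent update on the cross-entropy loss for a single example."""
    err = y - predict(w, b, x)
    return [wi + lr * err * xi for wi, xi in zip(w, x)], b + lr * err

def adversarial_train(data, eps, lr=0.1, epochs=50):
    """Train on each clean example plus an adversarial copy that keeps the true label."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            # Craft the adversarial copy against the current model.
            x_adv = fgsm(w, b, x, y, eps)
            for sample in (x, x_adv):
                w, b = sgd_step(w, b, sample, y, lr)
    return w, b

# Made-up two-feature dataset: class 1 in the positive quadrant, class 0 in the negative.
data = [([1.0, 1.2], 1), ([1.3, 1.0], 1), ([1.1, 1.4], 1),
        ([-1.0, -1.2], 0), ([-1.3, -1.0], 0), ([-1.1, -1.4], 0)]

w, b = adversarial_train(data, eps=0.3)
clean_correct = sum((predict(w, b, x) > 0.5) == bool(y) for x, y in data)
```

The key design choice is that the adversarial copies are regenerated every pass against the model as it currently stands, so the defense only ever covers the specific attack and perturbation size used during training.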

"This is state of the art," he said. "We just do not have a powerful defense strategy."
