Baidu details its adversarial toolbox for testing robustness of AI models

No matter the claimed robustness of AI and machine learning systems in production, none are immune to adversarial attacks, or techniques that attempt to fool algorithms through malicious input. It's been shown that generating even small perturbations on images can fool the best of classifiers with high probability. And that's problematic considering the wide proliferation of the "AI as a service" business model, where companies like Amazon, Google, Microsoft, Clarifai, and others have made systems that might be vulnerable to attack available to end users.

Researchers at tech giant Baidu propose a partial solution in a recent paper published on arXiv.org: AdvBox. They describe it as an open source toolbox for generating adversarial examples, and they say it's able to fool models in frameworks like Facebook's PyTorch and Caffe2, MXNet, Keras, Google's TensorFlow, and Baidu's own PaddlePaddle.

While AdvBox itself isn't new -- the initial release was over a year ago -- the paper goes into considerable technical detail.

AdvBox is written in Python, and it implements several common attacks that search for adversarial samples. Each attack method uses a distance measure to quantify the size of the adversarial perturbation, while a sub-model -- Perceptron, which supports image classification and object detection models as well as cloud APIs -- evaluates the robustness of a model to noise, blurring, brightness adjustments, rotations, and more.
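To make the idea of an attack paired with a distance measure concrete, here is a minimal sketch in plain Python. It does not use AdvBox's API. It assumes a toy logistic-regression classifier and a single fast gradient sign step, and every function name, weight, and input value is hypothetical, chosen only for illustration.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model (hypothetical)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """One fast gradient sign step: shift every feature by epsilon
    in the direction that increases the cross-entropy loss."""
    p = predict_proba(weights, bias, x)
    # For logistic regression, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


def linf_distance(a, b):
    """Largest change applied to any single feature."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def l2_distance(a, b):
    """Euclidean size of the perturbation."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Illustrative values only; not taken from the paper.
weights = [2.0, -3.0, 1.0]
bias = 0.1
x = [0.5, 0.2, 0.3]
label = 1

x_adv = fgsm(weights, bias, x, label, epsilon=0.2)

print("clean prediction:      ", predict_proba(weights, bias, x) > 0.5)
print("adversarial prediction:", predict_proba(weights, bias, x_adv) > 0.5)
print("L-infinity distance:   ", linf_distance(x, x_adv))
print("L2 distance:           ", l2_distance(x, x_adv))
```

In this toy setup, a shift of 0.2 per feature is enough to push the model's output across the 0.5 decision threshold, and the two distance functions report how large that shift is under different norms.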

AdvBox ships with tools for testing detection models susceptible to so-called adversarial t-shirts or facial recognition attacks. Plus, it offers access to Baidu's cloud-hosted deepfakes detection service via an included Python script.

"Small and often imperceptible perturbations to [input] are sufficient to fool the most powerful [AI]," wrote the coauthors. "Compared to previous work, our platform supports black box attacks ... as well as more attack scenarios."

Baidu isn't the only company publishing resources designed to help data scientists defend against adversarial attacks. Last year, IBM and MIT released a metric for estimating the robustness of machine learning and AI algorithms called Cross Lipschitz Extreme Value for Network Robustness, or CLEVER for short. And in April, IBM announced a developer kit called the Adversarial Robustness Toolbox, which includes code for measuring model vulnerability and suggests methods for protecting against runtime manipulation. Separately, researchers at the University of Tübingen in Germany created Foolbox, a Python library for generating over 20 different attacks against TensorFlow, Keras, and other frameworks.

But much work remains to be done. According to Jamal Atif, a professor at the Université Paris-Dauphine, the most effective defense strategy in the image classification domain -- augmenting a group of photos with examples of adversarial images -- at best has gotten accuracy back up to only 45%. "This is state of the art," he said during an address in Paris at the annual France is AI conference hosted by France Digitale. "We just do not have a powerful defense strategy."
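The defense Atif describes, augmenting a training set with adversarial examples, can be sketched roughly as follows. This is an illustration under stated assumptions, not the method used in any of the work cited above: the model is a toy logistic regression, the attack is a single fast gradient sign step, and all names and numbers are hypothetical.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model (hypothetical)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_example(weights, bias, x, label, epsilon):
    """Perturb x by epsilon per feature in the loss-increasing direction."""
    p = predict_proba(weights, bias, x)
    perturbed = []
    for w, xi in zip(weights, x):
        grad = (p - label) * w  # d(loss)/d(x_i) for logistic regression
        step = (grad > 0) - (grad < 0)  # sign of the gradient
        perturbed.append(xi + epsilon * step)
    return perturbed


def augment_with_adversarial(dataset, weights, bias, epsilon):
    """Return the original (input, label) pairs followed by one adversarial
    copy of each, keeping the original (correct) label on the copy."""
    augmented = list(dataset)
    for x, label in dataset:
        augmented.append((fgsm_example(weights, bias, x, label, epsilon), label))
    return augmented


# Illustrative values only.
weights = [2.0, -3.0, 1.0]
bias = 0.1
dataset = [([0.5, 0.2, 0.3], 1), ([0.1, 0.6, 0.2], 0)]
epsilon = 0.1

augmented = augment_with_adversarial(dataset, weights, bias, epsilon)
print(len(dataset), "examples before,", len(augmented), "after augmentation")
```

Retraining on the augmented set is what exposes the model to perturbed inputs paired with their correct labels. In practice the adversarial copies are usually regenerated as the model's weights change during training.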
