How to avoid buying AI-based marketing tools that are biased

In a previous post, I described how marketers can minimize bias when using AI. When bias sneaks in, it can significantly hurt efficiency and ROAS (return on ad spend). Hence, it’s critical for marketers to take concrete steps to minimize bias in the algorithms they use, whether that’s their own AI or AI solutions from third-party vendors.

In this post, we’re going to take the next step and document the specific questions to ask any AI vendor to make sure they’re minimizing bias. These questions can be part of an RFI (request for information) or RFP (request for proposal), and they can serve as a structured approach to periodic reviews of AI vendors.

Marketers’ relationships with AI vendors can take many forms, varying in terms of which building blocks of AI are in-house vs. external. On one end of the spectrum, marketers often leverage AI that’s entirely off-the-shelf from a vendor. For instance, marketers might run a campaign against an audience that’s pre-built within their DSP (demand-side platform), and that audience might be the result of a look-alike model based on a seed set of vendor-sourced audience data.

On the other end of the spectrum, marketers may choose to use their own training data set, do their own training and testing, and simply leverage an external tech platform to manage the process, or bring their own algorithm to a DSP (“BYOA,” or “Bring Your Own Algorithm,” a growing trend). There are many flavors in between, such as providing a marketer’s first-party data to a vendor to build a custom model.

The list of questions below is for the scenario in which a marketer is leveraging a fully baked, off-the-shelf AI-powered product. That’s largely because these products are the most likely to be offered to a marketer as a black box, so they come with the most uncertainty and potentially the greatest risk of undiagnosed bias. Black boxes are also hard to tell apart, which makes comparing vendors very difficult.

But as you’ll see, all of these questions are relevant to any AI-based product no matter where it was built. So if parts of the AI building process are internal, these same questions are important to pose internally as part of that process.

Here are five questions to ask vendors to make sure they’re minimizing AI bias:

1. How do you know your training data is accurate?

When it comes to AI, garbage in, garbage out. Having excellent training data doesn’t necessarily mean excellent AI. However, having bad training data guarantees bad AI.

There are several reasons why certain data could be bad for training, but the most obvious is inaccuracy. Most marketers don’t realize how much inaccuracy exists in the datasets they rely on. In fact, the Advertising Research Foundation (ARF) just published a rare look into the accuracy of demographic data across the industry, and its findings are eye-opening. Industry-wide, data for “presence of children at home” is inaccurate 60% of the time, “single” marital status is incorrect 76% of the time, and “small business ownership” is incorrect 83% of the time! To be clear, these are not results from models predicting these consumer designations; rather, these are inaccuracies in the datasets that are presumably being used to train models!

Inaccurate training data confuses the process of algorithm development. For instance, let’s say an algorithm is optimizing dynamic creative elements for a travel campaign according to geographic location. If the training data is based on inaccurate location data (a very common occurrence with location data), it might appear that a consumer in the US Southwest responded to an ad about a driving vacation to a Florida beach, or that a consumer in Seattle responded to an ad for a fishing trip in the Ozark Mountains. That’s going to produce a very confused model of reality, and thus a suboptimal algorithm.

Never assume your data is accurate. Consider the source, compare it against other sources, check for consistency, and verify against truth sets whenever possible.
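To make the “verify against truth sets” step concrete, here is a minimal Python sketch that measures, attribute by attribute, how often a vendor’s values disagree with a truth set you trust for the same people. It assumes both sources can be joined on a shared identifier (a hypothetical `user_id` key) and that values are stored as plain strings; the attribute names and records are illustrative, not real data.

```python
from typing import Dict, List

# Hypothetical layout: user_id -> {attribute name: value}.
Profiles = Dict[str, Dict[str, str]]


def attribute_error_rates(vendor: Profiles, truth: Profiles, attributes: List[str]) -> Dict[str, float]:
    """Share of matched users whose vendor value disagrees with the truth set."""
    matched = vendor.keys() & truth.keys()  # only users present in both sources can be checked
    rates: Dict[str, float] = {}
    for attr in attributes:
        # Score only users for whom both sources actually report this attribute.
        checked = [uid for uid in matched if attr in vendor[uid] and attr in truth[uid]]
        if not checked:
            continue  # no basis for comparison on this attribute
        wrong = sum(vendor[uid][attr] != truth[uid][attr] for uid in checked)
        rates[attr] = wrong / len(checked)
    return rates


# Illustrative toy records, not real data.
vendor_data = {
    "u1": {"children_at_home": "yes", "marital_status": "single"},
    "u2": {"children_at_home": "no", "marital_status": "married"},
    "u3": {"children_at_home": "yes", "marital_status": "single"},
}
truth_set = {
    "u1": {"children_at_home": "no", "marital_status": "married"},
    "u2": {"children_at_home": "no", "marital_status": "married"},
    "u3": {"children_at_home": "yes", "marital_status": "married"},
}
print(attribute_error_rates(vendor_data, truth_set, ["children_at_home", "marital_status"]))
```

Only users present in both sources are scored, so a small overlap with the truth set yields a noisy estimate; it is worth reporting the number of matched users alongside each rate.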

2. How do you know your training data is thorough and diverse?

Good training data also has to be thorough, meaning you need plenty of examples covering every scenario and outcome you’re trying to drive. The more thorough the data, the more confident you can be in the patterns you find.

This is particularly relevant for AI models built to optimize for rare outcomes. Freemium mobile game download campaigns are a great example. Games like these often rely on a small percentage of “whales,” users who make a lot of in-game purchases, while other users buy few or none. To train an algorithm to find whales, it’s very important to make sure the dataset has plenty of examples of the whale consumer journey, so the model can learn the pattern of who ends up being a whale. Left unchecked, a training dataset is bound to be biased toward non-whales because they’re so much more common.
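One way to quantify the whale problem before training is to check how rare the target outcome is and how heavily it would need to be reweighted. The sketch below computes each class’s prevalence and an inverse-frequency (“balanced”) weight, n_samples / (n_classes × count), which is the same formula scikit-learn uses for `class_weight="balanced"`. The `min_examples` threshold is a hypothetical cutoff for “too few examples to learn from,” not an industry standard, and the labels are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_report(labels: Sequence[Hashable], min_examples: int = 1000) -> Dict[Hashable, dict]:
    """Prevalence, inverse-frequency weight, and a too-few-examples flag per class."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    report = {}
    for cls, count in counts.items():
        report[cls] = {
            "count": count,
            "prevalence": count / n_samples,
            # "Balanced" weight: rare classes count more during training.
            "weight": n_samples / (n_classes * count),
            # Hypothetical cutoff: too few examples to learn a reliable pattern.
            "too_few": count < min_examples,
        }
    return report


# Illustrative labels: 2% of users end up as whales.
labels = ["whale"] * 20 + ["non_whale"] * 980
for cls, stats in class_report(labels, min_examples=100).items():
    print(cls, stats)
```

Reweighting makes rare examples count more, but it cannot invent journeys the data never captured; if the whale class is flagged as too small, the better fix is collecting more whale examples.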

Diversity is another angle. If you’re using AI to market a new product, for instance, your training data is likely to be made up mostly of early adopters, who may skew in certain ways in terms of HHI (household income), lifecycle, age, and other factors. As you try to “cross the chasm” with your product to a more mainstream consumer audience, it’s critical to ensure you have a diverse training data set that includes not just early adopters but also an audience that’s more representative of later adopters.
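A rough way to check diversity is to compare the mix of the training data against the mix of the audience you ultimately want to reach. This sketch assumes you have a target distribution for some segment (age bracket here) from a source you trust; the segment labels and numbers are illustrative. It reports a representation index per segment (100 means proportional, below 100 means under-represented) and the total variation distance between the two mixes (0 means identical, 1 means no overlap).

```python
from collections import Counter
from typing import Dict, Sequence


def shares(values: Sequence[str]) -> Dict[str, float]:
    """Fraction of records falling into each segment."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {segment: n / total for segment, n in counts.items()}


def representation_index(training: Sequence[str], target_mix: Dict[str, float]) -> Dict[str, float]:
    """100 * training share / target share for each segment (100 = proportional)."""
    train_mix = shares(training)
    return {
        segment: 100 * train_mix.get(segment, 0.0) / share
        for segment, share in target_mix.items()
        if share > 0  # skip segments the target audience does not contain
    }


def total_variation(training: Sequence[str], target_mix: Dict[str, float]) -> float:
    """Half the summed absolute share differences: 0 = identical mixes, 1 = disjoint."""
    train_mix = shares(training)
    segments = set(train_mix) | set(target_mix)
    return 0.5 * sum(abs(train_mix.get(s, 0.0) - target_mix.get(s, 0.0)) for s in segments)


# Illustrative data: an early-adopter-heavy training set vs. a broader target audience.
training_ages = ["18-34"] * 70 + ["35-54"] * 25 + ["55+"] * 5
target = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
print(representation_index(training_ages, target))
print(round(total_variation(training_ages, target), 2))
```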

3. What testing has been done?

Many companies focus their AI testing on overall algorithm success, such as accuracy or precision. Certainly, that’s important. But for bias specifically, testing can’t stop there. One great way to test for bias is to document specific subgroups that are key to the primary use cases for an algorithm. For example, if an algorithm is set up to optimize for conversion, we might want to run separate tests for big-ticket items vs. small-ticket items, new customers vs. existing customers, or different types of creative. Once we have that list of subgroups, we need to track the same set of algorithm success metrics for each individual subgroup, to find out where the algorithm performs significantly worse than it does overall.

The recent IAB (Interactive Advertising Bureau) report on AI bias offers a thorough infographic that walks marketers through a decision-tree process for this subgroup testing methodology.
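The subgroup comparison described above can be sketched in a few lines. This example assumes you have, for each scored user, a subgroup label, the algorithm’s conversion prediction, and the actual outcome; it uses precision as the success metric and a fixed gap of 5 percentage points as the flagging threshold, both illustrative choices. In practice you would track every metric you use overall, and a significance test is more rigorous than a fixed gap.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Example:
    subgroup: str    # e.g. "new_customer" or "existing_customer" (illustrative labels)
    predicted: bool  # the algorithm predicted a conversion
    actual: bool     # a conversion actually happened


def precision(examples: List[Example]) -> Optional[float]:
    """Share of predicted conversions that really converted; None if nothing was predicted."""
    predicted = [e for e in examples if e.predicted]
    if not predicted:
        return None
    return sum(e.actual for e in predicted) / len(predicted)


def weak_subgroups(examples: List[Example], max_gap: float = 0.05) -> Dict[str, float]:
    """Subgroups whose precision trails the overall precision by more than max_gap."""
    overall = precision(examples)
    if overall is None:
        return {}
    by_group: Dict[str, List[Example]] = defaultdict(list)
    for e in examples:
        by_group[e.subgroup].append(e)
    flagged = {}
    for group, rows in by_group.items():
        p = precision(rows)
        if p is not None and overall - p > max_gap:
            flagged[group] = p
    return flagged


# Illustrative data: the model is much less precise for new customers.
examples = [Example("new_customer", True, i < 4) for i in range(10)]
examples += [Example("existing_customer", True, i < 8) for i in range(10)]
print(weak_subgroups(examples))  # → {'new_customer': 0.4}
```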

4. Can we run our own test?

If you’re using a vendor’s tool, it’s highly recommended that you not simply trust the vendor’s tests but run your own, using a few key subgroups that are critical to your business specifically.

It’s key to track algorithm performance across those subgroups. Performance is unlikely to be identical between them. Where it differs, can you live with the different levels of performance? Should the algorithm be used only for certain subgroups or use cases?
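When your own test shows two subgroups converting at different rates, a two-proportion z-test is one standard way to check whether the gap is larger than chance would explain. The sketch below uses the normal approximation, which assumes independent users and reasonably large samples; the conversion counts are made up. A small p-value suggests the difference is real, but whether you can live with it remains a business decision.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    if n_a == 0 or n_b == 0:
        raise ValueError("each subgroup needs at least one observation")
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under the "no difference" hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("rates are all 0% or all 100%; the test is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - standard normal CDF(|z|))
    return z, p_value


# Illustrative numbers: conversions / users reached in two subgroups you care about.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=150, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```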

5. Have you tested for bias on both sides?

When I think about the potential implications of AI bias, I see risk on both the input side and the output side of an algorithm.

In terms of inputs, imagine using a conversion optimization algorithm for a high-consideration product and a low-consideration product.

An algorithm may be far more successful at optimizing for low-consideration products because all consumer decisioning is done online and thus there’s a more direct path to purchase.

For a high-consideration product, consumers may research offline, visit a store, or talk to friends, so the digital path to purchase is much less direct, and an algorithm may be less accurate for these types of campaigns.

In terms of outputs, imagine a mobile commerce campaign optimized for conversion. An AI engine is likely to generate far more training data from short-tail apps (such as ESPN or Words With Friends) than from long-tail apps. As a result, an algorithm may steer a campaign toward short-tail inventory simply because it has better data on those apps and is thus better able to find patterns of performance. Over time, a marketer may find that the campaign is over-indexing on expensive short-tail inventory and potentially missing out on what could be very efficient long-tail inventory.
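If a vendor can share per-app counts of the training examples behind a model, a quick way to check for this short-tail skew is to measure how concentrated the data is and which long-tail apps barely appear in it. The app names, counts, and the `min_examples` cutoff below are hypothetical.

```python
from typing import Dict, List


def top_k_share(examples_per_app: Dict[str, int], k: int) -> float:
    """Fraction of all training examples that come from the k best-covered apps."""
    total = sum(examples_per_app.values())
    if total == 0:
        return 0.0
    top = sorted(examples_per_app.values(), reverse=True)[:k]
    return sum(top) / total


def thin_apps(examples_per_app: Dict[str, int], min_examples: int) -> List[str]:
    """Apps with too little training data for the model to judge them fairly."""
    return sorted(app for app, n in examples_per_app.items() if n < min_examples)


# Illustrative counts: two heavily covered apps and a long tail of sparse ones.
counts = {"app_a": 60_000, "app_b": 25_000, "app_c": 900, "app_d": 400, "app_e": 150}
print(f"top-2 share: {top_k_share(counts, 2):.1%}")
print("thin apps:", thin_apps(counts, min_examples=1_000))
```

A very high top-k share combined with a long list of thin apps suggests that the algorithm’s preference for short-tail inventory may reflect data coverage rather than genuine performance.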

The bottom line

The questions above can help you develop or fine-tune your AI efforts to have as little bias as possible. In a world that’s more diverse than ever, it’s imperative that your AI solution reflects that. Incomplete training data or insufficient testing will lead to suboptimal performance, and it’s important to remember that bias testing should be repeated systematically for as long as an algorithm is in use.

Jake Moskowitz is Vice President of Data Strategy and Head of the Emodo Institute at Ericsson Emodo.

Welcome to the VentureBeat community!

Our guest posting program is where technical experts share insights and provide neutral, non-vested deep dives on AI, data infrastructure, cybersecurity and other cutting-edge technologies shaping the future of enterprise.
