Presented by Appen

This article is the fourth in a five-part series on AI predictions for 2021. Catch up on the first, second, and third in the series.

The relationship between artificial intelligence (AI) and data can be summarized succinctly: an AI model is only as good as the data it was trained on. Training data serves as the foundation of AI solutions everywhere and can make or break their success.

Data management is a key focal point for companies building machine learning (ML) models, and this domain will only continue to grow in importance in 2021 and beyond. In the coming years, it will be more evident than ever how steep the price of getting this area of AI wrong can be. In part four of our five-part series on 2021 predictions, we look at the shift toward diversification as a way to avoid bias.

Preparing training data is already a time-consuming process: by commonly cited estimates, AI teams spend about 80% of their time on this task alone. Annotating the data requires a significant investment of money and people. Organizations can choose to annotate their training data in-house or turn to a third-party vendor to handle the massive effort.

Each option has tradeoffs. Using an in-house team to annotate datasets, for instance, can often result in less diverse perspectives and, therefore, more bias in the data. Using a third-party vendor gives a company instant access to a large crowd of data annotators but, in some cases, less direct insight into who these people are.

It’s a vital question more companies are starting to consider: who’s annotating our data? Are we incorporating a diverse collection of voices, or are we unintentionally introducing bias? Regardless of which data annotation method a company chooses, recognizing the critical role data annotators play in influencing model bias will be paramount to success.

The role of data annotation in AI

While companies have traditionally focused on the cost of training data, the people behind it are gaining increased attention, as they should. These people, the data annotators, provide ground-truth accuracy and a global perspective to AI.

Data annotators undertake the most critical part of AI development, as the accuracy of their labels directly affects the accuracy of the machine’s future predictions. A machine trained on poorly labeled data will commit errors, make low-confidence predictions, and ultimately fail to work effectively. The ramifications of poor data annotation can be enormous. Finance, retail, and other major industries rely on AI for various transactions, for example, and AI that makes inaccurate predictions will lead to poor customer experiences and lost revenue.

These problems are almost always created in the data collection and annotation stages. For instance, the data used may not cover all potential use cases, or the people annotating it may reflect only a small demographic of end users. Even the largest companies with the most resources don’t always get it right, and the damage to brand and customer experience can be severe. As companies continue to struggle to remove unintended biases from their models, we expect to see more examples of these kinds of failures. If anything, these examples will serve as a stark reminder of how costly it can be to not have a bias mitigation plan from the start.

How companies are reducing bias through a global AI economy

How are some companies successfully reducing bias in their models? In part, by focusing on their data annotators. Annotators play an essential role in mitigating bias in AI, which is especially important for products and services that operate in diverse markets. Building responsible AI, where bias is minimized, is mission-critical: after all, AI that doesn’t work for everyone ultimately doesn’t work.

As the dialogue around responsible AI picks up steam in the next several years, expect organizations to zero in on reducing model bias further. Recall that AI training data prepared by humans can reflect their biases, which undermines an algorithm’s objectivity. Solving for this bias requires including diverse perspectives from the beginning.

Luckily, companies are starting to leverage the power of the AI economy by sourcing crowds of data annotators on a global scale. Access to a worldwide crowd brings in diverse ideas, opinions, and values. These diverse perspectives are reflected in the training data and in the AI solution itself, leading to a final product that’s less biased and more functional for everyone. The global crowd also provides unique expertise and skills that may not be present on a company’s existing team, enabling broader project scope. The globalization of the AI economy gives data annotators a platform to make that contribution.

As globalization continues, companies are becoming more cognizant of who they hire for annotation work and what type of diversity these individuals bring to the table. These factors are ideally covered in a comprehensive data management plan, one that should also include a protocol for data privacy and security.

As data becomes more accessible, and more organizations jump into the AI space, there will be more opportunities for success, and for failure. But with each new story, knowledge is gained. Getting the data part right will continue to be viewed as instrumental for profitability, and concerted data management efforts should result in more effective, less biased models in 2021 and the years to come.

At Appen, we have spent over 20 years annotating and collecting data, using a best-of-breed technology platform and leveraging our diverse crowd to help ensure you can confidently deploy your AI models. To learn more about ethical considerations and our commitments when it comes to contractors annotating training data for AI, check out our Crowd Code of Ethics.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way.