AI needs an open labeling platform

These days it’s hard to find a public company that isn’t talking up how artificial intelligence is transforming its business. From the obvious (Tesla using AI to improve Autopilot performance) to the less obvious (Levi’s using AI to drive better product decisions), everyone wants in on AI.

To get there, however, organizations are going to need to get a lot smarter about data. To even get close to serious AI, you need supervised learning, which in turn depends on labeled data. Raw data must be painstakingly labeled before it can be used to power supervised learning models. The cost of that labeling is a budget line item big enough to warrant C-suite attention. Executives who have spent the last 10 years stockpiling data and now need to turn that data into revenue face three choices:

1. DIY and build your own bespoke data labeling system. Be ready to budget for major investments in people, technology, and time to create a robust, production-grade system at scale that you will maintain in perpetuity. Sound straightforward? After all, that’s what Google and Facebook did. The same holds true for Pinterest, Uber, and other unicorns. But those aren’t good comps for you. Unlike you, they had battalions of PhDs and IT budgets the size of a small country’s GDP to build and maintain these complex labeling systems. Even if you have the talent and time to build a from-scratch production system at scale in the first place, can your organization afford the ongoing investment? If you’re the CIO, that’s sure to be a top MBO.

2. Outsource. There is nothing wrong with professional services partners, but you will still have to develop your own internal tooling, and this choice takes your business into risky territory. Many providers of these solutions mingle third-party data with your own proprietary data to make sample sizes much larger, theoretically resulting in better models. Are you confident that the audit trail for your data will keep it proprietary throughout the entire lifecycle of your ongoing data labeling requirements? Are the processes you develop as competitive differentiators in your AI journey repeatable and reliable -- even if your provider goes out of business? Your decade of hoarded IP -- data -- could help enrich a competitor who is also building its systems with your partners. Scale.ai is the largest of these service companies, serving primarily the autonomous vehicle industry.

3. Use a training data platform (TDP). Relatively new to the market, these are solutions that provide a unified platform to aggregate all of the work of collecting, labeling, and feeding data into supervised learning models, or that help build the models themselves. This approach can help organizations of any size standardize workflows in the same way that Salesforce and HubSpot have for managing customer relationships. Some of these platforms automate complex tasks using integrated machine learning algorithms, making the work easier still. Best of all, a TDP solution frees up expensive headcount, like data scientists, to spend time building the actual structures they were hired to create -- not to build and maintain complex and brittle bespoke systems. The purer TDP players include Labelbox, Alegion, and Superb.ai.

The future

Just as the shift in the 18th century to standardization and interchangeable parts ignited the Industrial Revolution, so, too, will a standard framework for defining TDPs begin to take AI to new levels. It is still early days, but it’s clear that labeled data -- managed through a true TDP -- can reliably turn raw data (your company’s precious IP) into a competitive advantage in almost any industry.

But C-suite executives need to understand that tapping the potential riches of AI requires investment. They have three choices today, and whichever they choose, whether to build, outsource, or buy, it will be expensive. As is often the case with key business infrastructure, there can be enormous hidden costs to building or outsourcing, especially when entering a new way of doing business. A true TDP “de-risks” that expensive decision while maintaining your company’s competitive moat, your IP.

(Disclosure: I work for AWS, but the views expressed here are mine.)

Matt Asay is a Principal at Amazon Web Services. He was formerly Head of Developer Ecosystem for Adobe and held roles at MongoDB, Nodeable (acquired by Appcelerator), mobile HTML5 start-up Strobe (acquired by Facebook), and Canonical. He is an emeritus board member of the Open Source Initiative (OSI) and a member of the Cognitive World Think Tank on enterprise AI.

VentureBeat is always looking for insightful guest posts from expert data and AI practitioners.
