For decades, enterprises have jury-rigged software designed for structured data when trying to solve unstructured, text-based data problems. Although these solutions performed poorly, there was nothing else. Recently, though, machine learning (ML) has improved significantly at understanding natural language.

Unsurprisingly, Silicon Valley is in a mad dash to build market-leading offerings for this new opportunity. Khosla Ventures thinks natural language processing (NLP) is the most important technology trend of the next five years. If the 2000s were about becoming a big data-enabled enterprise, and the 2010s were about becoming a data science-enabled enterprise — then the 2020s are about becoming a natural language-enabled enterprise.

To fast-track its transformation to such an enterprise, an organization must establish a viable strategy that aligns with its business objectives and generates business impact. While it may sound like a complex decision that requires an expensive management consulting firm, it’s not. It starts with how you answer two questions: First, who employs the data scientists and machine learning engineers (MLEs)? Second, who builds and operates the underlying ML stack that houses the relevant models and tools?

| | Strategy Option 1: Company employs MLEs and data scientists | Strategy Option 2: Vendor employs MLEs and data scientists |
|---|---|---|
| Vendor manages ML stack | Low-code ML platform and pre-trained models | APIs |
| Company manages ML stack | Build your own using open-source elements | |

NLP solutions: Building mature AI

A “build your own” strategy allows companies to construct custom ML models on their data. It also minimizes security risks because companies don’t have to share data with external vendors to label or process. If you can pull it off and afford it, “build your own” leads to substantial competitive advantages because you now have a world-class artificial intelligence (AI) team, amplifying productivity in every aspect of the business.



However, this strategy is by far the most expensive. Building and operating an ML stack is complicated and requires specialized expertise. KPMG estimates that to build mature AI capabilities, a company needs to employ at least 500 to 600 full-time AI employees (a majority of whom build and operate the ML stack) and pay them a cumulative $100 million to $120 million per year. On top of that, there is no guarantee of success, since productionizing AI is challenging for even the best teams.

The “low-code ML platform and pre-trained models” strategy reduces the cost of building mature AI capabilities because the vendor handles the majority of the development and operation of the ML stack. Instead of spending more than $100 million per year, organizations can likely reduce that to $25 million to $50 million annually. This strategy also still allows companies to build custom ML and NLP models.

However, as with the previous strategy, there is no guarantee of success, because it does not eliminate one of the most complex parts of the full AI process: the handoff of models from the AI team to the business team, which must put them into production and derive business value.
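The cost figures quoted above for the first two strategies can be checked with a little arithmetic. The following is a minimal sketch, not an official model: the variable and function names are illustrative, and pairing the low headcount with the low cost (and high with high) is an assumption rather than something the KPMG estimate states.

```python
# Figures quoted in the article, in USD per year. Names are illustrative.
BUILD_YOUR_OWN_COST = (100_000_000, 120_000_000)
BUILD_YOUR_OWN_HEADCOUNT = (500, 600)
LOW_CODE_COST = (25_000_000, 50_000_000)


def implied_cost_per_employee(cost_range, headcount_range):
    """Divide low cost by low headcount and high by high (an assumed pairing)."""
    return tuple(cost / heads for cost, heads in zip(cost_range, headcount_range))


def savings_range(baseline, alternative):
    """Smallest and largest annual savings when moving from baseline to alternative."""
    smallest = baseline[0] - alternative[1]  # cheapest baseline vs. priciest alternative
    largest = baseline[1] - alternative[0]   # priciest baseline vs. cheapest alternative
    return (smallest, largest)


per_employee = implied_cost_per_employee(BUILD_YOUR_OWN_COST, BUILD_YOUR_OWN_HEADCOUNT)
low_code_savings = savings_range(BUILD_YOUR_OWN_COST, LOW_CODE_COST)

print(per_employee)      # (200000.0, 200000.0)
print(low_code_savings)  # (50000000, 95000000)
```

Under these assumptions, the build-your-own estimate works out to roughly $200,000 per AI employee per year, and the low-code route would save somewhere between $50 million and $95 million annually.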

An application programming interface (API) strategy minimizes the handoff problem, increasing the probability of success in productionizing AI. ML models can be seamlessly integrated into applications because the vendor abstracts away the complexity of creating and training these models and guides users toward the best way to use them. It also reduces the cost of achieving the benefits of NLP, since the vendor employs the data scientists and MLEs and builds and operates the ML stack.

Models that are accessible via APIs are built on public datasets and must still be trained and tuned to work on domain- and company-specific data. However, if the vendor has implemented the tool properly, this work can be done directly by domain experts without technical skills.

Unfortunately, most vendors have not solved this problem. Retraining their large language models to work on customer data is rarely feasible without hiring a full staff of MLEs and data scientists to train and maintain the models over time, so in practice the vendor's model either works on your data or it doesn’t.

Where does this leave us?

For most enterprises, the best approach to leveraging NLP and becoming a natural language-enabled enterprise is a strategy that includes APIs, provided that the vendor lets the customer easily tune and optimize its general-purpose model to work on customer data. This would save enterprises tens of millions of dollars every year and accelerate time-to-value.

If the use case calls for a model that can’t be accessed via API and easily tuned, the next best strategy for most enterprises is the “low-code ML platform and pre-trained models” strategy. While build-your-own is the least practical strategy for most enterprises, there are, of course, a few companies for which it is the best path forward.

After all, according to Gartner: “Enterprises sit on unexploited unstructured data, with opportunities to extract differentiating insights. Data and analytics technical professionals must uncover such insights by applying natural language technology solutions: intelligent document processing, conversational AI and insight engines.”

Ryan Welsh is the founder and CEO of Kyndi.

