Enterprises are increasing their investments in natural language processing (NLP), the subfield of linguistics, computer science, and AI concerned with how algorithms analyze large amounts of language data. According to a new survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third — 33% — said that their spending climbed by more than 30%.

The goal of NLP is to develop models capable of “understanding” the contents of documents to extract information as well as categorize the documents themselves. Over the past decades, NLP has become a key tool in industries like health care and financial services, where it’s used to process patents, derive insights from scientific papers, recommend news articles, and more.

John Snow Labs’ and Gradient Flow’s 2021 NLP Industry Survey asked 655 technologists, about a quarter of whom hold technical leadership roles, about NLP trends at their employers. The top four industries represented by respondents were health care (17%), technology (16%), education (15%), and financial services (7%). Fifty-four percent singled out named entity recognition (NER) as their primary use case, while 46% cited document classification. In health care, by contrast, entity linking and knowledge graphs (41%) were among the top use cases, followed by de-identification (39%).

Given a block of text, NER determines which items in the text map to proper names (like people or places) and what type each name is (person, location, organization). Entity linking goes a step further, determining which specific entity a mention refers to in context, such as a particular celebrity or company, while knowledge graphs are collections of interlinked descriptions of entities (usually objects or concepts).
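To make the distinction between these three ideas concrete, here is a minimal Python sketch. It is a toy, not how the surveyed tools work: production NER relies on trained statistical or neural models, and real entity linking chooses among several candidate entities. Here a hypothetical hand-written dictionary (`GAZETTEER`) stands in for both, and a hypothetical set of triples (`KNOWLEDGE_GRAPH`) stands in for a knowledge graph. All names and IDs are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical gazetteer (assumption): surface string -> (entity type, knowledge-base ID).
# A real system would use a trained model instead of exact string lookup.
GAZETTEER = {
    "IBM": ("ORGANIZATION", "kb:ibm"),
    "Google Cloud": ("ORGANIZATION", "kb:google_cloud"),
    "Boston": ("LOCATION", "kb:boston_ma"),
}

# Hypothetical knowledge graph (assumption): (subject, relation, object) triples.
KNOWLEDGE_GRAPH = {
    ("kb:google_cloud", "part_of", "kb:google"),
    ("kb:ibm", "headquartered_in", "kb:armonk_ny"),
}


@dataclass
class Mention:
    text: str
    start: int
    end: int
    label: str      # NER output: the type of the name
    entity_id: str  # entity linking output: which specific entity it refers to


def tag(text: str) -> list[Mention]:
    """Dictionary-based NER plus linking: locate every gazetteer string in the text."""
    mentions = []
    for surface, (label, entity_id) in GAZETTEER.items():
        start = text.find(surface)
        while start != -1:
            mentions.append(Mention(surface, start, start + len(surface), label, entity_id))
            start = text.find(surface, start + 1)
    # Return mentions in the order they appear in the text.
    return sorted(mentions, key=lambda m: m.start)


def facts_about(entity_id: str) -> list[tuple[str, str, str]]:
    """Knowledge-graph lookup: every triple whose subject is the linked entity."""
    return sorted(t for t in KNOWLEDGE_GRAPH if t[0] == entity_id)


if __name__ == "__main__":
    for m in tag("IBM and Google Cloud both sell NLP APIs."):
        print(m.text, m.label, m.entity_id, facts_about(m.entity_id))
```

Because each string maps to exactly one entity, the sketch skips disambiguation, which is the hard part of real entity linking. A name like "Jordan" could refer to a country or a person, and a linker has to pick the right one from context.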

The big winners in the NLP boom are cloud service providers, on which the majority of companies rely rather than developing their own in-house solutions. According to the survey, 83% of respondents said that they use cloud NLP APIs from Google Cloud, Amazon Web Services, Microsoft Azure, and IBM in addition to open source libraries. That represents a sizable chunk of change, considering that the global NLP market is expected to grow in value from $11.6 billion in 2020 to $35.1 billion by 2026. IBM alone generated $303.8 million in revenue from its AI software platforms in 2019.

NLP challenges

Among the tech leaders John Snow Labs and Gradient Flow surveyed, accuracy (40%) was the most important requirement when evaluating an NLP solution, followed by production readiness (24%) and scalability (16%). But the respondents cited costs, maintenance, and data sharing as outstanding challenges.

As the report’s authors point out, experienced users of NLP tools and libraries understand that they often need to tune and customize models for their specific domains and applications. “General-purpose models tend to be trained on open datasets like Wikipedia or news sources or datasets used for benchmarking specific NLP tasks. For example, an NER model trained on news and media sources is likely to perform poorly when used in specific areas of healthcare or financial services,” the report reads.

But this process can become expensive. In a survey by Anodot, 77% of companies with more than $2 million in cloud costs (which include API-based AI services like NLP) said they were surprised by how much they spent. And as corporate investment in AI grows to $97.9 billion in 2023, according to IDC, Gartner anticipates that spending on cloud services will increase 18% this year to a total of $304.9 billion.

Looking ahead, John Snow Labs and Gradient Flow expect growth in question-answering and natural language generation NLP workloads powered by large language models like OpenAI’s GPT-3 and AI21’s Jurassic-1. It’s already happening to some degree. OpenAI says that its API, through which developers can access GPT-3, is currently used in more than 300 apps by tens of thousands of developers and produces 4.5 billion words per day.

The full results of the survey are scheduled to be presented at the upcoming NLP Summit, sponsored by John Snow Labs. “As we move into the next phase of NLP growth, it’s encouraging to see investments and use cases expanding, with mature organizations leading the way,” Dr. Ben Lorica, survey coauthor and external program chair at the NLP Summit, said in a statement. “Coming off of the political and pandemic-driven uncertainty of last year, it’s exciting to see such progress and potential in the field that is still very much in its infancy.”