Cohere partners with Google Cloud to train large language models using dedicated hardware

Google Cloud, Google's cloud computing services platform, today announced a multi-year collaboration with startup Cohere to "accelerate natural language processing (NLP) to businesses by making it more cost effective." Under the partnership, Google Cloud says it'll help Cohere establish computing infrastructure to power Cohere's API, enabling Cohere to train large language models on dedicated hardware.

The news comes a day after Cohere announced the general availability of its API, which lets customers access models that are fine-tuned for a range of natural language applications -- in some cases at a fraction of the cost of rival offerings. "Leading companies around the world are using AI to fundamentally transform their business processes and deliver more helpful customer experiences," Google Cloud CEO Thomas Kurian said in a statement. "Our work with Cohere will make it easier and more cost-effective for any organization to realize the possibilities of AI with powerful NLP services powered by Google's custom-designed [hardware]."

How Cohere runs

Headquartered in Toronto, Canada, Cohere was founded in 2019 by a pedigreed team including Aidan Gomez, Ivan Zhang, and Nick Frosst. Gomez, a former intern at Google Brain, coauthored the academic paper "Attention Is All You Need," which introduced the world to a fundamental AI model architecture called the Transformer. (Among other high-profile systems, OpenAI's GPT-3 and Codex are based on the Transformer architecture.) Zhang, alongside Gomez, is a contributor at FOR.ai, an open AI research collective involving data scientists and engineers. As for Frosst, he, like Gomez, worked at Google Brain, publishing research on machine learning alongside Turing Award winner Geoffrey Hinton.

In a vote of confidence, even before launching its commercial service, Cohere raised $40 million from institutional venture capitalists as well as Hinton, Google Cloud AI chief scientist Fei-Fei Li, UC Berkeley AI lab co-director Pieter Abbeel, and former Uber autonomous driving head Raquel Urtasun.

Unlike some of its competitors, Cohere offers two types of English NLP models, generation and representation, in Large, Medium, and Small sizes. The generation models can complete tasks involving generating text -- for example, writing product descriptions or extracting document metadata. By contrast, the representation models are about understanding language, driving apps like semantic search, chatbots, and sentiment analysis.

To keep its technology relatively affordable, Cohere charges for access on a per-character basis, with rates that depend on the size of the model and the number of characters apps use (ranging from $0.0025 to $0.12 per 10,000 characters for generation and $0.019 per 10,000 characters for representation). Generation models are billed on both input and output characters, while other models are billed on output characters only. All fine-tuned models, meanwhile -- i.e., models tailored to particular domains, industries, or scenarios -- are charged at two times the baseline model rate.
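To make the pricing rules above concrete, here is a minimal sketch of the arithmetic in Python. The function name and parameters are hypothetical and are not part of Cohere's API. The article does not say which model size maps to which rate, so the rate is passed in explicitly.

```python
def estimate_cost(rate_per_10k, input_chars, output_chars,
                  is_generation, fine_tuned=False):
    """Estimate the charge in dollars for one request (hypothetical helper)."""
    # Generation models are billed on input and output characters;
    # other models are billed on output characters only.
    if is_generation:
        billable_chars = input_chars + output_chars
    else:
        billable_chars = output_chars
    # Fine-tuned models are charged at twice the baseline rate.
    multiplier = 2 if fine_tuned else 1
    return rate_per_10k * multiplier * billable_chars / 10_000


# Top of the quoted generation range: 2,000 input + 8,000 output characters.
print(estimate_cost(0.12, 2_000, 8_000, is_generation=True))
# The same request against a fine-tuned generation model.
print(estimate_cost(0.12, 2_000, 8_000, is_generation=True, fine_tuned=True))
# Representation model: only the 10,000 output characters are billed.
print(estimate_cost(0.019, 5_000, 10_000, is_generation=False))
```

Under these assumptions, a generation request totaling 10,000 characters at the top rate comes to about $0.12, or about $0.24 on a fine-tuned model.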

Large language models

The partnership with Google Cloud will grant Cohere access to dedicated fourth-generation tensor processing units (TPUs) running in Google Cloud instances. TPUs are custom chips developed specifically to accelerate AI training, powering products like Google Search, Google Photos, Google Translate, Google Assistant, Gmail, and Google Cloud AI APIs.

"The partnership will run until the end of 2024 with options to extend into 2025 and 2026. Google Cloud and Cohere have plans to partner on a go-to-market strategy," Gomez told VentureBeat via email. "We met with a number of Cloud providers and felt that Google Cloud was best positioned to meet our needs."

Cohere's decision to partner with Google Cloud reflects the logistical challenges of developing large language models. For example, Nvidia's recently released Megatron 530B model was originally trained across 560 Nvidia DGX A100 servers, each hosting 8 Nvidia A100 80GB GPUs. Microsoft and Nvidia say that they observed between 113 and 126 teraflops per GPU while training Megatron 530B, which would put the training cost in the millions of dollars. (A teraflop is one trillion floating-point operations per second, a common measure of the performance of hardware, including GPUs.)
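For a sense of scale, the reported figures can be multiplied out. The short Python sketch below assumes, for illustration only, that every GPU in the cluster sustained the quoted per-GPU throughput at the same time; the function and constant names are hypothetical.

```python
SERVERS = 560                     # DGX A100 servers, as reported
GPUS_PER_SERVER = 8               # A100 80GB GPUs per server, as reported
TFLOPS_PER_GPU_RANGE = (113, 126) # observed per-GPU throughput, as reported


def cluster_throughput_pflops(servers, gpus_per_server, tflops_per_gpu):
    """Aggregate throughput in petaflops, assuming all GPUs run at the same rate."""
    total_gpus = servers * gpus_per_server
    # 1 petaflop = 1,000 teraflops.
    return total_gpus * tflops_per_gpu / 1_000


total_gpus = SERVERS * GPUS_PER_SERVER
low, high = (
    cluster_throughput_pflops(SERVERS, GPUS_PER_SERVER, tflops)
    for tflops in TFLOPS_PER_GPU_RANGE
)
print(f"{total_gpus} GPUs, roughly {low:.0f} to {high:.0f} petaflops in aggregate")
```

That works out to 4,480 GPUs and, under the stated assumption, somewhere between roughly 506 and 564 petaflops of aggregate training throughput.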

Inference -- actually running the trained model -- is another challenge. On two of its costly DGX SuperPod systems, Nvidia claims that inference (e.g., autocompleting a sentence) with Megatron 530B only takes half a second. But it can take over a minute on a CPU-based on-premises server. While cloud alternatives might be cheaper, they're not dramatically so -- one estimate pegs the cost of running GPT-3 on a single Amazon Web Services instance at a minimum of $87,000 per year.

Cohere rival OpenAI trains its large language models on an "AI supercomputer" hosted by Microsoft, which invested over $1 billion in the company in 2019, roughly $500 million of which came in the form of Azure compute credits.

Affordable NLP

In Cohere, Google Cloud -- which already offered a range of NLP services -- gains a customer in a market that's growing rapidly during the pandemic. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third -- 33% -- said that their spending climbed by more than 30%.

"We're dedicated to supporting companies, such as Cohere, through our advanced infrastructure offering in order to drive innovation in NLP," Google Cloud AI director of product management Craig Wiley told VentureBeat via email. "Our goal is always to provide the best pipeline tools for developers of NLP models. By bringing together the NLP expertise from both Cohere and Google Cloud, we are going to be able to provide customers with some pretty extraordinary outcomes."

The global NLP market is projected to be worth $2.53 billion by 2027, up from $703 million in 2020. And if the current trend holds, a substantial portion of that spending will be put toward cloud infrastructure -- benefiting Google Cloud.
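The projection above implies a compound annual growth rate that is easy to back out. The Python sketch below assumes seven years of compounding between the 2020 and 2027 figures; the helper name is hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value, and span."""
    return (end_value / start_value) ** (1 / years) - 1


# $703 million in 2020 growing to a projected $2.53 billion by 2027.
rate = cagr(703e6, 2.53e9, 2027 - 2020)
print(f"Implied annual growth: {rate:.1%}")
```

On these numbers, the market would need to grow by roughly 20% a year to reach the projected figure.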
