OpenAI rival Cohere launches language model API

Cohere, a startup creating large language models to rival those from OpenAI and AI21 Labs, today announced the general availability of its commercial platform for app and service development. Through an API, customers can access models fine-tuned for a range of natural language applications, in some cases at a fraction of the cost of rival offerings.

The pandemic has accelerated the world’s digital transformation, pushing businesses to become more reliant on software to streamline their processes. As a result, the demand for natural language technology is now higher than ever -- particularly in the enterprise. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their natural language processing (NLP) budgets grew by at least 10% compared to 2020, while a third -- 33% -- said that their spending climbed by more than 30%.

The global NLP market is expected to climb in value from $11.6 billion in 2020 to $35.1 billion by 2026.

"Language is essential to humanity and arguably its single greatest invention -- next to the development of computers. Ironically, computers still lack the ability to fully comprehend language, finding it difficult to parse the syntax, semantics, and context that all work together to give words meaning," Cohere CEO Aidan Gomez told VentureBeat via email. "However, the latest in NLP technology is continuously improving our ability to communicate seamlessly with computers."

Cohere

Headquartered in Toronto, Canada, Cohere was founded in 2019 by a pedigreed team including Gomez, Ivan Zhang, and Nick Frosst. Gomez, a former intern at Google Brain, coauthored the academic paper "Attention Is All You Need," which introduced the world to a fundamental AI model architecture called the Transformer. (Among other high-profile systems, OpenAI's GPT-3 and Codex are based on the Transformer architecture.) Zhang, alongside Gomez, is a contributor at FOR.ai, an open AI research collective involving data scientists and engineers. As for Frosst, he, like Gomez, worked at Google Brain, publishing research on machine learning alongside Turing Award winner Geoffrey Hinton.

In a vote of confidence, even before launching its commercial service, Cohere raised $40 million from institutional venture capitalists as well as Hinton, former Google Cloud AI chief scientist Fei-Fei Li, UC Berkeley AI lab co-director Pieter Abbeel, and former Uber autonomous driving head Raquel Urtasun. "Very large language models are now giving computers a much better understanding of human communication. The team at Cohere is building technology that will make this revolution in natural language understanding much more widely available," Hinton said in a statement to Fast Company in September.

Unlike some of its competitors, Cohere offers two types of English NLP models, generation and representation, in sizes that include Large, Medium, and Small. The generation models can complete tasks involving generating text -- for example, writing product descriptions or extracting document metadata. By contrast, the representation models are about understanding language, driving apps like semantic search, chatbots, and sentiment analysis.

"By being in both [the generative and representative space], Cohere has the flexibility that many enterprise customers need, and can offer a range of model sizes that allow customers to choose the model that best fits their needs across the spectrums of latency and performance," Gomez said. "[Use] cases across industries include the ability to more accurately track and categorize spending, expedite data entry for medical providers, or leverage semantic search for legal cases, insurance policies and financial documents. Companies can easily generate product descriptions with minimal input, draft and analyze legal contracts, and analyze trends and sentiment to inform investment decisions."
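To make the distinction concrete, semantic search with a representation model typically means turning text into vectors (embeddings) and ranking documents by how close their vectors are to the query's. The sketch below shows only that ranking step, using cosine similarity. The three-dimensional vectors, document titles, and query are hypothetical stand-ins for what a real representation model would return; nothing here calls Cohere's API.

```python
import math

# Hypothetical, hand-made 3-dimensional "embeddings". A real representation
# model would produce much longer vectors; these values are illustrative only.
DOCUMENTS = {
    "refund policy for damaged items": [0.9, 0.1, 0.0],
    "quarterly earnings summary": [0.0, 0.2, 0.95],
    "how to return a broken product": [0.85, 0.2, 0.05],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, documents, top_k=2):
    """Return the top_k document titles ranked by similarity to the query."""
    scored = [
        (cosine_similarity(query_vector, vec), title)
        for title, vec in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Hypothetical embedding for the query "can I send back a defective item?"
query = [0.88, 0.15, 0.02]
print(semantic_search(query, DOCUMENTS))
```

The point of the example is that the two return-related documents rank highest even though neither shares the word "defective" with the query; matching happens on vector proximity rather than on exact keywords.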

To keep its technology relatively affordable, Cohere charges per character, with rates that depend on the size of the model and the number of characters an app uses (ranging from $0.0025 to $0.12 per 10,000 characters for generation and $0.019 per 10,000 characters for representation). Only the generation models charge for both input and output characters; the other models charge only for output characters. All fine-tuned models, meanwhile -- i.e., models tailored to particular domains, industries, or scenarios -- are charged at twice the baseline model rate.
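The billing rules above can be turned into a quick back-of-the-envelope estimator. This is a minimal sketch that uses only the rates and rules quoted in the article; the function name, its parameters, and the sample character counts are hypothetical, and which generation model size corresponds to which end of the $0.0025 to $0.12 range is not stated, so only the endpoints are used.

```python
# Per-10,000-character rates quoted in the article. Only the endpoints of the
# generation range are given; mapping sizes to rates is left unspecified.
RATES_PER_10K = {
    "generation_low": 0.0025,
    "generation_high": 0.12,
    "representation": 0.019,
}

FINE_TUNE_MULTIPLIER = 2  # fine-tuned models are billed at twice the baseline rate


def estimate_cost(rate_per_10k, input_chars, output_chars, is_generation, fine_tuned=False):
    """Hypothetical estimator: dollars owed for one batch of requests."""
    # Generation models bill input + output characters; other models bill output only.
    billable_chars = input_chars + output_chars if is_generation else output_chars
    cost = billable_chars / 10_000 * rate_per_10k
    if fine_tuned:
        cost *= FINE_TUNE_MULTIPLIER
    return cost


# 2,000 prompt characters + 8,000 generated characters at the top generation rate
print(round(estimate_cost(RATES_PER_10K["generation_high"], 2_000, 8_000, is_generation=True), 4))
# Same request on a fine-tuned model
print(round(estimate_cost(RATES_PER_10K["generation_high"], 2_000, 8_000, is_generation=True, fine_tuned=True), 4))
# Representation model: the 1,000 input characters are not billed, 50,000 output characters are
print(round(estimate_cost(RATES_PER_10K["representation"], 1_000, 50_000, is_generation=False), 4))
```

Under these rules the first request costs $0.12 and the fine-tuned version $0.24, while the representation call costs $0.095.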

"The problem remains that the only companies able to capitalize on NLP technology require seemingly bottomless resources in order to access the technology for large language models -- which is due to the cost of these models ranging from the tens to hundreds of millions of dollars to build," Gomez said. "Cohere is easy-to-deploy. With just three lines of code, companies can apply [our] full-stack engine to power all their NLP needs. The models themselves are ... already pre-trained."

To Gomez's point, training and deploying large language models into production isn't an easy feat, even for enterprises with massive resources. For example, Nvidia's recently released Megatron 530B model was originally trained across 560 Nvidia DGX A100 servers, each hosting 8 Nvidia A100 80GB GPUs. Microsoft and Nvidia say that they observed between 113 and 126 teraflops per GPU while training Megatron 530B, which would put the training cost in the millions of dollars. (Teraflops, or trillions of floating-point operations per second, measure the performance of hardware including GPUs.)
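The hardware figures above imply a sense of scale that is easy to work out. The sketch below multiplies the quoted server and GPU counts and throughput range, assuming (as a simplification not stated in the article) that every GPU sustains the observed per-GPU rate at the same time.

```python
SERVERS = 560
GPUS_PER_SERVER = 8
total_gpus = SERVERS * GPUS_PER_SERVER  # A100 GPUs in the training cluster

# Per-GPU throughput range observed by Microsoft and Nvidia, in teraflops
LOW_TFLOPS, HIGH_TFLOPS = 113, 126

# Aggregate throughput in petaflops (1 petaflop = 1,000 teraflops), assuming
# all GPUs run at the observed rate simultaneously
aggregate_low_pf = total_gpus * LOW_TFLOPS / 1000
aggregate_high_pf = total_gpus * HIGH_TFLOPS / 1000

print(f"{total_gpus:,} GPUs, roughly {aggregate_low_pf:,.0f} to {aggregate_high_pf:,.0f} petaflops combined")
```

That works out to 4,480 GPUs delivering on the order of 500 petaflops in aggregate.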

Inference -- actually running the trained model -- is another challenge. On two of its costly DGX SuperPod systems, Nvidia claims that inference (e.g., autocompleting a sentence) with Megatron 530B only takes half a second. But it can take over a minute on a CPU-based on-premises server. While cloud alternatives might be cheaper, they're not dramatically so -- one estimate pegs the cost of running GPT-3 on a single Amazon Web Services instance at a minimum of $87,000 per year.
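Converting the quoted annual estimate into hourly and monthly terms makes it easier to compare with per-request pricing. This sketch assumes the instance runs around the clock for a 365-day year; the article does not say how the $87,000 figure was derived.

```python
ANNUAL_COST_USD = 87_000   # minimum yearly estimate quoted in the article
HOURS_PER_YEAR = 365 * 24  # assumes the instance runs nonstop all year

hourly = ANNUAL_COST_USD / HOURS_PER_YEAR
monthly = ANNUAL_COST_USD / 12

print(f"~${hourly:.2f} per hour, ~${monthly:,.0f} per month")
# prints: ~$9.93 per hour, ~$7,250 per month
```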

Training the models

To build Cohere's models, Gomez says that the team scrapes the web and feeds billions of ebooks and web pages (e.g., WordPress, Tumblr, Stack Exchange, Genius, the BBC, Yahoo, and the New York Times) to the models so that they learn to understand the meaning and intent of language. (The training dataset for the generation models amounts to 200GB after some filtering, while the dataset for the representation models, which wasn't filtered, totals 3TB.) Like all AI models, Cohere's models train by ingesting a set of examples to learn patterns among data points, like grammatical and syntactical rules.

It's well-established that models can amplify the biases in data on which they were trained. In a paper, the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 and similar models can generate text that might radicalize people into far-right extremist ideologies. A group at Georgetown University has used GPT-3 to generate misinformation, including stories around a false narrative, articles altered to push a bogus perspective, and tweets riffing on particular points of disinformation. Other studies, like one published by Intel, MIT, and Canadian AI initiative CIFAR researchers in April, have found high levels of stereotypical bias from some of the most popular open source models, including Google's BERT and XLNet and Facebook's RoBERTa.

Cohere, for its part, claims that it's committed to safety and trains its models "to minimize bias and toxicity." Customers must abide by the company's usage guidelines or risk having their access to the API revoked. And Cohere -- which has an external advisory council in addition to an internal safety team -- says that it plans to monitor "evolving risks" with tools designed to identify harmful outputs.

But Cohere's NLP models aren't perfect. In its documentation, the company admits that the models might generate "obscenities, sexually explicit content, and messages that mischaracterize or stereotype groups of people based on problematic historical biases perpetuated by internet communities." For example, when fed prompts about people, occupations, and political/religious ideologies, the API's output could be toxic 5 to 6 times per 1,000 generations and discuss men twice as much as it does women, Cohere says. Meanwhile, the Otter model in particular tends to associate men and women with stereotypically "male" and "female" occupations (e.g., male scientist versus female housekeeper).
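A rate of 5 to 6 toxic outputs per 1,000 generations (0.5% to 0.6%) sounds small, but it adds up at production volume. The sketch below scales that rate to a hypothetical workload of 250,000 generations, assuming the rate Cohere measured on prompts about people, occupations, and ideologies carries over unchanged to that workload, which the article does not claim.

```python
def expected_toxic_outputs(num_generations, rate_per_1000):
    """Expected number of toxic outputs at a given rate per 1,000 generations."""
    return num_generations * rate_per_1000 / 1000


WORKLOAD = 250_000  # hypothetical number of generations
low = expected_toxic_outputs(WORKLOAD, 5)
high = expected_toxic_outputs(WORKLOAD, 6)

print(f"{low:,.0f} to {high:,.0f} toxic outputs expected")
# prints: 1,250 to 1,500 toxic outputs expected
```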

In response, Gomez says that the Cohere team "puts substantial effort into filtering out toxic content and bad text," including running adversarial attacks and measuring the models against safety research benchmarks. "[F]iltration is done at the keyword and domain levels in order to minimize bias and toxicity," he added. "[The team has made] meaningful progress that sets Cohere apart from other [companies developing] large language models ... [W]e're confident in the impact it will have on the future of work over the course of this transformative era."
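Gomez does not describe how the keyword- and domain-level filtering works, so the following is only a generic sketch of what such a pre-training filter can look like: drop a scraped page if its host (or a parent domain) is on a blocklist, or if any blocklisted keyword appears in its text. The blocklists, function name, and example URLs are all hypothetical and do not reflect Cohere's actual rules.

```python
from urllib.parse import urlparse

# Hypothetical blocklists; Cohere's real lists and criteria are not public.
BLOCKED_DOMAINS = {"spam.example", "toxic-forum.example"}
BLOCKED_KEYWORDS = {"badword1", "badword2"}


def keep_document(url, text):
    """Return True if a scraped page passes both the domain and keyword checks."""
    host = (urlparse(url).hostname or "").lower()

    # Domain level: reject the host itself or any parent domain on the blocklist
    # (e.g. "www.spam.example" is caught by "spam.example").
    parts = host.split(".")
    for i in range(len(parts) - 1):
        if ".".join(parts[i:]) in BLOCKED_DOMAINS:
            return False

    # Keyword level: reject the page if any blocked keyword appears as a token.
    tokens = {tok.strip(".,!?;:\"'()").lower() for tok in text.split()}
    return tokens.isdisjoint(BLOCKED_KEYWORDS)


print(keep_document("https://news.example.org/story", "A normal article."))
print(keep_document("https://www.spam.example/page", "Hello there."))
print(keep_document("https://blog.example.com/post", "This contains badword1, sadly."))
```

Filters like this are cheap to run over billions of pages, but they are blunt: they can remove harmless text that happens to share a keyword and miss harmful text phrased differently, which is one reason Cohere also describes adversarial testing and benchmark measurement.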
