Naver trained a 'GPT-3-like' Korean language model

Naver, a Seongnam, South Korea-based company that operates an eponymous search engine, this week announced it has trained one of the largest AI language models of its kind, called HyperCLOVA. Naver claims the system learned 6,500 times more Korean data than OpenAI's GPT-3 and contains 204 billion parameters, the parts of the machine learning model learned from historical training data. (GPT-3 has 175 billion parameters.)

For the better part of a year, OpenAI's GPT-3 has remained among the largest AI language models ever created. Via an API, people have used it to automatically write emails and articles, summarize text, compose poetry and recipes, create website layouts, and generate code for deep learning in Python. But GPT-3 has key limitations, chief among them that it's only available in English.

According to Naver, HyperCLOVA was trained on 560 billion tokens of Korean data -- 97% of the Korean language -- compared with the 499 billion tokens on which GPT-3 was trained. Tokens, the smaller units into which text is split in natural language processing, can be words, characters, or parts of words.
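To make the three kinds of tokens concrete, here is a minimal Python sketch that splits the same sentence by word, by character, and into parts of words. The function names and the fixed three-character split are assumptions made for illustration. Production models learn their subword vocabularies from data, and nothing here reflects how HyperCLOVA or GPT-3 actually tokenize text.

```python
def word_tokens(text):
    # Word-level: each whitespace-delimited chunk is one token.
    return text.split()


def char_tokens(text):
    # Character-level: every non-space character is its own token.
    return [ch for ch in text if not ch.isspace()]


def subword_tokens(text, size=3):
    # Toy "parts of words": cut each word into fixed-width pieces.
    # This is only a stand-in; real subword vocabularies are learned from data.
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + size] for i in range(0, len(word), size))
    return pieces


sample = "language models learn"
print(word_tokens(sample))     # 3 tokens
print(subword_tokens(sample))  # 7 tokens
print(len(char_tokens(sample)))  # 19 tokens
```

The same text yields very different token counts depending on the unit chosen, which is one reason token totals from different models are hard to compare directly.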

In a translated press release, Naver said it will use HyperCLOVA to provide "differentiated" experiences across its services, including the Naver search engine's autocorrect feature. "Naver plans to support HyperCLOVA [for] small and medium-sized businesses, creators, and startups," the company said. "Since AI can be operated with a few-shot learning method that provides simple explanations and examples, anyone who is not an AI expert can easily create AI services."
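The few-shot approach Naver describes amounts to showing a model a short instruction and a handful of worked examples before the actual request. The Python sketch below only assembles such a prompt as text. The `build_few_shot_prompt` helper, the "Input:/Output:" layout, and the spelling examples are all hypothetical, and the sketch does not call HyperCLOVA or any real API.

```python
def build_few_shot_prompt(instruction, examples, query):
    # Start with a plain-language description of the task.
    lines = [instruction, ""]
    # Add each worked example as an input/output pair.
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # End with the new input and leave the output for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical spelling-correction task, loosely echoing the autocorrect use case.
examples = [("recieve", "receive"), ("teh", "the")]
prompt = build_few_shot_prompt("Correct the spelling of each word.", examples, "adress")
print(prompt)
```

The resulting string would be sent to a language model as-is, with no retraining involved, which is why the method is pitched at people who are not AI experts.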

Former OpenAI policy director Jack Clark called HyperCLOVA a "notable" achievement because of the scale of the model and because it fits into the trend of generative model diffusion, with multiple actors developing "GPT-3-style" models. In April, a research team at Chinese company Huawei quietly detailed PanGu-Alpha (stylized PanGu-α), a 750-gigabyte model with up to 200 billion parameters that was trained on 1.1 terabytes of Chinese-language ebooks, encyclopedias, news, social media, and web pages.

"Generative models ultimately reflect and magnify the data they're trained on -- so different nations care a lot about how their own culture is represented in these models. Therefore, the Naver announcement is part of a general trend of different nations asserting their own AI capacity [and] capability via training frontier models like GPT-3," Clark wrote in his weekly Import AI newsletter. "[We'll] await more technical details to see if [it's] truly comparable to GPT-3."

Skepticism

Some experts believe that while HyperCLOVA, GPT-3, PanGu-α, and similarly large models are impressive with respect to performance, they don't move the ball forward on the research side of the equation. Instead, they're prestige projects that demonstrate the scalability of existing techniques or serve as a showcase for a company's products.

Naver does not claim that HyperCLOVA overcomes other obstacles in natural language processing, like answering math problems correctly or responding to questions without paraphrasing training data. More problematically, HyperCLOVA may contain the types of bias and toxicity found in models like GPT-3. Among others, leading AI researcher Timnit Gebru has questioned the wisdom of building large language models -- examining who benefits from them and who is harmed. The effects of AI and machine learning model training on the environment have also been raised as serious concerns.

The coauthors of a recent OpenAI and Stanford paper suggest ways to address the negative consequences of large language models, such as enacting laws that require companies to acknowledge when text is generated by AI -- possibly along the lines of California's bot law.

Other recommendations include:

Training a separate model that acts as a filter for content generated by a language model
Deploying a suite of bias tests to run models through before allowing people to use the model
Avoiding some specific use cases
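The first two recommendations can be sketched in a few lines of Python. This is a rough illustration under heavy assumptions: the `toxicity_score` function is a keyword counter standing in for a separately trained filter model, the blocklist terms and the threshold are placeholders, and `bias_gap` is a hypothetical single check rather than the suite of bias tests the paper's coauthors have in mind.

```python
BLOCKLIST = {"hate", "kill"}  # hypothetical placeholder terms


def toxicity_score(text):
    # Stand-in for a trained filter model: fraction of words on the blocklist.
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in BLOCKLIST for w in words) / len(words)


def filter_generations(generations, threshold=0.1):
    # Keep only outputs the filter scores below the (assumed) threshold.
    return [g for g in generations if toxicity_score(g) < threshold]


def bias_gap(generate, template, groups):
    # Fill the same prompt template with each group name, score the model's
    # output, and report the spread between the highest and lowest scores.
    scores = {g: toxicity_score(generate(template.format(group=g))) for g in groups}
    return max(scores.values()) - min(scores.values()), scores


candidates = ["Have a nice day.", "I hate you."]
kept = filter_generations(candidates)


def fake_generate(prompt):
    # Hypothetical stand-in for a language model call.
    return prompt + " are great."


gap, per_group = bias_gap(fake_generate, "People who speak {group}", ["Korean", "English"])
print(kept, gap)
```

A large gap between groups would be a signal to hold the model back before release. A gap of zero on one template, as here, says little on its own.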

The consequences of failing to take any of these steps could be catastrophic over the long term. In recent research, the Middlebury Institute of International Studies' Center on Terrorism, Extremism, and Counterterrorism claims GPT-3 could reliably generate "informational" and "influential" text that might radicalize people into violent far-right extremist ideologies and behaviors. And toxic language models deployed into production might struggle to understand aspects of minority languages and dialects. This could force people using the models to switch to "white-aligned English," for example, to ensure the models work better for them, or discourage minority speakers from engaging with the models at all.