Machine learning is undergoing a paradigm shift with the rise of models trained at massive scale, including Google’s BERT, OpenAI’s DALL-E, and AI21 Labs’ Jurassic-1 Jumbo. Their capabilities and dramatic performance improvements are leading to a new status quo: a single model trained on raw datasets that can be adapted for a wide range of applications. Indeed, OpenAI is reportedly developing a multimodal system trained on images, text, and other data using massive computational resources, which the company’s leadership believes is the most promising path toward AGI — AI that can learn any task a human can.
But while the emergence of these “foundation” models presents opportunities, it also poses risks, according to a new study released by the Stanford Institute for Human-Centered Artificial Intelligence’s (HAI) Center for Research on Foundation Models (CRFM). CRFM, a new initiative made up of an interdisciplinary team of roughly 160 students, faculty, and researchers, today published a deep dive into the legal ramifications, environmental and economic impact, and ethical issues surrounding foundation models. The report, whose coauthors include HAI codirector and former Google Cloud AI chief Fei-Fei Li, examines existing challenges built into foundation models, the need for interdisciplinary collaboration, and why the industry should feel a grave sense of urgency.
“Foundation models … are an emerging paradigm for building AI systems that lead to an unprecedented level of homogenization: a single model serving as the basis for a wide range of downstream applications,” Percy Liang, Stanford HAI faculty and computer science professor, told VentureBeat via email. “This homogenization generates enormous leverage for many new applications, but they also pose clear risks such as the exacerbation of historical inequities and centralization of power.”
CRFM’s report defines foundation models as models trained on raw data in a task-agnostic way that can then be adapted to a wide range of applications. Theoretically, foundation models can process different modalities — e.g., language and vision — to affect the physical world, perform reasoning, and even interact with humans.
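The pattern the report describes — one shared base adapted to many downstream tasks — can be sketched in a few lines. This is a toy illustration only, not anything from the report: the “foundation model” here is just a fixed random feature extractor standing in for a pretrained network, and adaptation is a least-squares linear head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained foundation model: one fixed,
# task-agnostic feature extractor shared by every downstream task.
W_base = rng.normal(size=(16, 8))

def foundation_features(x):
    """Frozen representation produced by the shared base model."""
    return np.tanh(x @ W_base)

def adapt(X, y):
    """Adapt the shared base to one task with a lightweight linear
    head (least squares fit on top of the frozen features)."""
    head, *_ = np.linalg.lstsq(foundation_features(X), y, rcond=None)
    return head

# Two different "downstream applications" built on the same base.
X = rng.normal(size=(100, 16))
head_a = adapt(X, (X[:, 0] > 0).astype(float))  # toy classification task
head_b = adapt(X, X[:, 1] + X[:, 2])            # toy regression task
```

The point of the sketch is the homogenization Liang describes: both tasks reuse `W_base` unchanged, so any bias or flaw in the shared base propagates to every application built on top of it.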
“The word ‘foundation’ specifies the role these models play: A foundation model is itself unfinished but serves as the common basis from which many task-specific models are built via adaptation,” the report reads. “We also chose the term ‘foundation’ deliberately to communicate the gravity of these models: Poorly constructed foundations are a recipe for disaster, and well-executed foundations are reliable bedrock for future applications.”
From a technical point of view, foundation models aren’t new. They’re based on deep neural networks and self-supervised learning, both of which have existed for decades. In self-supervised learning, a model derives its training signal from the raw data itself — for example, by predicting words that have been masked out of a sentence — so no manually applied labels are required.
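The label-free objective can be made concrete with a minimal sketch of masked-token example generation, the style of pretraining task used by models like BERT. The function name and details here are illustrative, not from any particular library:

```python
import random

def make_masked_lm_examples(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Turn raw, unlabeled text into (input, target) training pairs.

    The supervisory signal comes from the data itself: the model must
    recover the tokens that were masked out, so no human-written labels
    are needed.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    targets = {}  # position -> original token the model should predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            inputs[i] = mask_token
            targets[i] = tok
    return inputs, targets

tokens = "a foundation model is trained on raw unlabeled text at scale".split()
inputs, targets = make_masked_lm_examples(tokens, mask_rate=0.3)
```

Every “label” in `targets` was manufactured from the raw corpus itself, which is what lets these models train on web-scale data no human could ever annotate.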
The sheer scope of foundation models over the last few years stretches the boundaries of what’s possible, however. For example, OpenAI’s GPT-3 can do a passable — and occasionally exceptional — job on challenging natural language tasks that it hasn’t seen before. At the same time, existing foundation models have the potential to inflict harm and their characteristics are, in general, poorly understood.
“These models, which are trained at scale, result in emergent capabilities, making it difficult to understand what their biases and failure modes are. Yet the commercial incentives are for this technology to be deployed to society at large,” Liang said.
Foundation models are academically interesting, due to their stellar performance on popular benchmarks, but what makes them critical to study is the fact that they’re being deployed with far-reaching consequences. For example, Google Search, which has 4 billion users, relies heavily on BERT. And GPT-3 is now being used in over 300 apps by “tens of thousands” of developers and producing 4.5 billion words per day.
As AI systems become deeply embedded in society, there have been growing concerns about their potential negative effects. Machine learning can perpetuate inequality as the trained models amplify biases in datasets. (Last year, an algorithm the U.K. government had adopted downgraded hundreds of thousands of students’ grades, disproportionately impacting those from tuition-free schools.) Another concern is foundation models’ ability to generate realistic text, images, and videos, which has the potential to scale disinformation in already polluted social media networks.
Foundation models could have other negative impacts, particularly from an environmental standpoint, the report’s coauthors point out. The effects of model training on the environment have been brought into relief in recent years. In June 2019, researchers at the University of Massachusetts Amherst released a study estimating that training and architecture search for a single large model can emit roughly 626,000 pounds of carbon dioxide, equivalent to nearly 5 times the lifetime emissions of the average U.S. car. OpenAI itself has conceded that models like GPT-3 require significant amounts of compute — on the order of thousands of petaflop/s-days — which contributes to carbon emissions.
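The “nearly 5 times” comparison is easy to sanity-check. The car figure assumed below — roughly 126,000 pounds of CO2-equivalent over an average U.S. car’s lifetime, manufacturing included — is the baseline the same study used:

```python
# Back-of-envelope check of the "nearly 5 times" comparison, assuming
# ~126,000 lbs CO2e for an average U.S. car's lifetime emissions
# (the baseline figure used in the study, manufacturing included).
training_emissions_lbs = 626_000
car_lifetime_lbs = 126_000
ratio = training_emissions_lbs / car_lifetime_lbs
print(f"{ratio:.2f}x")  # -> 4.97x, i.e. just under 5 cars
```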
Foundation models are also likely to have substantial labor market impacts and rest on tenuous legal footing. One widely cited estimate holds that 5 million jobs worldwide will be lost to automation technologies by 2022, with 47% of U.S. jobs at risk of being automated. Moreover, how the law bears on the development and deployment of foundation models remains unclear in the absence of unifying legal and regulatory frameworks.
It should be noted that preliminary work to address the liability questions is underway. Amsterdam and Helsinki have launched AI registries to detail how each city uses algorithms to deliver services. And the EU recently released tough draft rules on the use of AI, including a ban on most surveillance and strict safeguards for algorithms employed in recruitment, critical infrastructure, credit scoring, migration, and law enforcement.
Beyond the societal implications, foundation models introduce new hurdles in research and development, owing to the “strong economic incentives” companies have to deploy models developed for science. As an example, the coauthors cite GPT-3, which began as a research vehicle for OpenAI but later became a product widely used by software developers.
The distinction between research and deployment is sometimes lost, at the research community’s peril. Research models are “under construction” in the sense that they’re often not extensively tested. Unfortunately, companies don’t always place warning labels indicating this on their prototypes. To ensure safety, “many more” precautions should be taken when in-development models are made available commercially, the coauthors argue.
Taking the 10,000-foot view, the coauthors note that while trained models may be available, the actual training of foundation models is impossible for the vast majority of AI researchers, due to their high computational cost and engineering requirements. This lack of accessibility — and thus reproducibility — risks hindering innovation and harming the health of AI as a scientific field. It could also lead to a centralization of power among wealthier organizations, the coauthors say, a trend that community efforts like EleutherAI and Hugging Face’s BigScience project only partly offset.
“While some meaningful research can still be done with training smaller models or studying preexisting large models, neither will be sufficient to make adequate progress on the difficult sociotechnical challenges,” the report reads. “Due to the emergent nature of these models, some functionalities like in-context learning have only been demonstrated in models of sufficient size. Having access to existing models can be useful for powering downstream applications or to identify problems (e.g., bias), but this will not help us design better architectures or training objectives for foundation models.”
As an antidote to the many problematic aspects of foundation models, CRFM’s report suggests building public infrastructure for AI, analogous to big-science projects like the Hubble Space Telescope and the Large Hadron Collider. The coauthors point to the National Research Cloud, a proposed initiative to provide researchers with compute power and government datasets, as a step in the right direction. But they say that “much more” investment will be needed to fulfill the vision of an open, community-based effort that shapes the future of foundation models.
“Much still remains unclear in spite of our efforts, and we reiterate that this is just the beginning of a paradigm shift: Foundation models have only just begun to transform the way AI systems are built and deployed in the world,” the report’s coauthors wrote. “To ensure the responsible development and deployment of these models on durable foundations, we envision collaboration between different sectors, institutions, and disciplines from the outset to be especially critical.”
Liang added: “We’re very much in the early days so the professional norms are underdeveloped. It’s therefore imperative that we, as a community, act now to ensure that this technology is developed and deployed in an ethically and socially responsible fashion.”