Join top executives in San Francisco on July 11-12, to hear how leaders are integrating and optimizing AI investments for success.
Every once in a while, arguments resurface about artificial general intelligence (AGI) being right around the corner. And right now, we are in the midst of one of those cycles. Tech entrepreneurs are warning about the alien invasion of AGI. The media is awash with reports of AI systems that are mastering language and moving toward generalization. And social media is filled with heated discussions about deep neural networks and consciousness.
Recent years have seen some truly impressive advances in AI, and scientists have been able to make progress in some of the most challenging areas of the field.
But as has happened multiple times during the decades-long history of AI, part of the current rhetoric around AI advances might be unjustified hype. And there are areas of research that haven’t gotten much attention, partly because of the growing influence of big tech companies on artificial intelligence.
Overcoming the limits of deep learning
In 2012, a group of researchers won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a wide margin using a deep learning model. Since then, deep learning has become the main focus of AI research.
Deep learning has managed to make progress on many tasks that were previously very challenging for computers, including image classification, object detection, speech recognition and natural language processing.
However, the growing interest in deep learning also highlighted some of its shortcomings, including its limited generalizability, struggles with causality and lack of interpretability. Moreover, most deep learning applications required tons of manually annotated training examples, which became a bottleneck.
Recent years have seen interesting advances in some of these areas. One key innovation has been the transformer model, a deep learning architecture introduced in 2017. One important characteristic of transformers is their capacity to scale. Researchers have shown that the performance of transformer models continues to improve as they grow larger and are trained on more data. Transformers can also be pre-trained through unsupervised or self-supervised learning, which means they can learn from the terabytes of unlabeled data available on the internet.
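The core trick behind self-supervised pre-training is that raw text supplies its own labels: the model is simply asked to predict the next token from the tokens before it. A minimal, purely illustrative sketch (not any lab's actual pipeline) of how unlabeled text becomes training pairs:

```python
def make_lm_pairs(tokens, context_size=3):
    """Turn a raw token sequence into (context, next-token) training
    pairs by shifting -- no human annotation required."""
    pairs = []
    for i in range(context_size, len(tokens)):
        pairs.append((tokens[i - context_size:i], tokens[i]))
    return pairs

# Any unlabeled sentence yields free supervised examples.
text = "the model predicts the next word from context".split()
pairs = make_lm_pairs(text)
# pairs[0] == (['the', 'model', 'predicts'], 'the')
```

Because every document on the internet can be sliced this way, the supply of training signal is effectively limited only by compute, which is what makes the scaling strategy possible.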
Transformers have given rise to a generation of large language models (LLMs) such as OpenAI’s GPT-3, DeepMind’s Gopher and Google’s PaLM. In some cases, researchers have shown that LLMs can perform many tasks without extra training or with very few training examples (also called zero-, one-, or few-shot learning). While transformers were initially designed for language tasks, they have expanded to other fields, including computer vision, speech recognition, drug research and source code generation.
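Few-shot learning in LLMs typically works by packing a handful of demonstrations into the prompt itself, with no gradient updates. A hedged sketch of the prompt-formatting idea (the template and function name are illustrative, not a specific lab's format):

```python
def few_shot_prompt(examples, query):
    """Format a few labeled demonstrations plus a new query into one
    prompt, so the model can infer the task from context alone."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# Two demonstrations of English-to-French translation, then a query.
prompt = few_shot_prompt(
    [("cheese", "fromage"), ("dog", "chien")],
    "cat",
)
```

The model's continuation of the final "Output:" line serves as its answer; with zero demonstrations, the same template becomes a zero-shot prompt.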
More recent work has been focused on bringing together multiple modalities. For example, CLIP, a deep learning architecture developed by researchers at OpenAI, trains a model to find relations between text and images. Instead of the carefully annotated images used in earlier deep learning models, CLIP is trained on image-caption pairs that are abundantly available on the internet. This enables it to learn a wide range of vision and language tasks. CLIP is the architecture used in OpenAI’s DALL-E 2, an AI system that can create stunning images from text descriptions. DALL-E 2 seems to have overcome some of the limits of previous generative DL models, including semantic consistency (i.e., understanding the relationship between different objects in an image).
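At inference time, CLIP-style models map an image and a set of candidate captions into a shared embedding space and pick the caption whose embedding is most similar to the image's. A toy sketch of that matching step (the embeddings below are hand-made stand-ins, not outputs of CLIP's real encoders):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for the image and text encoders' outputs.
image_emb = [0.9, 0.1, 0.0]
caption_embs = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.2],
}

# Zero-shot classification: choose the most similar caption.
best = max(caption_embs, key=lambda c: cosine(image_emb, caption_embs[c]))
```

During training, the same similarity score is pushed up for matching image-caption pairs and down for mismatched ones, which is what lets raw web captions replace manual labels.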
Gato, DeepMind’s latest AI system, takes the multimodal approach one step further by bringing text, images, proprioceptive information and other types of data into a single transformer model. Gato uses one model to learn and perform many tasks, including playing Atari, captioning images, chatting and stacking blocks with a real robot arm. The model has mediocre performance on many of the tasks, but DeepMind’s researchers believe that it is only a matter of time before an AI system like Gato can do it all. Nando de Freitas, a research director at DeepMind, recently tweeted, “It’s all about scale now! The Game is Over!” implying that scaling up models like Gato will eventually yield general intelligence.
Is deep learning the final answer to AGI?
Recent advances in deep learning seem to be in line with the vision of its main proponents. Geoffrey Hinton, Yoshua Bengio and Yann LeCun, three Turing Award–winning scientists known for their pioneering contributions to deep learning, have suggested that better neural network architectures will eventually overcome the current limits of deep learning. LeCun, in particular, is an advocate of self-supervised learning, which is now broadly used in the training of transformers and CLIP models (though LeCun is working on a more sophisticated form of self-supervised learning, and it is worth noting that he takes a nuanced view of AGI and prefers the term “human-level intelligence”).
On the other hand, some scientists point out that despite its advances, deep learning still lacks some of the most essential aspects of intelligence. Among them are Gary Marcus and Emily M. Bender, both of whom have thoroughly documented the limits of large language models such as GPT-3 and text-to-image generators such as DALL-E 2.
Marcus, who has written a book on the limits of deep learning, is among a group of scientists who endorse a hybrid approach that brings together different AI techniques. One hybrid approach that has recently gained traction is neuro-symbolic AI, which combines artificial neural networks with symbolic systems, a branch of AI that fell by the wayside with the rise of deep learning.
There are several projects showing that neuro-symbolic systems can address some of the limits that current AI systems suffer from, including lack of common sense, causality, compositionality and intuitive physics. Neuro-symbolic systems have also been shown to require much less data and compute than pure deep learning systems.
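The basic division of labor in a neuro-symbolic system is that a neural module handles perception while a symbolic module handles reasoning. A hypothetical, much-simplified sketch (the facts, rule and threshold are invented for illustration):

```python
# A "neural" perception module would output symbolic facts with
# confidences; here we hard-code its output for illustration.
perception = {
    "left_of(cube, sphere)": 0.92,
    "red(cube)": 0.88,
}

def holds(fact, kb, threshold=0.5):
    """Treat a perceived fact as true if its confidence clears a threshold."""
    return kb.get(fact, 0.0) >= threshold

# A hand-written symbolic rule: right_of(Y, X) follows from left_of(X, Y).
# The symbolic layer can answer queries the neural module never saw.
def right_of(y, x, kb):
    return holds(f"left_of({x}, {y})", kb)

answer = right_of("sphere", "cube", perception)  # derived, not perceived
```

The appeal is data efficiency: the rule generalizes to any pair of objects immediately, whereas a pure neural system would need many examples to learn the same spatial relation.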
The role of big tech
The drive toward solving AI’s problems with bigger deep learning models has increased the power of companies that can afford the growing costs of research.
In recent years, AI researchers and research labs have gravitated toward large tech companies with deep pockets. The UK-based DeepMind was acquired by Google in 2014 for $600 million. OpenAI, which started out as a nonprofit research lab in 2015, switched to a capped-profit outfit in 2019 and received $1 billion in funding from Microsoft. Today, OpenAI no longer releases its AI models as open-source projects and has licensed them exclusively to Microsoft. Other big tech companies such as Facebook, Amazon, Apple and Nvidia have set up their own cash-burning AI research labs and are using lucrative salaries to snatch scientists from academia and smaller organizations.
This, in turn, has given these companies the power to steer AI research in directions that give them the advantage (i.e., large and expensive deep learning models that only they can fund). Although the wealth of big tech has helped immensely advance deep learning, it has come at the expense of other fields of research such as neuro-symbolic AI.
Nonetheless, for the moment, it seems that throwing more data and compute power at transformers and other deep learning models is still yielding results. It will be interesting to see how far the notion can be stretched and how close it will bring us toward solving the ever-elusive enigma of thinking machines.