Meta describes how AI will unlock the metaverse

It's not news that Mark Zuckerberg wants to lead the charge in the emergent metaverse with his company, Meta (formerly Facebook). The recently concluded Meta event titled "Inside the lab: Building for the metaverse with AI" was another step in Meta's quest to unlock the metaverse with AI, after its previous announcement that it was developing a record-breaking supercomputer to power the metaverse. Experts have said that AI, VR, AR, blockchain and 5G will converge to power the metaverse, and Zuckerberg is keen on building several huge AI systems that will drive the nascent metaverse world.

"We work on a lot of different technologies here at Meta — everything from virtual reality to designing our own data centers. And we're particularly focused on foundational technologies that can make entirely new things possible. Today, we're going to focus on perhaps the most important foundational technology of our time: artificial intelligence," said Zuckerberg.

Zuckerberg says the kinds of experiences we’ll have in the metaverse are beyond what’s possible today, describing the metaverse as "an immersive version of the internet." He said the metaverse will require advances across a whole range of areas — from new hardware devices to software for building and exploring worlds — and AI is the key to unlocking a lot of these advances.

A new approach to self-supervised learning

Following Zuckerberg's introductory note at the event, Jérôme Pesenti, who leads Facebook AI, and Joelle Pineau, co-managing director of Facebook AI Research, drilled into how Meta wants to unlock the metaverse with AI in a session titled "Unlocking the Metaverse with AI and Open Science." Pesenti noted that AI is one of the keys to the metaverse. He said the mission of Meta AI is to bring the world closer together by advancing AI through research breakthroughs and using those breakthroughs to improve Meta products.

Pesenti said Meta AI is making significant advancements in critical areas like embodiment and robotics, creativity and self-supervised learning. Traditionally, AI relied on supervised learning, where machines learn from direct human supervision: task-oriented systems were taught to perform a single task by being given lots of human-generated examples. However, the challenge with this approach, according to Pesenti, is that it's task-dependent. It isn't clear whether the machine really understands anything beyond the narrow task, and the approach requires a lot of human labor that can introduce unwanted biases.

Pesenti said that Meta AI is moving to a different approach, self-supervised learning, where AI can learn from data without any human supervision.

"For example, when dealing with language, the AI system can remove words from the input text and try to get them by inferring patterns in the surrounding words. As the AI system gets better, it also improves its understanding of the meaning and structure of language. This is one of the biggest advantages of this self-supervised model: It's task-independent, such that a single model can be leveraged with minimal fine-tuning to perform several downstream tasks," Pesenti said. He added that such a model can help with tasks like identifying hate speech while also keeping news feed and search results relevant.

Pesenti shared that with Meta AI's research breakthrough, self-supervised learning is no longer limited to language. "In the past six months, researchers at Meta AI and in the rest of the industry have shown amazing results in understanding speech and images as well," he said.
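
The word-masking idea Pesenti describes for language can be illustrated with a minimal sketch. It only shows how training pairs are created from raw text with no human labels; the `[MASK]` token, the `mask_words` helper and the 30% mask fraction are assumptions made for illustration, not details from Meta's presentation.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, chosen for this sketch


def mask_words(text, mask_fraction=0.15, seed=0):
    """Hide a fraction of the words and return (masked_words, targets).

    `targets` maps each hidden position to the original word, which is
    what a model would be trained to predict from the surrounding words.
    """
    rng = random.Random(seed)
    words = text.split()
    n_mask = max(1, round(len(words) * mask_fraction))
    positions = rng.sample(range(len(words)), n_mask)
    targets = {i: words[i] for i in positions}
    masked = [MASK if i in targets else w for i, w in enumerate(words)]
    return masked, targets


sentence = "the model learns the structure of language from raw text"
masked, targets = mask_words(sentence, mask_fraction=0.3)
print(" ".join(masked))
print(targets)
```

The labels come from the text itself, which is why no annotation step is needed: the hidden words are the supervision.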

Meta AI researchers have developed self-supervised techniques that work remarkably well for images: they take an image divided into small patches, hide 80% of these patches, and ask the AI to reconstruct the image, Pesenti said. He added that Meta AI researchers have shown that this new self-supervised technique, combined with a minimal amount of annotated data, is competitive with traditional approaches that use a lot more human supervision.
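
The patch-hiding step can be sketched in the same spirit. This is a toy illustration only: the 8x8 "image" of integers, the 2x2 patch size and the helper names are assumptions, and the reconstruction model itself is left out.

```python
import random


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into non-overlapping patch x patch blocks, row-major."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches


def mask_patches(patches, mask_ratio=0.8, seed=0):
    """Hide `mask_ratio` of the patches.

    Returns (visible, targets): the patches a model would see, and the
    hidden patches it would be asked to reconstruct, both keyed by index.
    """
    rng = random.Random(seed)
    n_hidden = round(len(patches) * mask_ratio)
    hidden = set(rng.sample(range(len(patches)), n_hidden))
    visible = {i: p for i, p in enumerate(patches) if i not in hidden}
    targets = {i: patches[i] for i in hidden}
    return visible, targets


# Toy 8x8 "image" whose pixel values are just their row-major index.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = split_into_patches(image, patch=2)
visible, targets = mask_patches(patches, mask_ratio=0.8)
print(len(patches), len(visible), len(targets))
```

As with text, the hidden patches serve as the training signal, so the image needs no human annotation.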

He said Meta AI is starting to create unified models that can understand multiple modalities at the same time: they can read lips while listening for better speech recognition, or identify policy-breaking social media posts by analyzing all the components — text, image or video — at the same time. But Meta AI won't stop there, according to Pesenti.

"We don't just want AI models that understand language, images and videos. We want AI models that understand the entire world around us. And with the advent of the metaverse, we have a unique challenge and a unique opportunity to make that happen."

The metaverse brings several new challenges

Pineau believes the metaverse ushers in various new challenges. She said most of the fast progress in AI of the last decade is deeply grounded in the internet, so it is not surprising that we have seen the most progress for data modalities such as speech, language and vision — which are the native modalities for the internet.

However, AR and VR present experiences and affordances that are different and much bigger. "For example, movement from hands to faces to the whole body becomes a major vector for giving and receiving information. This opens up some fascinating new opportunities and also requires some major progress in our AI models," Pineau said.

Although Pesenti shared the goal of building unified models, Pineau noted that it isn't quite sufficient, adding that it is essential to make progress on building world models. She said "building a world model" is a construct that AI researchers have talked about for years.

"The idea is to build a rich representation that can be used not just to make predictions, but also to roll forward the future and compare alternative choices of actions or interventions. As we move to building AI agents that can operate fluidly across true reality, augmented reality and virtual reality, our world models will need to be trained with a mix of static pre-recorded data, like the supervised models, but also a stream of interactive experiences," she said.

There is still uncertainty, as Pineau admits that Meta AI doesn't yet know all the new methods and algorithms that it will develop in coming years, but she noted that it already knows a few research directions are poised for big changes. One such direction is embodiment and robotics. Pineau said Meta AI is looking at robotics because it is a fantastic case where world models can make a major difference. The focus, Pineau noted, is to achieve what is called "unbounded robotics": robots that break out of the lab or highly constrained settings such as factories, and are able to operate fluidly in the home and office, interacting with humans and objects as naturally as possible.

"One important step as we build robots that learn from rich interaction is that we need the robot itself physically to improve its ability to perceive the world through touch," Pineau said.
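
Pineau's description of a world model that can "roll forward the future and compare alternative choices of actions" can be illustrated with a toy planner. This is a sketch under stated assumptions: the `world_model` function below is a hand-written stand-in for what would in practice be a learned model, and the one-dimensional world, the action set and the three-step horizon are all hypothetical.

```python
from itertools import product


def world_model(state, action):
    """Hypothetical stand-in for a learned model.

    Predicts the next position of an agent on a line from 0 to 10.
    """
    return max(0, min(10, state + action))


def rollout(state, actions):
    """Roll the model forward through a sequence of actions."""
    for action in actions:
        state = world_model(state, action)
    return state


def plan(state, goal, horizon=3, actions=(-1, 0, 1)):
    """Compare every candidate action sequence by imagining its outcome,
    and keep the one predicted to end nearest the goal."""
    return min(
        product(actions, repeat=horizon),
        key=lambda seq: abs(rollout(state, seq) - goal),
    )


best = plan(state=2, goal=5)
print(best, rollout(2, best))
```

The planner never acts in the real environment while choosing; it only queries the model, which is the property that makes world models attractive for robots that cannot afford trial and error.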

Meta AI has been experimenting with new touch sensors, partnering with researchers at Carnegie Mellon University and MIT to create sensors that use AI techniques to infer contact location and measure contact forces from image changes recorded by a camera inside the sensor. Compared with currently available commercial tactile sensors, Pineau said, the DIGIT sensor created in partnership with MIT is much cheaper to manufacture.

One of the challenges Meta AI wants to solve is creating models that can operate both in the real world, through physical robots and objects, and in virtual worlds, allowing avatars to pick up and manipulate objects in a realistic way, while ensuring consistency from one to the other. Meta AI recognizes a big gap between simulation and the real world, and is invested in bridging the gap from reality to VR, where it can train and test new algorithms for robot navigation and manipulation with realistic sensing and interaction with space and objects.

While Pineau agreed there is much work to do to build truly reliable world models, she noted there is an interesting question on whether it's necessary for world models to be precise all the time. To answer this, Meta AI is developing a project that allows it to "lean into the inner child we all have inside of us and be creative," rather than trying to sense and recreate the real world.

"This is just the beginning, and you can expect to see a lot more as we explore new ways that AI models can enhance human creativity," she said.

Open-sourcing its plans

Pineau said Meta AI will open source its plans, making them accessible to research teams around the world. "With most of our research work, we built and released an open source library, in this case the PyTouch library, that includes several functionalities such as detecting touch and slip and estimating the pose of the robot and of the object itself, which can all be included as part of a broader system with navigation and other robotics capabilities," she said.

As Meta embarks on a new journey to build AI for an "embodied interactive metaverse," Pineau noted that the company must raise the bar on how this is done, and what values it will promote in its design and technology. Pesenti agreed, saying Meta will raise the bar through an unwavering commitment to creating AI systems and technologies that follow best practices for responsibility, with models that are fair, inclusive and transparent, and that give users more control while protecting their privacy.

According to Pesenti, these best practices are not easy to define because they often touch on complex societal problems. "This is why it is important for us to be transparent about our work and share it with the broader responsible AI community to get their feedback and leverage their expertise," he said.

In its journey toward what it calls "responsible AI," Meta appears to want to address some of the privacy issues it has faced over the years by incorporating feedback from its open source community.

"We are also excited to announce that Meta AI is open-sourcing TorchRec, the recommendations library that powers many of our products. TorchRec demonstrates Meta AI's commitment to AI transparency and open science. It is available as a PyTorch library and provides common sparsity and parallelism primitives, enabling researchers to build the same state-of-the-art personalization that is used by Facebook News Feed and by Instagram Reels today. These are just a few concrete steps on a long journey towards more responsible AI," Pesenti said.
