Meta details plans to build the metaverse (and put Siri and Alexa to shame)

In its deep-dive, two-hour-plus video explanation of how it sees the metaverse operating in the future, Meta offered 2,000-plus online listeners both high-level descriptions and details on several specific areas of this proposed new world. These included how Facebook's parent company is using AI and machine learning in the metaverse for research, product development, running a universal language translator, giving personal assistants human-level intelligence, and establishing responsible use of AI and all the personal data that goes with it.

CEO Mark Zuckerberg led off with a 16-minute high-level overview of the day's session, noting several times that his company is placing high priority on building the metaverse with a "responsible" approach to data stewardship, an area in which Facebook has lost credibility in past years. Eight presentations followed in the 140-minute session.

How Meta plans to beat Siri, Alexa and Google

Personalized assistants that understand people and let them control the flow of conversation can make people's lives easier and pave the way to smarter devices, at home or on the go. But today, in 2022, they generally still leave a lot to be desired in terms of understanding requests, speed and accuracy of information.

"Assistants today — whether via voice or chat— are generally underwhelming," Meta conversational AI tech lead Alborz Geramifard said. "There are several reasons why, starting with how they are engineered. We're sharing the challenges developers and engineers face when attempting to build useful assistants and how we can navigate these challenges as we build for the metaverse."

Zuckerberg's hope for his company is to build a personal assistant that puts Siri, Alexa, and Google to shame. While Meta hasn't picked out a name for it yet, Zuckerberg said Meta wants its voice assistant to be more intuitive: picking up contextual clues in conversations, along with other data points that it can collect about our bodies, such as where our gaze is going, facial expressions, and hand gestures.

“To support true world creation and exploration, we need to advance beyond the current state of the art for smart assistants,” Zuckerberg said. “When we have glasses on our faces, that will be the first time an AI system will be able to really see the world from our perspective — see what we see, hear what we hear, and more. So, the ability and expectation we have for AI systems will be much higher.”

Meta's team appears to be up for those challenging tasks. During the presentation, Meta also introduced Project CAIRaoke, which Geramifard described as "breakthrough research that aims to make assistants more helpful and interactions with them more enjoyable. Project CAIRaoke is an AI model created for conversational agents. It works end-to-end, combining the four existing models typically used by today's assistants into a single, more efficient and flexible model."

Rather than relying on scripted conversations, Project CAIRaoke draws on years of advances in natural language processing to power applications that are deeply contextual and personalized and that leave the user in charge of the conversation flow, Geramifard said.

"This unified system is better built for natural conversations and can adapt to their normal but complicated flows," Geramifard said.

Meta's coming universal language translator

One of the more substantial news items from the session was the introduction of Meta's universal language translator. Once it is widely used, Meta expects it to enable much more than mere understanding between cultures: it is also meant to improve the exchange of data, science, and business projects.

More than 2,000 languages and dialects are now being used each day somewhere in the world. The universal language translator is intended to let anybody translate any language into one or more others in real time, using AI and machine learning. Presently, only about 100 languages can be translated on a one-to-one basis, with English the most used by far.

The idea is to lessen the dominance of "majority languages," Meta's Angela Fan said, and to raise the value of lesser-known and less widely used languages.

"Over 20% of the world's population is excluded from accessing information on the internet because they can’t read in a language that works for them," said Fan, herself a multilingual machine-learning translator specialist from Shanghai who has lived in Canada, the U.S., and France. "While access to technology is advancing, language and machine translation capabilities are limited. To ensure inclusion, we need to support everyone, regardless of the language they speak. Today, we will pull the curtain back on an ambitious body of work that aims to provide people with the opportunity to access anything on the internet in their native language and speak to anyone, regardless of their language preferences."

Some examples of how this translator, now in development at Meta, could be used: In a marketplace in Kenya, vendors, artists, and customers from across Africa could negotiate easily in any of their many languages. An entrepreneur in China could learn from the same lectures making the rounds in other centers of technology.

In the future, AR glasses could translate instantly for an engineer talking with local techs in rural India, whichever language they speak, including any of the dozen spoken there.

"Language is not just the sounds that we speak or the words that we write, but a fundamental connection of an individual to their family, their culture, and its history and traditions from generation to generation," Fan said. "Think about the music that you listen to, the holidays you might celebrate, or the food that you eat. Language serves as a foundation for our identity. Because it's one of the primary tools that we use to understand and then interact with the world around us."

Meta is working with partners that specialize in speech and audio to help make these advancements in translation technology, Fan said. The translator is estimated to be a few years from widespread operation.

Building responsible AI at Meta

AI is a core component of Meta's systems that does everything from ranking posts in Facebook's News Feed to tackling hate speech and misinformation. But, as with other emerging technologies, AI also raises many hard questions around issues such as privacy, fairness, accountability, and transparency.

Facebook supports more than 2.5 billion users worldwide and obviously cannot control everything that so many users post. However, it has been criticized for deficiencies in this department over the last five years or so and is eager to improve its reputation there. It seems like a never-ending challenge, but Meta staff aren't shrinking from it.

"Our commitment is to building AI responsibly and using a cross-disciplinary team to support these efforts; we have tangible examples of how we are developing and testing approaches to help ensure that our machine learning (ML) systems are designed and used responsibly," Facebook AI Senior Program Manager Jacqueline Pan said.

Meta builds and tests approaches to help ensure that its machine-learning systems are designed and used responsibly, Pan said.

"Our mission is to ensure that AI and ML benefits people in society. Now this requires deep collaboration both internally and externally across a diverse set of teams, including parts of platform groups, policy and legal experts. Support from across the highest levels of mental leadership, and researchers who are really steeped in the larger community. We also develop our practices in regular consultation and collaboration with outside experts and regulators. And further we partner with impacted communities external experts in academic institutions, and industry stakeholders to understand the broader community's expectations when it comes to AI,” Pan said.

An example of Facebook's work in AI fairness this year, Pan said, was when the AI team collaborated with another internal team to release the "Casual Conversations" dataset. "We built and released Casual Conversations in order to address the need for more high-quality datasets designed to help evaluate potential algorithmic biases in complex real-world AI systems," Pan said.

The dataset consisted of more than 45,000 videos of paid participants having non-scripted conversations. Participants disclosed their own age and gender, which made the dataset a relatively unbiased collection of age and gender samples. Additionally, the team was able to provide labels on skin tone and ambient lighting conditions. The dataset is designed to help researchers evaluate their computer vision and audio models for accuracy across these dimensions, Pan said.

"With this data set, we hope to unlock more fairness measurements and research and bring the field one step closer to building fairer, more inclusive technologies," Pan said.