In 2019, Facebook open-sourced AI Habitat, a simulator for training embodied AI systems, such as home robots, to operate in environments meant to mimic real-world settings like apartments and offices. Today Facebook announced that it has extended Habitat to make it “orders of magnitude” faster than other available 3D simulators, allowing researchers to tackle more complex tasks in simulation, like setting the table and stocking the fridge. Alongside the release, Facebook collaborated with 3D space capture company Matterport to open-source what it claims is the largest dataset of indoor 3D scans to date.
AI models in computer vision and natural language are typically trained with text, images, audio, and video from the internet. But embodied AI (the development of systems with a physical or virtual embodiment, like robots) has different needs. Embodied AI tasks require systems to interact with the physical world, recognizing different objects from any angle to distinguish, for instance, between a countertop and a desk. To develop these sorts of robots and personal assistants safely, Facebook asserts, they have to be trained in rich, realistic simulated spaces.
The concept of embodied AI draws on embodied cognition, the theory that many features of psychology — human or otherwise — are shaped by aspects of the entire body of an organism. By applying this logic to AI, researchers hope to improve the performance of AI systems like chatbots, robots, autonomous vehicles, and even smart speakers that interact with their environments, people, and other AI. A truly embodied robot could check to see whether a door is locked, for instance, or retrieve a smartphone that’s ringing in an upstairs bedroom.
Habitat was designed to advance this kind of research, and the newest release, Habitat 2.0, improves upon the original in key ways. It introduces ReplicaCAD, an artist-authored, reconfigurable 3D dataset of apartments matching real spaces, with objects that can open and close, like cabinets and drawers. And it ships with the Home Assistant Benchmark, a suite of tasks that tests a range of robot manipulation capabilities, including tidying the house, stocking groceries, and setting the table.
Facebook says that ReplicaCAD, which took professional 3D artists hired by Facebook over 900 hours to create, contains 11 layouts and 92 different objects spanning furniture, kitchen utensils, books, and more. As for the Home Assistant Benchmark, it requires robots to pick and place objects from receptacles like counters, sinks, and sofas, as well as open and close containers such as drawers and fridges as necessary.
Beyond the new dataset and benchmark, Habitat 2.0 is far more performance-friendly than the original, Facebook says, with speeds exceeding 26,000 simulation steps per second — 850 times real-time (30 steps per second) on an 8-GPU computer. Here, a step refers to rendering a single image and simulating rigid-body dynamics for 1/30th of a second. Facebook claims that’s 100 times faster than prior work, taking experimentation time from 6 months to under 2 days. For reference, existing simulators typically achieve 10 to 400 steps per second.
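The arithmetic behind these figures can be checked with a short sketch. It takes the numbers quoted above at face value and assumes that "6 months" means roughly 180 days; the function and constant names are illustrative and do not come from Habitat itself.

```python
# Figures quoted in the article, taken as given rather than measured here.
STEPS_PER_SECOND = 26_000   # claimed Habitat 2.0 throughput on an 8-GPU computer
REAL_TIME_RATE = 30         # steps per second at real time (one step = 1/30 s)
CLAIMED_SPEEDUP = 100       # claimed speedup over prior work
BASELINE_DAYS = 6 * 30      # assumption: "6 months" approximated as 180 days


def real_time_multiple(steps_per_second: float, real_time_rate: float) -> float:
    """How many seconds of simulated time pass per second of wall-clock time."""
    return steps_per_second / real_time_rate


def accelerated_days(baseline_days: float, speedup: float) -> float:
    """Wall-clock days needed for the same experiment after a given speedup."""
    return baseline_days / speedup


if __name__ == "__main__":
    multiple = real_time_multiple(STEPS_PER_SECOND, REAL_TIME_RATE)
    days = accelerated_days(BASELINE_DAYS, CLAIMED_SPEEDUP)
    print(f"Real-time multiple: about {multiple:.0f}x")
    print(f"Experiment time after speedup: {days:.1f} days")
```

Under these assumptions, 26,000 steps per second works out to roughly 867 times real time, slightly above the quoted 850 times (which corresponds to 850 × 30 = 25,500 steps per second), and a 100-fold speedup brings 180 days down to 1.8 days, in line with "under 2 days."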
Habitat 2.0 makes certain sacrifices to achieve this speed, namely a lack of support for simulating the non-rigid dynamics of fluids, films, cloths, and ropes, as well as physical state transformations such as cutting, drilling, welding, and melting. Moreover, its new dataset, ReplicaCAD, was only modeled on apartments in the U.S., excluding cultures and regions with different layouts and types of furniture and objects. But Facebook argues that these are worthwhile tradeoffs, given that the performance enhancements “directly translate” to faster training and to accuracy improvements, particularly for object rearrangement tasks, because models can be trained with more experience.
“We aim to advance the entire ‘research stack’ for developing such embodied agents in simulation — curating house-scale interactive 3D assets … that support studying generalization to unseen objects, receptacles, and home layouts; developing the next generation of high-performance photo realistic 3D simulators that support rich interactive environments; [and] setting up challenging representative benchmarks to enable reproducible comparisons and systematic tracking of progress over the years,” the team behind Habitat 2.0 wrote in a paper describing the new simulator. “Coupled with the ReplicaCAD data, these improvements allow us to investigate the performance of [AI techniques] against classical … approaches for the suite of challenging rearrangement tasks we defined.”
Alongside Habitat 2.0, Facebook is releasing a dataset of 3D indoor scans co-created with Matterport: the Habitat-Matterport 3D Research Dataset (HM3D). It’s a collection of 1,000 Habitat-compatible scans made up of “accurately scaled” residential spaces such as apartments, multifamily housing, and single-family homes, as well as commercial spaces including office buildings and retail stores.
Dhruv Batra, a researcher at Facebook AI, believes that HM3D will play a “significant role” in advancing research in embodied AI. “With this dataset, embodied AI agents like home robots and AI assistants can be trained to understand the complexities of real-world environments, recognizing objects, rooms, and spaces, or learning to navigate and follow instructions — all in contexts that are vastly different from one another,” he wrote in a blog post. “To carry out complex tasks like finding misplaced objects or retrieving physical ones, an embodied AI agent needs to construct maps and episodic memory representations (to recall what it already has observed), understand speech and audio cues, and exhibit sophisticated motor control if it has to go up and down stairs.”
In the future, Facebook says it hopes to expand the dataset to include scans from more countries as well as annotations that might help AI to gain a high-level understanding of tasks like object retrieval. Beyond this, building on its previous embodied AI research, the company aims to study changing environments so that the environments in simulators like Habitat 2.0 can become fluid rather than static.
“[Dynamic simulations] would bring simulated training environments closer to the real world, where people and pets freely move around and where everyday objects such as mobile phones, wallets, and shoes are not always in the same spot throughout the day,” Batra noted. “We believe that advancements in embodied AI could help developers build and train assistants with deep contextual understanding and give them the ability to navigate the world around them.”