To train a robot to navigate a house, you either need to give it a lot of real-time in many real homes or a lot of virtual time in a lot of virtual places. The latter is definitely the better option. Facebook and Matterport are working together to make thousands of virtual, interactive digital twins of natural spaces available for researchers and their voracious young AIs.
On Facebook’s side, the significant advance is in two parts: the new Habitat 2.0 training environment and the dataset they created to enable it. You may remember Habitat from a couple years back; in the pursuit of what it calls “embodied AI,” which is to say AI models that interact with the real world, Facebook assembled several passably photorealistic virtual environments for them to navigate.
Many robots and AIs have learned movement and object recognition in idealized, unrealistic spaces that resemble games more than reality. A real-world living room is a very different thing from a reconstructed one. An AI’s knowledge will transfer more readily to real-world applications like home robotics by learning to move about in something that looks like reality.
But ultimately, these environments were only polygon-deep, with minimal interaction and no actual physical simulation — if a robot bumps into a table, it doesn’t fall over and spill items everywhere. The robot could go to the kitchen, but it couldn’t open the fridge or pull something out of the sink. Habitat 2.0 and the new ReplicaCAD dataset change with increased interactivity and 3D objects instead of simply interpreted 3D surfaces.
Simulated robots in these new apartment-scale environments can roll around like before, but when they arrive at an object, they can actually do something with it. For instance, if a robot’s task is to pick up a fork from the dining room table and go place it in the sink, a couple years ago, picking up and putting down the knife would just be assumed since you couldn’t actually simulate it effectively. In the new Habitat system, the fork is physically affected, as is the table it’s on, the sink it’s going to, and so on. That makes it more computationally intense but also way more helpful.