Motional releases expanded self-driving data set with over 1.4 billion annotated lidar points

Roughly a year ago, Scale and NuTonomy released a driverless data set called NuScenes that they claimed at the time surpassed corpora like KITTI, Baidu's ApolloScape, and the Udacity Self-Driving Car library in size, scale, and accuracy. Since then, new and more diverse corpora like the Waymo Open Dataset, the Ford Autonomous Vehicle Dataset, and Lyft's autonomous vehicle data set have emerged, but Motional -- whose CEO founded NuTonomy -- is looking to take back the crown with the release of an expanded NuScenes.

Data sets like NuScenes can be used to improve the robustness of self-driving cars in environments from cities to back roads. The Rand Corporation estimates that autonomous cars will have to rack up 11 billion miles before we'll have reliable statistics on their safety, but as headwinds slow real-world testing, simulated miles have become the next best thing.

This expansion of NuScenes includes NuScenes-lidarseg, which improves the semantic segmentation of 1,000 Singapore and Boston scenes, making it one of the largest publicly available lidar segmentation data sets. According to Motional, NuScenes-lidarseg adds 1.4 billion annotated lidar points for a "significantly" more detailed picture of a vehicle's surroundings than the original bounding boxes, allowing researchers to study things like lidar point cloud segmentation and foreground extraction.

The expanded data set also includes NuImages, a new corpus comprising nearly 100,000 annotated 2D images selected to represent a range of challenging, "educational" driving conditions. Motional says NuImages was created in response to user demand and that it is designed to help autonomous cars operate safely in "unpredictable" scenarios.

Both NuScenes-lidarseg and NuImages build on the existing NuScenes data set, which contains hundreds of scenes comprising over a million images captured using cameras, lidars, radars, GPS, and inertial measurement sensors. Motional says over 8,000 researchers have used NuScenes since its release in March 2019, more than 10 new data sets have been made publicly available, and over 250 scientific papers have cited the data.
