Scale and nuTonomy release nuScenes, a self-driving dataset with over 1.4 million images

Datasets are the lifeblood of machine learning algorithms -- they "teach" artificial intelligence (AI) facts about the world, in a manner of speaking. And in domains such as autonomous driving, it's vitally important they're of the highest quality.

That's why nuTonomy today released a self-driving dataset called nuScenes, which it claims surpasses public datasets like KITTI, Baidu's ApolloScape, and the Udacity Self-Driving Car library in both size and accuracy. Scale, a San Francisco-based data labeling startup, provided the annotations.

“We’re proud to provide the annotations ... as the most robust open source multi-sensor self-driving dataset ever released,” said Scale CEO Alexandr Wang. “We believe this will be an invaluable resource for researchers developing autonomous vehicle systems, and one that will help to shape and accelerate their production for years to come.”

NuTonomy compiled more than 1,000 scenes containing 1.4 million images, 400,000 lidar sweeps (lidars are laser-based systems that judge the distance between objects), and 1.1 million three-dimensional bounding boxes (objects detected with a combination of RGB cameras, radar, and lidar). They've been meticulously labeled through Scale's Sensor Fusion Annotation API, which taps AI and teams of humans for data annotation, and they are being open-sourced starting this week.

Self-driving car datasets aren't exactly a rare commodity -- just this summer, Oregon-based Flir Systems released 10,000 labeled photos captured by its thermal camera system, Mapillary published 25,000 street-level images, and the University of California Berkeley uploaded 100,000 video sequences captured by RGB cameras. But Scale and nuTonomy claim that nuScenes is more comprehensive than any similar dataset that's come before it.

As the website explains, nuTonomy used a combination of six cameras, one lidar, five radars, GPS, and an inertial measurement sensor to capture the nuScenes data. Driving routes in Singapore and Boston were specifically chosen to showcase "challenging" locations, times, and weather conditions.

Scale, which competes against the likes of Mighty AI, Appen, Cloud Factory, Samasource, and Amazon's Mechanical Turk, has labeled more than 200,000 miles for clients that include Lyft, Voyage, General Motors, Zoox, and Embark since its founding in 2016. It recently expanded its work into robotics, drones, virtual assistants, and "other solutions" that depend heavily on AI, and in August Scale announced an $18 million funding round led by Index Ventures, with participation from Accel and Y Combinator.

The startup has raised $22.7 million to date and reports that revenue grew 15 times over the past year.
