This week, Ford quietly released the Ford Autonomous Vehicle Dataset, a corpus containing data collected from its fleet of autonomous cars in the Greater Detroit Area. The data set, which is freely available to researchers, could be used to improve the robustness of self-driving cars in urban environments.

To create the data set, engineers drove Ford Fusion Hybrids, equipped with four quad-core Intel i7 processors and 16GB of RAM, roughly 66 kilometers (41 miles) to and from the Detroit Metropolitan Wayne County Airport, the University of Michigan-Dearborn campus, and residential communities. Minor variations to the route were introduced to capture a “diverse” set of features. Data was chiefly captured with four lidar sensors (which measure the distance to a target by illuminating the target with laser light and measuring the reflected light), six 1.3-megapixel cameras, one 5-megapixel camera, and an inertial measurement unit.
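The lidar principle described above can be illustrated with a minimal time-of-flight sketch. It assumes a pulsed sensor that times the round trip of a laser pulse, which is a common approach but not something the Ford release specifies. The function name and the sample timing are hypothetical.

```python
# Speed of light in a vacuum, in meters per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def lidar_range_m(round_trip_time_s: float) -> float:
    """Estimate target distance from a pulse's round-trip time.

    Assumes a pulsed time-of-flight lidar. This is an illustrative
    model, not the method used by the sensors in Ford's vehicles.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the target and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


if __name__ == "__main__":
    # Hypothetical reading: a return detected 400 nanoseconds after emission.
    print(f"{lidar_range_m(400e-9):.2f} m")
```

Under these assumptions, a return detected 400 nanoseconds after emission corresponds to a target about 60 meters away.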

From the sensor readings, Ford researchers generated maps and pedestrian pose data to ship with the corpus, including 3D ground reflectivity maps, 3D point cloud maps, six-degree-of-freedom ground truth poses, and localized pose sensor information. These reflect seasonal differences (data was captured in sunny, snowy, and cloudy conditions, as well as during the fall) and cover a range of driving environments, including freeways, overpasses, airports, bridges, tunnels, construction zones, and areas with various types of vegetation.

Above: The route captured by Ford’s vehicles.

Image Credit: Ford

Ford notes that each log in the Ford Autonomous Vehicle Dataset is time-stamped and contains raw data from the sensors, calibration values, pose trajectory, ground truth pose, and 3D maps. It’s available in ROS bag file format, which enables it to be visualized, modified, and applied using the open source Robot Operating System (ROS).
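As a rough illustration of working with logs in this format, the sketch below counts messages per topic. It relies on the `(topic, message, timestamp)` tuples that `rosbag.Bag.read_messages()` yields in ROS 1. The helper names and the sample topic names are hypothetical and are not taken from Ford's release, and `summarize_bag` assumes the ROS 1 `rosbag` Python package is installed.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def count_messages_per_topic(
    messages: Iterable[Tuple[str, Any, Any]]
) -> Dict[str, int]:
    """Count messages per topic from (topic, message, timestamp) tuples.

    This is the tuple shape yielded by rosbag.Bag.read_messages() in ROS 1.
    """
    return dict(Counter(topic for topic, _msg, _stamp in messages))


def summarize_bag(bag_path: str) -> Dict[str, int]:
    """Open a bag file and count its messages per topic.

    Requires the ROS 1 `rosbag` Python package, imported lazily so the
    helper above can be used without a ROS installation.
    """
    import rosbag

    with rosbag.Bag(bag_path) as bag:
        return count_messages_per_topic(bag.read_messages())


if __name__ == "__main__":
    # Stand-in tuples; the topic names are hypothetical, not Ford's.
    sample = [
        ("/lidar_points", object(), 0.0),
        ("/imu", object(), 0.01),
        ("/lidar_points", object(), 0.1),
    ]
    print(count_messages_per_topic(sample))
```

Keeping the counting logic separate from the file access means it can be tried on stand-in tuples first, then pointed at a downloaded log through `summarize_bag`.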

“This … data set can provide a basis to enhance state-of-the-art robotics algorithms related to multi-agent autonomous systems and make them more robust to seasonal and urban variations,” wrote the Ford researchers who contributed to the data set in a preprint paper accompanying its release. “We hope that this data set will be very useful to the robotics and AI community and will provide new research opportunities in collaborative autonomous driving.”

The release of Ford’s data set comes after an update to a similar corpus from Waymo, the Waymo Open Dataset, and after Lyft open-sourced its own data set for autonomous vehicle development. Other such corpora include nuScenes; Mapillary Vistas’ corpus of street-level imagery; the Canadian Adverse Driving Conditions (CADC) data set; the KITTI collection for mobile robotics and autonomous driving research; and the Cityscapes data set developed and maintained by Daimler, the Max Planck Institute for Informatics, and the TU Darmstadt Visual Inference Group.

Above: The cars used to create the data set.

Image Credit: Ford

As part of a $900 million investment in its Michigan manufacturing footprint that was announced two years ago, Ford said in March 2019 that it would build a new factory dedicated to the production of autonomous vehicles. Last July, the automaker revealed it would create a separate $4 billion Detroit-based unit to house the research, engineering, systems integration, business strategy, and development operations for its self-driving vehicle fleet.

Ford recently acquired autonomous systems developer Quantum Signal to bolster its driverless vehicle efforts, and it has a close relationship with Pittsburgh-based Argo AI, in which it pledged to invest $1 billion over the next five years. Argo had been testing autonomous cars in Pittsburgh, Pennsylvania; Austin, Texas; Miami, Florida; Palo Alto, California; Washington, D.C.; and Dearborn, Michigan, before the spread of COVID-19 forced it to pause this testing indefinitely.