Parallel Domain unveils Reactor, a generative AI-based synthetic data generation engine

Synthetic data platform Parallel Domain today announced the launch of Reactor, a state-of-the-art synthetic data generation engine that integrates advanced generative AI technologies with proprietary 3D simulation capabilities. The platform aims to provide machine learning (ML) developers with control and scalability, enabling them to generate fully annotated data that enhances AI performance and fosters the creation of safer and more resilient AI systems for real-world applications.

According to the company, Reactor enhances AI performance across various industries, such as autonomous vehicles and drones, by generating high-quality images. In addition, the tool harnesses the power of generative AI to produce annotated data, which is a crucial requirement for ML tasks.

By generating both bounding boxes (for object detection) and panoptic segmentation annotations (which assign every pixel a class label and, for countable objects, an instance ID), Reactor gives AI models the labels they need to learn from visual data, resulting in more accurate and reliable outcomes.
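To make the relationship between the two annotation types concrete, here is a minimal Python sketch that derives per-instance bounding boxes from a toy panoptic map. The `class_id * 1000 + instance_id` pixel encoding, the class ids and the helper name `boxes_from_panoptic` are assumptions chosen for illustration; they do not describe Reactor's actual output format.

```python
# Assumed encoding for this sketch: each pixel stores class_id * 1000 + instance_id.
# Background "stuff" such as road uses instance_id 0 and gets no bounding box.
def boxes_from_panoptic(panoptic):
    """Return {(class_id, instance_id): (x_min, y_min, x_max, y_max)}."""
    boxes = {}
    for y, row in enumerate(panoptic):
        for x, value in enumerate(row):
            class_id, instance_id = divmod(value, 1000)
            if instance_id == 0:
                continue  # not a countable object
            key = (class_id, instance_id)
            if key not in boxes:
                boxes[key] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[key]
                boxes[key] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# Toy 3x4 map: class 7 stands in for road, class 24 for pedestrian (hypothetical ids).
toy_map = [
    [7000, 7000, 7000, 7000],
    [7000, 24001, 24001, 7000],
    [7000, 24001, 7000, 24002],
]
toy_boxes = boxes_from_panoptic(toy_map)
```

In this toy case the two pedestrian instances each receive their own box, while the road pixels contribute a class label only.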

"Our proprietary generative AI technology allows users to create and manipulate synthetic data using intuitive natural language prompts while also generating the corresponding labels required for training and testing ML models," Kevin McNamara, CEO and founder of Parallel Domain, told VentureBeat. "Reactor’s ability to generate diverse synthetic examples has led to significant performance improvements in tasks like pedestrian segmentation and debris and baby stroller detection. Its capacity to enhance dataset diversity, particularly for rare classes, contributes to the superior training of models."

Rapid ML model iteration and refinement

The company said its tool lets users create a wide range of synthetic data to train and test perception models. It does so by combining Python with natural language prompts, which eliminates time-consuming custom asset creation and streamlines the workflow. As a result, ML developers can rapidly iterate and refine their models, reducing turnaround time and accelerating AI development progress.

"Integrating these technologies into our platform allows users to generate data using Python and natural language commands, enhancing the flexibility of synthetic data generation," McNamara told VentureBeat. "Reactor equips ML developers with control and scalability, redefining the landscape of synthetic data generation. With Reactor, users can generate almost any asset in seconds using natural language prompts."

Leveraging generative AI to enhance synthetic data pipelines

According to McNamara, while other companies use generative AI to create visually appealing data, that data is unusable for training ML models because it lacks annotations. Reactor overcomes this limitation by generating fully annotated data, which enhances the ML process and allows developers to create safer and more effective AI systems.

"We harness generative AI and 3D simulation to create a vast array of detailed, realistic synthetic data," McNamara told VentureBeat. "Generative AI enables the production of diverse scenarios and objects, while 3D simulation adds physical realism, ensuring the robustness of AI models trained on this data. Before now, generative models have struggled to understand what they’re generating, making them very poor at providing annotations such as bounding boxes and panoptic segmentation, which are crucial for training and testing AI models."

McNamara said that the tool provides a broad spectrum of data and scene customization options. In addition, its adaptive background creation feature allows for easy modification of generated scenes, enabling ML models to generalize across various environments. For instance, users can transform a suburban California setting into a bustling downtown Tokyo scene.

Intuitive image generation

Reactor’s natural language prompts introduce an intuitive way to generate image variations, according to McNamara. Users can modify existing images using simple prompts such as "make this image look like a snowstorm" or "put raindrops on the lens." This streamlined customization process eliminates the need to wait for custom asset creation, improving efficiency and turnaround time.

"The adaptive background creation feature in Reactor enriches the diversity of training environments for ML models," McNamara explained. "This broadens the scenarios the model can be trained on, helping it recognize and respond better to varying real-world conditions."

The generative architecture allows models to comprehend the structure of generated objects and underlying scenes, facilitating the extraction of pixel and spatial semantic understanding from layers in the generative process. This results in fully automatic and accurate annotations.

More diverse, realistic synthetic data

Using Python, users can flexibly configure their synthetic datasets by selecting various parameters such as locations (San Francisco, Tokyo), environments (urban, suburban, highway), weather conditions and agent distribution (pedestrians and vehicles).

Once the foundational dataset is configured, users can use Reactor to enhance their synthetic data with greater diversity and realism. By using natural language prompts, users can introduce a wide array of objects and scenarios into the scene, such as "garbage can," "cardboard box full of sunglasses spilling on the ground," "wooden crate of oranges" or "stroller."
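The two-step workflow described above (configure a base dataset, then add prompted objects) could be sketched in Python roughly as follows. This is an illustrative sketch only: `SceneConfig`, its fields and the `"rain"` weather value are hypothetical and are not part of Parallel Domain's SDK, whose actual interface the article does not show.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ALLOWED_ENVIRONMENTS = {"urban", "suburban", "highway"}  # values named in the article


@dataclass
class SceneConfig:
    """Hypothetical container for dataset parameters; not a Parallel Domain class."""
    location: str
    environment: str
    weather: str
    agent_distribution: Dict[str, float]  # share of each agent type, should sum to 1
    object_prompts: List[str] = field(default_factory=list)

    def validate(self) -> None:
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment: {self.environment!r}")
        if abs(sum(self.agent_distribution.values()) - 1.0) > 1e-9:
            raise ValueError("agent_distribution must sum to 1")
        if any(not prompt.strip() for prompt in self.object_prompts):
            raise ValueError("object prompts must be non-empty")


# Step 1: configure the foundational dataset.
config = SceneConfig(
    location="San Francisco",
    environment="urban",
    weather="rain",  # assumed value; the article does not list weather options
    agent_distribution={"pedestrians": 0.25, "vehicles": 0.75},
)

# Step 2: add natural language prompts for the objects to introduce.
config.object_prompts += ["garbage can", "wooden crate of oranges", "stroller"]
config.validate()
```

Keeping the prompts as plain strings alongside the structured parameters mirrors the article's description of mixing Python configuration with natural language.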

Reactor generates synthetic data with essential annotations — including bounding boxes and panoptic segmentation — significantly speeding up ML model training and testing.

McNamara said the tool "revolutionizes" the traditional workflow of custom asset creation, which usually involves a time-consuming design process, manual configuration and integration by artists or developers.

"The generative AI-powered fast customization features improve efficiency and enhance turnaround times," McNamara added. "As a result, developers can create and integrate new assets into their synthetic datasets almost instantaneously, enabling faster iterations and continuous improvement of their models."

Detailed visual insights for autonomous vehicles

The company said it observed remarkable improvements in the safety of autonomous vehicles and automotive advanced driver assistance systems (ADAS). It also claimed that through advanced diffusion techniques, the tool recently achieved remarkable results in real-world scenarios.

Furthermore, the company said the tool recently delivered significant improvements in semantic segmentation results on the Cityscapes dataset, a widely recognized benchmark for autonomous driving.

"Real-world data often lack sufficient training examples for these less common but crucially important objects," McNamara explained. "Reactor was employed to generate synthetic data depicting various scenarios involving strollers to bridge this gap. By introducing this synthetic data into the training sets, models could better learn and generalize the detection of strollers in real-world scenarios, thereby enhancing the safety of autonomous systems."

He added that for the Cityscapes dataset, synthetic instances of trains were generated by Reactor and introduced into the dataset.

"This enriched data resulted in improved model performance in detecting and segmenting trains, contributing to safer and more efficient autonomous driving systems," said McNamara.
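The idea of topping up a rare class with synthetic examples can be illustrated with a small Python sketch that estimates how many synthetic samples would lift a class to a chosen share of the training set. The function name `synthetic_needed`, the toy label counts and the 10% target are assumptions for illustration; the article does not report how many synthetic strollers or trains were added or what ratio was used.

```python
import math
from collections import Counter


def synthetic_needed(labels, rare_class, target_share):
    """Smallest number n of synthetic rare-class samples such that
    (rare + n) / (total + n) >= target_share."""
    if not 0 <= target_share < 1:
        raise ValueError("target_share must be in [0, 1)")
    rare = Counter(labels)[rare_class]
    total = len(labels)
    # Rearranging the inequality gives n >= (target_share * total - rare) / (1 - target_share).
    needed = (target_share * total - rare) / (1 - target_share)
    return max(0, math.ceil(needed))


# Toy real-world label distribution in which strollers are rare.
real_labels = ["car"] * 60 + ["pedestrian"] * 37 + ["stroller"] * 3
extra = synthetic_needed(real_labels, "stroller", target_share=0.10)
merged_labels = real_labels + ["stroller"] * extra
stroller_share = merged_labels.count("stroller") / len(merged_labels)
```

A real pipeline would also weigh how well the synthetic images match the target domain, which a simple count like this does not capture.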

He added that several of Parallel Domain’s customers have recently begun incorporating the Reactor capability into their AI development workflows. Although it is still in the early stages, the company is excited about Reactor’s potential for enhancing ML models.

"Both customers and the Parallel Domain ML team have trained models for cases that have significantly beaten previous baseline performance," said McNamara. "This is because Reactor’s variety of examples significantly boosts a dataset’s diversity. Diverse data trains great models, and we are redefining the landscape of synthetic data generation."