10 reasons to combine digital twins and synthetic data

Synthetic data and digital twins are complementary approaches for riffing on real-world data to improve AI and product design. Synthetic data tools generate labeled data for training AI from a small subset of real data. Digital twins generate “what-if” scenarios for evaluating various performance, cost, and sustainability trade-offs.

Digital twins could help extend synthetic data tools to support real-world digital transformation in construction, medicine, and supply chains. Conversely, synthetic data could help teams using digital twins simulate different scenarios more efficiently.

For the time being, these tools focus on different markets and use cases. Synthetic data tools tend to focus on improving AI development workflows. Digital twin capabilities tend to be infused into industry-specific applications for product development, manufacturing, construction, and medicine.

“Both synthetic data and digital twins, based on real-world data, will coexist and complement each other in specific cases,” Gaurav Gupta, partner and global head of Digital Engineering for technology research and advisory firm ISG, told VentureBeat. Synthetic data will be the preferred choice when constrained by cost, logistics, or privacy reasons, or when the real data is unpredictable or unavailable. Digital twins will be preferred for applications such as predictive maintenance that require a closed loop between the product-in-use and its digital counterpart.

But the two can also complement each other, he added. Synthetic data can augment digital twin applications where, for example, the model needs to be shared with different stakeholders without any of its private, sensitive, or classified information. In such cases, a synthetic data model can act as a proxy for the digital twin models. Conversely, a synthetic data model is, by definition, supposed to be as close to the real-world scenario as possible. Hence, an existing digital twin model can feed into or accelerate the creation of synthetic models.

Here are 10 ways synthetic data and digital twin capabilities complement each other in practice:

1. Improved decision making

Gil Elbaz, chief technology officer and cofounder of Datagen, a pioneer of simulated data for human-centric AI, told VentureBeat that synthetic data tools can inform three components that support better digital twins for decision-making. A synthetic motion generation component can help designers explore how people would move along the crowded aisle of a plane, a busy street, a new bike lane, or a store before it is built. A synthetic visual generation component can help designers explore visual parameters of a digital double to see how a new building may look at different times of the day, in different colors, and from different vantage points. A synthetic physics modeling component can help designers see how the sun falls on a rooftop, heats specific areas, and powers up solar panels. The next step is to simulate the impact of cell tower reception, earthquakes, and floods.

2. Urban planning

Kevin Saito, a senior product manager at Unity, told VentureBeat that synthetically created environments could allow architects to understand the impact that surrounding buildings will have on their facilities to more appropriately place windows or raise planning considerations. For example, Vu.City in the U.K. helps architects and city planners understand the three-dimensional aspects of projects in the fully integrated urban landscape.

Saito said the use cases drive different priorities. Digital twin simulation requires having something that’s going to interact with the scene, which makes physical properties more important than visual fidelity. As a result, lower visual fidelity may be preferred because it’s cheaper to generate and run at scale. For synthetic data generation, what’s important is that the scene matches the visual realism of the environment it’s meant to mimic. If you’re generating synthetic data to train a computer vision model, visual fidelity is important, but you don’t need to mimic the actual physical properties of the environment. For example, material properties like weight, the center of mass, or friction that have no visual characteristics are unnecessary.

3. Generate new scenarios

Digital twin leader PTC defines a digital twin as a data-based representation of specific physical machines, people, and processes rather than simply digitally modeling a generic process or machine. Ed Cuoco, vice president of Analytics at PTC, told VentureBeat, “The concepts of synthetic data and digital twin are therefore intimately linked.”

Just as physical machines and processes produce data to represent performance, digital twins produce synthetic data to represent the simulation of that performance. As such, scenario generation is one of the primary use cases of the digital twin. “You could argue that scenario generation and resolution are almost certainly going to be a universal use case for the digital twin as it evolves,” Cuoco said.

4. Personalize medicine

Synthetic data could help overcome privacy challenges when using digital twins to improve healthcare. Unlike digital twins of airplanes or cities, the behavior of such models in precision medicine is more difficult to define and requires access to highly regulated healthcare data. “High-quality, synthetic healthcare data is emerging as a mechanism that enables such large datasets to be collected and merged, without significant privacy or governance hurdles, enabling digital twins in precision medicine to become a reality,” Syntegra cofounder and chief technology officer Ofer Mendelevitch told VentureBeat.

5. Validate medical models

Ben Alsdurf, a consultant at TLGG, a management consultancy, told VentureBeat that synthetic data tools and digital twins could be combined to generate test data sets that can expand and validate healthcare models. For example, synthetic data can help confirm whether a model developed for a specific population or demographic retains accuracy when applied to other populations. “While individualized research done in simulation is the holy grail for digital twins in health R&D, synthetic data may potentially offer more realistic use cases in the medium term,” he said.
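To make the cross-population check concrete, here is a minimal sketch that scores one model separately on each synthetic cohort and flags cohorts where accuracy drops off. It is an illustration only, not a method attributed to Alsdurf or TLGG. The function names, the record layout, and the 0.05 tolerance are all assumptions.

```python
from collections import defaultdict

def accuracy_by_group(predict, records):
    """Return {group: accuracy} for records shaped (group, features, label).

    `predict` is any callable mapping features to a predicted label
    (a hypothetical stand-in for a real healthcare model).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, features, label in records:
        total[group] += 1
        if predict(features) == label:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

def groups_below(accuracies, reference_group, tolerance=0.05):
    """List groups whose accuracy trails the reference group by more than tolerance."""
    ref = accuracies[reference_group]
    return sorted(g for g, acc in accuracies.items() if ref - acc > tolerance)
```

In this sketch the reference group would be the population the model was developed on, and the other groups would be synthetic cohorts generated to resemble other populations.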

6. Surface supply chain problems

Suketu Gandhi, partner and global leader for the digital supply chain at Kearney, a strategy and management consulting firm, says that supply chain digital twins could plan around novel supply chain stresses. While real-world data mostly keeps a steady rhythmic pace, synthetic data tools can introduce large ripples into the data.

“The key is understanding how to create truly random synthetic data, such as a dramatic shift in channel purchasing behavior on the part of consumers, or an increase in demand by 400 percent, to understand weaknesses in supply chain or customer service,” Gandhi said. This can help evaluate the impact of extreme conditions on the resiliency of all systems. Also, running thousands of scenarios can help uncover opportunities not visible with real-world data, such as new product opportunities, competitive threats, and nontraditional partnerships.
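As a rough illustration of the idea Gandhi describes, the sketch below starts from a steady weekly demand series, injects a synthetic 400 percent demand increase at a random point, and counts how many randomized scenarios run a simple inventory model out of stock. The inventory model and every name and number in it (`baseline_demand`, `inject_spike`, the supply rate, the starting stock) are hypothetical and do not come from Kearney.

```python
import random

def baseline_demand(weeks, rng, mean=100, jitter=5):
    """Steady, rhythmic demand: a fixed mean with small random noise."""
    return [mean + rng.randint(-jitter, jitter) for _ in range(weeks)]

def inject_spike(demand, start, length, increase_pct=400):
    """Return a copy of demand with a synthetic surge over [start, start + length)."""
    factor = 1 + increase_pct / 100  # a 400 percent increase means 5x baseline
    return [d * factor if start <= i < start + length else d
            for i, d in enumerate(demand)]

def stockout_weeks(demand, weekly_supply=110, starting_stock=500):
    """Count weeks in which demand cannot be fully served from stock."""
    stock, short = starting_stock, 0
    for d in demand:
        stock += weekly_supply
        if d > stock:
            short += 1
            stock = 0  # everything on hand is shipped; the rest is unmet
        else:
            stock -= d
    return short

def run_scenarios(n=1000, weeks=52, seed=0):
    """Run n randomized spike scenarios; return the fraction with any stockout."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        demand = baseline_demand(weeks, rng)
        start = rng.randrange(weeks - 4)   # random week in which the surge begins
        length = rng.randint(1, 4)         # surge lasts one to four weeks
        if stockout_weeks(inject_spike(demand, start, length)) > 0:
            failures += 1
    return failures / n
```

Calling `run_scenarios()` gives the share of simulated years in which the surge caused at least one stockout. Changing `weekly_supply` or `starting_stock` shows how much buffer this toy model needs to absorb the shock.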

7. Simulate failure at scale

Veritas has been combining synthetic data and digital twins to improve data protection. Veritas utilizes digital twins built on many years of telemetry data from more than 15,000 Veritas NetBackup Appliances. Both digital twins and synthetic data help train AI engines to predict failures before they occur and provide smart forecasting, which helps customers better plan their infrastructure needs and gives them information on the reliability of their estate.

Eric Seidman, senior director at Veritas Technologies, told VentureBeat, “Combining synthetic data with digital twins helps our overall AI models to interpret factors that affect system performance and data reliability by detecting anomalies in both the appliances as well as in data.” The combination helps ensure system uptime and identify changes to data and parameters that could indicate malware or ransomware intrusion.

8. Improve 5G

Telcos are exploring how to combine digital twins and synthetic data to evaluate variations in equipment, placement, and protocols for new 5G deployments. Stephen Douglas, head of 5G strategy at Spirent Communications, said digital twins are currently being used to validate new 5G infrastructure before launch. These tools emulate the surrounding network components for a new component or system under test.

The digital twins use synthetic traffic to simulate the complex validation scenarios for the component under test. This synthetic traffic is based on real-world traffic captured and played back through the digital twin to see how the component copes under stress. They can also simulate impairments and cyber-attacks and create corner cases that are difficult to repeat in a real-world 5G network. Douglas expects similar capabilities to improve operational networks in the future by identifying, validating, and recommending optimal equipment settings in real time.
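The capture-and-replay idea can be sketched in a few lines. The example below replays a recorded packet trace while randomly dropping and delaying packets, and can overlay several copies of the trace to multiply load. It is a simplified illustration under assumed names (`Packet`, `replay_with_impairments`, `scale_load`) and is not based on Spirent's tooling or on any real 5G protocol stack.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    timestamp_ms: float  # capture time relative to the start of the trace
    size_bytes: int

def replay_with_impairments(trace, loss_rate=0.0, extra_delay_ms=0.0,
                            jitter_ms=0.0, seed=0):
    """Replay a captured trace, dropping and delaying packets at random.

    Returns the packets that arrive, sorted by their new arrival time.
    """
    rng = random.Random(seed)
    delivered = []
    for pkt in trace:
        if rng.random() < loss_rate:
            continue  # packet lost in transit
        delay = extra_delay_ms + rng.uniform(0.0, jitter_ms)
        delivered.append(Packet(pkt.timestamp_ms + delay, pkt.size_bytes))
    return sorted(delivered, key=lambda p: p.timestamp_ms)

def scale_load(trace, copies):
    """Stress variant: overlay several copies of the trace to multiply load."""
    return sorted((p for _ in range(copies) for p in trace),
                  key=lambda p: p.timestamp_ms)
```

Feeding the output of `scale_load` into `replay_with_impairments` with a nonzero `loss_rate` and `jitter_ms` produces the kind of hard-to-reproduce corner case that would then be played back against the component under test.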

9. Look for unexpected failure modes

Douglas said that the combination of synthetic data and digital twins could also improve performance and load testing for connected vehicles to help engineers understand how their systems will cope in future scenarios. For example, a connected autonomous vehicle may face impaired connectivity due to congestion in a crowded inner city, or its signal may become weak or drop out entirely. Synthetic data variations and digital twin emulated replicas could help tease out unexpected problems to build more resilient systems.

10. Improve customer experience

Synthetic data can help extend customer experience data to digital twins of customer experience to improve product design, costs, or sales. “We use this frequently in enterprise software whereby customer data is turned into synthetic data that is then used to test the digital twin,” Vince Padua, CTO at Axway, told VentureBeat.

These digital twins reflect actual customer usage data of enterprise software products, such as whether customers use a particular feature, how they decide to receive notifications from the product, or how they collaborate with other users while using the product. This usage data can be aggregated, anonymized, and synthesized to drive test automation, improve product roadmaps, and increase overall customer satisfaction. At times, the usage data can identify patterns that can be automated with AI or create a ‘digital twin’ of the customer experience, in which the AI can be given tasks to determine the fastest way to solve them.
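One simple way to picture the aggregate, anonymize, and synthesize steps is shown below. Raw usage events are reduced to per-feature counts with user identifiers discarded, and synthetic events are then sampled in proportion to those counts to drive tests. This is a minimal sketch with hypothetical names and is not Axway's pipeline. Dropping identifiers and aggregating does not by itself guarantee privacy, so a production system would need stronger safeguards.

```python
import random
from collections import Counter

def aggregate_usage(events):
    """Drop user identifiers and keep only per-feature counts.

    `events` is an iterable of (user_id, feature) pairs (an assumed layout).
    """
    return Counter(feature for _user_id, feature in events)

def synthesize_events(feature_counts, n, seed=0):
    """Sample n synthetic feature-usage events in proportion to observed counts."""
    rng = random.Random(seed)
    features = sorted(feature_counts)
    weights = [feature_counts[f] for f in features]
    return rng.choices(features, weights=weights, k=n)

# Example with made-up events: no user_id survives aggregation.
events = [("u1", "export"), ("u2", "export"), ("u1", "share"), ("u3", "notify_email")]
counts = aggregate_usage(events)
synthetic = synthesize_events(counts, n=1000)
```

The resulting `synthetic` list could then be fed to a test harness as a stream of feature invocations that mirrors real usage frequencies without replaying any individual customer's session.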