Synthetic data and the Wells Fargo-Hazy relationship

Over the years, legacy financial institutions have been generating and sitting on a wealth of useful data. Unfortunately, strict privacy and security controls have limited how these institutions can use their own data. Such constraints are a problem because they block innovation.

To keep market share and stay ahead of the pack, legacy financial institutions need to figure out an alternative. Synthetic data is emerging as a solution for financial institutions hamstrung by silos and governance protocols.

A synthetic dataset preserves the statistical properties of its real-life equivalent but omits the real records that could compromise privacy. As a result, synthetic data finds use in studying trends and detecting anomalies, two bread-and-butter use cases for machine learning (ML) algorithms.

Harry Keen, CEO of synthetic data-generating platform Hazy, points out that creating a synthetic dataset involves training a generative model to “learn all the statistical characteristics in the raw data and generate row upon row of fictitious data points.”

“You can use it exactly the same way as the real dataset in many use cases across the enterprise,” Keen said. And that is a golden ticket that financial institutions are banking on. Wells Fargo, for example, has added Hazy to its startup accelerator program, its way of bringing in early-stage growth companies who can solve problems that the financial institution is looking to address.
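The idea Keen describes, learning the statistical characteristics of raw data and then generating fictitious rows, can be illustrated with a deliberately simple sketch. It assumes a toy table with two numeric columns (a hypothetical account balance and monthly spend) and uses a bivariate Gaussian as the generative model. This is an illustrative stand-in, not Hazy's method; the function names and the toy data are invented for the example.

```python
import random
import statistics

def fit_gaussian(rows):
    """Learn the mean, spread, and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}

def sample(model, n, rng):
    """Generate n fictitious rows that follow the fitted statistics."""
    rho = model["rho"]
    residual = max(0.0, 1 - rho ** 2) ** 0.5
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + residual * rng.gauss(0, 1)  # correlated with z1
        rows.append((model["mx"] + model["sx"] * z1,
                     model["my"] + model["sy"] * z2))
    return rows

# Toy stand-in for a "real" table: (balance, monthly spend). Hypothetical data.
rng = random.Random(0)
real = []
for _ in range(2000):
    balance = rng.gauss(5000, 1500)
    real.append((balance, 0.1 * balance + rng.gauss(0, 100)))

model = fit_gaussian(real)
synthetic = sample(model, 2000, rng)

print("real correlation:     ", round(model["rho"], 3))
print("synthetic correlation:", round(fit_gaussian(synthetic)["rho"], 3))
```

The synthetic rows track the means and the correlation of the original table, yet none of them is a copy of an original row. Production generators model far richer structure than a two-column Gaussian, and privacy guarantees require more than simply not copying rows.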

The promise of synthetic data

While Hazy is by no means the only synthetic data startup, it caught the attention of Wells Fargo for many reasons, including its targeted niche in financial services. Another factor in Hazy’s favor: the self-service model for generating and using datasets so Wells Fargo data scientists can easily request and use the kinds of synthetic datasets they need. “We wanted to make sure that it would be extremely easy for our data scientists to access data and train models,” said Madhu Narasimhan, head of innovation and strategy, digital & innovation COO with Wells Fargo. “Absent that, we would just be transferring manual labor from one place to the other; we don’t harness any real value,” Narasimhan said. Another advantage: “Synthetic data allows us to carry out our experiments at scale.”

For now, Hazy and Wells Fargo are focused on laying the groundwork for synthetic datasets. “Our first step is to get out of the business of curating, sourcing, labeling, and the time-consuming work of just prepping the data,” Narasimhan said. “We want to get into the more intelligent use cases of tooling it.”

The intelligent use cases Wells Fargo has in mind for the synthetic data it will source from Hazy include fraud detection using machine learning models. Once synthetic datasets take a first pass at replicating the real-life equivalent, they can be fine-tuned to amplify different regions of the dataset. For example, you can have a synthetic dataset that has more fraudulent banking transactions than the real data. Building an ML model on the foundation of such datasets can make fraud easier to detect because the model has simply seen more instances of it. When applied back to real data, that model might perform better, Keen said.
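Amplifying one region of a dataset can be sketched as class-conditional generation: fit a separate model per label, then choose the label mix at sampling time. The example below is a minimal sketch under stated assumptions: a toy table of (amount, label) rows, one Gaussian per label, and hypothetical labels "legit" and "fraud". It is not a description of how Hazy or Wells Fargo implement this.

```python
import random
import statistics

def fit_by_class(rows):
    """Learn the mean and standard deviation of the amount for each label."""
    model = {}
    for label in {lbl for _, lbl in rows}:
        amounts = [amt for amt, lbl in rows if lbl == label]
        model[label] = (statistics.fmean(amounts), statistics.stdev(amounts))
    return model

def generate(model, n, fraud_rate, rng):
    """Sample n synthetic transactions with a chosen share of fraud."""
    rows = []
    for _ in range(n):
        label = "fraud" if rng.random() < fraud_rate else "legit"
        mean, sd = model[label]
        rows.append((rng.gauss(mean, sd), label))
    return rows

def fraud_share(rows):
    """Fraction of rows labeled as fraud."""
    return sum(lbl == "fraud" for _, lbl in rows) / len(rows)

# Toy "real" data with 1% fraud. All values are hypothetical.
rng = random.Random(42)
real = [(rng.gauss(60, 20), "legit") for _ in range(990)]
real += [(rng.gauss(900, 300), "fraud") for _ in range(10)]

model = fit_by_class(real)
faithful = generate(model, 10_000, fraud_rate=0.01, rng=rng)   # mirrors the original mix
amplified = generate(model, 10_000, fraud_rate=0.30, rng=rng)  # over-represents fraud

print("real fraud share:     ", fraud_share(real))
print("faithful fraud share: ", fraud_share(faithful))
print("amplified fraud share:", fraud_share(amplified))
```

A classifier trained on the amplified set sees many more fraud examples than the real data offers. Whether that improves detection still has to be checked against real, unamplified data, which matches Keen's hedge that the model "might" perform better.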

Decreasing bias?

Synthetic data can also be used to address one of the biggest challenges in working with legacy institutional information: bias. While the first pass will faithfully reproduce bias prevalent in the original data, subsequent datasets can be regenerated to include, for example, more women in the data that feeds job-hiring algorithms. The challenge, though, is that simply increasing numbers might not do much to remove related biases, such as wage inequities.

As for Wells Fargo, in addition to the fraud detection use case, Narasimhan is looking forward to improving the customer engagement experience in the future. Narasimhan’s advice to businesses that are looking to complement their operations with synthetic data: “Realize the value of data and then figure out how you’re going to expend that data capital in your business.” Work backwards from your outcomes to figure out if and why you need synthetic data, she said.

Narasimhan has been struck by the excitement surrounding Wells Fargo’s use of synthetic data. “We underestimated the amount of enthusiasm our internal data scientists would have for this,” she said. “It’s not just a plus for our business and our ability to serve our customers better; it’s a great workforce energizing mechanism as well.”
