Presented by Intel
AI and privacy needn’t be mutually exclusive. After a decade in the labs, homomorphic encryption (HE) is emerging as a top way to help protect data privacy in machine learning (ML) and cloud computing. It’s a timely breakthrough: Data from ML is doubling yearly. At the same time, concern about related data privacy and security is growing among industry, professionals and the public.
“It doesn’t have to be a zero-sum game,” says Casimir Wierzynski, senior director, office of the CTO, AI Products Group at Intel. HE allows AI computation on encrypted data, enabling data scientists and researchers to gain valuable insights without decrypting the underlying data or models. This is particularly important for sensitive medical, financial, and customer data.
Wierzynski leads Intel’s privacy preserving machine learning efforts, including HE work and in developing industry standards for the technology’s use. In this interview, he talks about why HE is needed, its place as a building block of improved privacy, and how the breakthrough technology helps create new business opportunities and data bridges to previously “untrusted partners.”
What and why
Q: What does “homomorphic” mean?
In Greek, homo is the same, morphic is the shape. It captures the idea that if you do encryption in the right way, you can transform ordinary numbers into encrypted numbers, then do the same computations you would do with regular numbers. Whatever you do in this encrypted domain has the same shape as in the regular domain. When you bring your results back, you decrypt back to ordinary numbers, and you get the answer you wanted.
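The "same shape" idea can be seen in miniature with textbook (unpadded) RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. This is an insecure teaching sketch with deliberately tiny parameters, not one of the modern lattice-based HE schemes the article is about.

```python
# Toy demo of the homomorphic property using textbook (unpadded) RSA.
# Insecure and illustrative only; real HE uses lattice-based schemes.

p, q = 61, 53                       # small primes, illustration only
n = p * q                           # RSA modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 3
ct_a, ct_b = encrypt(a), encrypt(b)

# Multiply in the encrypted domain...
ct_prod = (ct_a * ct_b) % n

# ...and decrypting recovers the product of the original plaintexts.
assert decrypt(ct_prod) == a * b    # 21
```

The computation done on ciphertexts (multiplication mod n) mirrors the computation you wanted on plaintexts, which is exactly the "same shape" property the name describes.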
Q: What big problem is the technology solving?
We live in amazing times, when we have all these new AI products and services, like being able to unlock your phone with your face, or systems that enable radiologists to detect diseases at earlier stages. These products rely on machine learning systems, which are fed and shaped by data that are very sensitive and personal. It’s important for us as an industry — and I would argue as a society — to figure out how we can keep unlocking all the power of AI while still helping protect the privacy of the underlying data. That’s the overarching problem.
Q: Why is HE creating buzz now?
The technique itself has been around for more than 20 years as a theoretical construct. The criticism has been, okay, you can operate on encrypted data, but it takes you a million times longer than using regular data. It was an academic curiosity. But in the last five years, and especially the last two years, there have been huge advances in the performance of these techniques. We’re not talking about a factor of a million anymore. It’s more like a factor of 10 to 100.
Q: Certainly, there’s no shortage of data privacy and protection technologies…
We’ve all gotten good about encrypting data at rest, when it’s on our hard drives, and about sending data back and forth over encrypted channels. But when you’re dealing with ML and AI, at some point those data have to be operated on. You need to do some math on those data, and to do that you need to decrypt them. So while the data are in use, they are exposed, and that creates a potential vulnerability. HE closes that gap: it protects data not only at rest and in transit, but while it’s being computed on.
Q: In ML with multiple partners, trust is a big issue. How does HE help here?
Whenever you’re dealing with digital assets, you have this phenomenon: When you share a digital asset with another party, it’s completely equivalent to giving it to them, then trusting they’re not going to do something unintended with it.
Now add the fact that ML is fundamentally an operation that involves multiple stakeholders. For example, one entity owns the training data. Another entity owns data they want to do some inference on. They want to use an ML service provided by a third entity. Further, they want to use an ML model owned by yet another party. And they want to run all this on infrastructure from some supply chain.
With all these different parties, and because of the nature of digital data, all must trust each other in a complex way. This is becoming harder and harder to manage.
HE in action
Q: Can you give an example of homomorphic encryption at work?
Say you have a hospital doing a scan on a patient, working with a remote radiology service. The hospital encrypts the scan and sends it over to the radiologist. The radiologist does all the processing in the encrypted domain, so they never see the underlying data. The answer comes back, which is also encrypted, and finally the hospital decrypts it to learn the diagnosis.
Q: In the example above, where does HE actually happen? At the scanner?
It could live in multiple places. There are two major components: encryption and decryption. But then there’s also the actual processing of the encrypted data. Encryption typically happens where the sensitive data are first captured, for example, in a camera or edge device. Processing encrypted data happens wherever the AI system needs to operate on sensitive data, typically in a data center. And finally, decryption happens only at the point where you need to reveal the results to a trusted party.
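The three stages described above can be sketched with a toy additively homomorphic scheme (a fresh random pad per value, mod a large N). Everything here, including the function names and the sample readings, is a hypothetical illustration; it is not secure and is not how production HE works, but it shows the division of roles: encrypt at the edge, compute only on ciphertexts in the data center, decrypt only at the trusted party.

```python
# Sketch of the three HE stages: edge encryption, data-center computation
# on ciphertexts, decryption by the trusted party. Toy scheme, insecure.
import secrets

N = 2**64

def keygen(count):
    """One random pad per value; the key material stays with the data owner."""
    return [secrets.randbelow(N) for _ in range(count)]

def encrypt(values, pads):      # stage 1: at the edge device / scanner
    return [(v + k) % N for v, k in zip(values, pads)]

def aggregate(ciphertexts):     # stage 2: in the data center, which only
    total = 0                   # ever sees ciphertexts
    for c in ciphertexts:
        total = (total + c) % N
    return total

def decrypt_sum(ct_sum, pads):  # stage 3: at the trusted party
    return (ct_sum - sum(pads)) % N

readings = [120, 80, 95]        # hypothetical sensitive measurements
pads = keygen(len(readings))
cts = encrypt(readings, pads)
enc_total = aggregate(cts)      # the service never sees raw readings
assert decrypt_sum(enc_total, pads) == sum(readings)  # 295
```

The point of the sketch is the trust boundary, not the math: the party running `aggregate` learns nothing about individual readings, yet its output decrypts to the correct sum.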
Powerful with other emerging techs
Q: When you speak and write on HE, you also talk about adjacent technologies. Please explain briefly the roles these other building blocks play in preserving privacy.
There are several very promising and rapidly developing technologies that use tricks from cryptography and statistics to do operations on data that seem magical.
All these technologies can be further accelerated and enhanced by hardware security measures called trusted execution environments, such as Intel SGX.
New bridges to partners in AI and ML
Q: So what opportunities are created when these technologies are combined?
One of the questions we’ve been asking at Intel is, what would happen if you could enable ML in these multi-stakeholder operations between parties that don’t necessarily trust each other, like we described earlier? What would that enable?
You could have banks that normally are rivals. But they may decide it’s in their interest to cooperate around certain risks that they all face in common, like fraud and money laundering. They could pool their data to jointly build anti-money laundering models while keeping their sensitive customer data private.
Another example is in the retail sector. Retailers want to make the most out of the data that they’ve collected on shoppers, so they can personalize certain experiences. What if there were a way to enable that and still provide quantifiable protections around the privacy of that data?
New business opportunities
Q: Are there new revenue models and opportunities being created?
One thing that’s exciting about this area is that you start to enable new business models around data that would have previously been impossible. For example, (HE) can be a way to monetize data while helping maintain security. At the same time, it addresses one of the biggest problems in ML, namely, access to large, diverse data sets for building models. So you can imagine a whole ecosystem that brings together people who hold data with people who need data. And very importantly, it’s all done in a way that preserves security and privacy of the data. That’s an exciting possibility.
Q: How advanced is that concept in implementation? Are there other real-life instances like that?
I would say it’s beyond theory, and in early stages of commercial deployment.
Data is becoming a much bigger asset class than ever before, sometimes in surprising ways. For example, in corporate bankruptcy cases, creditors can end up owning large data sets unrelated to banking in any way. They’re just leftovers of a loan that went sour. So they’re looking for a way to monetize these assets and provide them to data scientists who seek additional training data to make their AI systems more reliable, all while keeping the underlying data private and secure.
Or imagine a bunch of hospitals that have patient data. For various reasons, they can’t share it. But they know if they could, and could get lawyers in the room to hammer out an agreement, they would potentially be able to jointly build a model with much more statistical power than one any could build individually. Using privacy-preserving ML techniques, they can essentially form a consortium and say: “In exchange for all of us owning this improved model that will improve our patients’ outcomes, we’ll be a part of this consortium, and we can still keep all of our patient data private and secure.”
Key role of developers
Q: Where do developers fit in?
As it is now, if you’re an AI data scientist and you want to build a machine learning model that operates on encrypted data, you have to be some magical person, simultaneously an expert in software engineering and data science and post-quantum cryptography.
One of the major efforts that my team has been working on is making these technologies much more accessible and performant for the data science and developer communities so that they can scale up. This is a priority for Intel. Today we’re offering tools like the Intel HE Transformer for nGraph, a homomorphic encryption backend to Intel’s graph compiler for artificial neural networks.
Intel and standards
Q: Some people will ask: Why is Intel active here? The answer is…
For starters, homomorphic encryption is very compute-intensive. This is an area where Intel can really shine in terms of building optimized silicon to handle this fundamentally new way of computing.
But more broadly, consider those examples from health care, retail, and finance, sectors that together represent about a quarter of GDP in the United States. These are very economically important problems. At Intel, we are obsessed with helping customers solve their problems around data. And privacy is at the heart of any data-centric business.
We are in a unique position, because Intel works closely with hyperscale, data-centric users of hardware who are building all kinds of exciting AI applications. At the same time, we are a neutral party with respect to data. To make these technologies perform at scale is going to require the kinds of complex software and hardware co-design that Intel is uniquely positioned to provide. We get to collaborate actively with a fascinating range of players across industry.
Q: Intel has taken a leadership role in developing HE standards. Why are they important? Status?
As with any crypto scheme, people will use it when there’s interoperability and trust in the underlying math and technology. We recognize that to really scale up, to bring all the homomorphic encryption goodness to the world, we need to have standards around it.
As interest in privacy preserving methods for machine learning grows, it’s essential for standards to be debated and agreed upon by the community, spanning business, government, and academia. So in August, Intel co-hosted an industry gathering of individuals from Microsoft, IBM, Google, and 15 other companies interested in this space.
We’ve identified many points of broad agreement. Now we’re trying to figure out the right approach to bring it to standards bodies, like ISO, IEEE, ITU, and others. They all have different structures and timelines. We’re strategizing on how best to move that forward.
Q: Any thoughts you’d like to leave ringing in people’s ears?
Privacy and security technologies for machine learning like homomorphic encryption are ready for prime time. These are not academic exercises anymore. These are real market-ready ideas. The day is coming when the idea of doing machine learning on someone else’s raw data will seem quite strange. We’re at the start of a new and exciting era where machine learning will enable us to explore new opportunities unlike anything before.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact firstname.lastname@example.org.