Over 70 million records were stolen or leaked from poorly configured databases last year, making privacy a top concern. That's no doubt one motivation behind Private Join and Compute, a new secure multi-party computation (MPC) tool that Google open-sourced this morning to help organizations work together on confidential data sets.
“We continually invest in new research to advance innovations that preserve individual privacy while enabling valuable insights from data,” wrote engineering director Sarvar Patel and research scientist Moti Yung in a blog post. “Many important research, business, and social questions can be answered by combining data sets from independent parties, where each party holds their own information about a set of shared identifiers, some of which are common.”
At its core, Private Join and Compute lets organizations gain aggregated insights about each other's data. They can encrypt identifiers and associated data, join them, and then perform calculations on the overlapping records to extract useful information. All identifiers and their associated data remain fully encrypted and unreadable throughout the process. Neither party reveals its raw data, yet both can answer the questions at hand using the outputs of the computation, such as counts, sums, and averages.
Private Join and Compute achieves this with two cryptographic methods designed to protect sensitive data: private set intersection and homomorphic encryption. The former lets two parties privately join their data sets and discover the identifiers they have in common. The latter, an emerging approach used in Intel's HE-Transformer and other privacy-preserving tools, allows certain types of computation to be performed directly on encrypted data without decrypting it first.
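To make the private set intersection step more concrete, here is a minimal toy sketch of one classic PSI construction based on commutative encryption: each party raises hashed identifiers to its own secret exponent, in the spirit of Diffie-Hellman. It is an illustration under stated assumptions, not Google's code. The modulus, the helper names (hash_to_group, blind, private_intersection), and the choice to run both parties' steps inside one function are hypothetical simplifications; production systems use elliptic-curve groups, proper hash-to-group functions, and shuffling.

```python
import hashlib
import math
import secrets

# Toy modulus (a Mersenne prime). Illustrative only; real deployments use
# elliptic-curve groups and a proper hash-to-group construction.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Hypothetical helper: map an identifier to an element of [1, P-1]."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_key() -> int:
    """Pick a secret exponent coprime to P-1, so exponentiation is a permutation."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key):
    """'Encrypt' each value by raising it to the secret exponent mod P."""
    return [pow(v, key, P) for v in values]


def private_intersection(ids_a, ids_b):
    """Simulate both parties; party A learns which of its identifiers are shared."""
    key_a, key_b = random_key(), random_key()
    # A hashes and blinds its identifiers, then sends them to B.
    a_once = blind([hash_to_group(x) for x in ids_a], key_a)
    # B blinds A's values again with its own key and returns them in order.
    a_twice = blind(a_once, key_b)
    # B blinds its own identifiers and sends them; A blinds them again.
    b_twice = set(blind(blind([hash_to_group(y) for y in ids_b], key_b), key_a))
    # Since (h^a)^b == (h^b)^a mod P, shared identifiers yield equal values.
    return [x for x, v in zip(ids_a, a_twice) if v in b_twice]
```

Calling private_intersection(["alice", "bob", "carol"], ["carol", "dave", "alice"]) returns ["alice", "carol"]. Each shared identifier ends up as the same doubly blinded value on both sides, while identifiers outside the overlap look like unrelated random numbers to the other party.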
“This end result is the only thing that’s decrypted and shared in the form of aggregated statistics,” noted Patel and Yung. “This combination of techniques ensures that nothing but the size of the joined set and the statistics (e.g. sum) of its associated values is revealed. Individual items are strongly encrypted with random keys throughout and are not available in raw form to the other party or anyone else.”
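The "only the end result is decrypted" property can be illustrated with an additively homomorphic scheme. The sketch below uses textbook Paillier encryption, chosen here as an assumption for illustration, since this article does not name the scheme Private Join and Compute relies on. The primes are small and public, so these keys are insecure, and the helper names (encrypt, decrypt, add_encrypted, encrypted_sum) are hypothetical.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes. Insecure: real keys
# use large, secret, randomly generated primes.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
G = N + 1                                          # standard simple generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                               # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Probabilistic encryption: c = g^m * r^N mod N^2 with random r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying Paillier ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2


def encrypted_sum(ciphertexts) -> int:
    """Combine ciphertexts into one encrypted total without decrypting any of them."""
    total = encrypt(0)
    for c in ciphertexts:
        total = add_encrypted(total, c)
    return total
```

Here encrypted_sum([encrypt(v) for v in [120, 75, 300]]) produces a single ciphertext, and decrypting it yields 495; the individual values are never decrypted. In a protocol of this shape, one party would multiply together only the ciphertexts attached to identifiers in the intersection, and the key holder would decrypt just that aggregate.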
Google expects that Private Join and Compute will find applications in “a wide array of fields” that require organizations to work together without revealing anything about individuals represented in the data, including (but not limited to) public policy, diversity and inclusion, health care, and car safety standards. “By sharing the technology more widely, we hope this expands the use cases for secure computing,” added Patel and Yung. “This is just the beginning of what’s possible.”
Private Join and Compute's formal debut follows on the heels of TensorFlow Privacy, a library for Google's TensorFlow machine learning framework that's intended to make it easier to train AI models with strong privacy guarantees. It also builds on earlier efforts like Password Checkup, a Chrome extension that uses private set intersection (PSI) to check login credentials against an encrypted database of over 4 billion known unsafe credentials.