Google Cloud chief AI scientist Andrew Moore today re-emphasized the company’s commitment to privacy and pledged not to review data enterprise customers share when training AI models or store in the cloud “without a legitimate need to support your use of the service — and even then it is only with your permission.” As part of that effort, Google Cloud is introducing a tool for customers to remove personally identifiable information like addresses, account numbers, or information about family members from text or medical imagery.
Removal of such data goes some way toward compliance with major privacy laws like the European Union’s GDPR and the California Consumer Privacy Act (CCPA). Automatic removal of personally identifiable information can also enable businesses to, for example, use conversations between a customer service agent and customer to train a chatbot.
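To make the chatbot example concrete, here is a minimal sketch in Python of redacting a transcript before it is used for training. It is not Google Cloud's tool or its API. The `PII_PATTERNS` table, the `redact` function, and the sample transcript are hypothetical, and simple regular expressions stand in for the far more capable detection a production system would need.

```python
import re

# Hypothetical patterns for illustration only; real PII detection
# requires much more than a handful of regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ACCOUNT_NUMBER": re.compile(r"\b\d{8,16}\b"),
}


def redact(text: str) -> str:
    """Replace each matched span with a bracketed placeholder label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Made-up customer service line, not real data.
transcript = (
    "Customer: my account is 1234567890, "
    "reach me at jane@example.com or 555-123-4567."
)
print(redact(transcript))
# Customer: my account is [ACCOUNT_NUMBER], reach me at [EMAIL] or [PHONE].
```

Replacing each match with a typed placeholder, rather than deleting it, keeps the sentence structure intact, which matters when the redacted text is later used as training data.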
“It’s been one of the listed causes of concern when you talk to your large corporations as to what would prevent them from starting to use AI in the cloud,” Moore told VentureBeat in a phone interview. “It’s a mechanism which makes sure that it’s both contractually and technologically clear that we cannot be mixing this data with other people’s data, nor can we be using any of the machine learning models that we’ve created to sort of incorporate into other Google products or services. So as you can imagine, for Google, it’s really, really important, because the whole world is watching us, that we’re extremely clear about what happens with that data.”
Moore said the redaction tool has been under development for the past nine months and that the timing of its release is in no way related to an antitrust report released last week. Following a 16-month investigation by a U.S. House of Representatives subcommittee, the report declared Amazon, Apple, Facebook, and Google monopolies and suggested that part of the solution may be to break the companies up.
The 449-page report found that internal communications demonstrate that Google covertly monitors potential and actual competitors and “tracks real-time data across markets, which — given Google’s scale — provide it with near-perfect market intelligence.” In a separate development, the U.S. Department of Justice is expected to bring a case against Google in the coming days. Regulators in Australia, China, the European Union, and the U.K. are also investigating whether Google has violated antitrust rules in their respective markets, cases that could result in corrective measures.
Concerns over user privacy and potential prying by Google or other major cloud providers, such as AWS or Microsoft’s Azure, have led some enterprise customers to spurn cloud services in favor of on-premises datacenters.
Though Google uses federated learning and differential privacy to, for example, provide Android smartphone users with personalized keyboards, it does not currently offer privacy-enabled services.
Google Cloud teams working with customers to develop AI solutions will now work exclusively with data from which personally identifiable information has been removed. Customers can run the Google Cloud redaction tool on their own systems before sending data to the cloud, or run it in an encrypted cloud partition.
“As an academic, it really concerned me that the field of machine learning was going to stop because of privacy regulations. But now through developing this suite of redaction technologies, we’ve shown both to ourselves and our customers that we’re able to operate useful AI without ever requiring us to set eyes on any personally identifiable data,” Moore said.
Moore came to Google Cloud in 2018 from Carnegie Mellon University following the departure of Stanford professor Dr. Fei-Fei Li.
In related news, earlier this year Google Cloud launched its confidential computing portfolio, starting with Confidential VMs (virtual machines).