Apple and Google halt human voice-data reviews over privacy backlash, but transparency is the real issue

Both Google and Apple are suspending some of their voice data-review practices, after separate reports in the past month revealed the extent to which the companies allow humans to listen to private conversations.

Following a data leak last month, Google confirmed that some of its contractors listen back to recordings of what people say to Google Assistant -- this, it said, helps it improve its support for more languages, accents, and dialects. While it's likely true that employees or contractors can't correlate recordings with user accounts, many of the recordings contained personally identifiable data, including addresses, names, and other private information. Moreover, many of the recordings had been triggered accidentally by the user.

Later in the month, a separate report alleged that Apple often allowed workers to access up to 30 seconds of "accidental" Siri recordings as part of its voice grading program. While it was already known that Apple listened to some Siri recordings to improve the assistant's quality, the new report found that recordings were accessed not only by its internal staff, but also by contractors with high turnover rates. And again, Siri could be accidentally triggered, such as by the sound of a zipper or by words that sound like "Siri," with 30-second-long snippets recorded unbeknownst to the user.

Yesterday, news emerged that a German privacy authority had ordered Google to stop harvesting Google Assistant voice data in Europe for human reviewers. In reality, the authority only has the power to enforce the ban for three months because Ireland serves as the main jurisdiction for Google in Europe. The Hamburg Commissioner for Data Protection and Freedom of Information (HmbBfDI) said [in German]:

The Hamburg Commissioner for Data Protection and Freedom of Information has opened an administrative procedure to prohibit Google from carrying out such evaluations by employees or third parties for a period of three months. This should protect the privacy rights of those affected for the time being.

Google said at the time that it had already stopped processing such information in July following the original public backlash.

Earlier today, Apple confirmed that it had suspended its grading program globally, pending a "thorough review," according to a statement issued by the Cupertino company.

It's also worth noting the silent elephant in the room here -- Alexa. Amazon's digital assistant is by far the market leader in the U.S. from a smart speaker perspective, though of course Siri and Google Assistant's install bases extend deeper into the technology realm through billions of smartphones and tablets. Nonetheless, a Bloomberg report from April confirmed that Amazon also opens up voice data captured from its users for the purposes of training and improving Alexa. So far, Amazon hasn't confirmed any plans to halt its voice review practices in response to any of these latest privacy reports.

Transparency

While there are undoubtedly growing concerns over how technology companies process user data, the bottom line is that for artificial intelligence to improve, it will need humans at the helm overseeing things and annotating data for some time. But that isn't necessarily the core concern at play here -- the underlying problem perhaps relates more to transparency, and whether people are sufficiently informed about how their private conversations could be accessed.

What's needed is a clearer permission structure. No lengthy privacy policies, or hidden opt-out settings -- a clear pop-up that asks the user if they're happy with third parties listening in on their domestic activities. Zero obfuscations. As it happens, Apple confirmed today that it will allow users to opt out of its voice data grading program through a future software update, though it remains to be seen how clear this opt-out will be.

Europe is playing a prominent role in the push to hold companies accountable for the user data they leverage. In the past month, British Airways (BA) was slapped with a record £183.39 million ($230 million) fine over a 2018 security lapse, followed soon after by hotel giant Marriott, which was hit with a £99 million ($123 million) fine for similar breaches. This was all made possible by Europe's GDPR regulations, which took effect last May.

Transparency is a key facet of GDPR regulations, too, and back in January Google was fined €50 million ($57 million) by French data privacy body CNIL (National Data Protection Commission) for what it called a "lack of transparency, inadequate information and lack of valid consent" regarding its methods for personalizing advertisements.

"The use of language assistance systems in the EU must comply with the data protection requirements of the GDPR," noted Johannes Caspar, the Hamburg Commissioner for Data Protection and Freedom of Information, in a statement yesterday. "In the case of the Google Assistant, there are currently significant doubts. The use of language assistance systems must be done in a transparent way, so that an informed consent of the users is possible. In particular, this involves providing sufficient information and transparently informing those concerned about the processing of voice commands, but also about the frequency and risks of mal-activation."