Privacy problems are widespread for Alexa and Google Assistant voice apps, according to researchers

Google Assistant and Amazon Alexa voice app privacy policies are often "problematic" and violate baseline requirements, according to a study coauthored by Clemson University School of Computing researchers. The work, which hasn't yet been peer-reviewed, analyzed tens of thousands of Alexa skills and Google Assistant actions to measure the effectiveness of their data practice disclosures. The researchers characterize the current state of affairs as "worrisome" and claim that Google and Amazon run afoul of their own developer rules.

Hundreds of millions of people around the world use Google Assistant and Alexa to order products, manage bank accounts, catch up on news, and control smart home devices. Voice apps (referred to as "skills" by Amazon and "actions" by Google) extend the platforms' capabilities, in some cases by tapping into third-party tools. But in spite of app store regulations and legislation that mandates data transparency, developers are inconsistent when it comes to disclosure, the coauthors of the Clemson study found.

To determine which Google Assistant and Alexa app developers' privacy policies were sufficiently "informative" and "meaningful," the coauthors scraped the content of skill and action web listings and analyzed it to capture the data practices described in policies and app descriptions. (Both Google and Amazon publish the app storefronts for their voice platforms on the web.) They developed a keyword-based approach, drawing on Amazon's skill permission list and developer services agreement to compile a dictionary of nouns related to data practices. Using data practice verbs and nouns (e.g., "access," "collect," "gather," "address," "email"), they flagged relevant phrases extracted from each app's policy and description, then reviewed the flagged phrases manually for accuracy.
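The keyword approach described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' code: the verb and noun sets are small hypothetical samples built from the examples quoted in this article, the sentence splitter is deliberately naive, and "stemming" is just a prefix match. Flagged sentences are only candidates for the manual review step.

```python
import re

# Hypothetical sample vocabularies built from the examples quoted above; the
# study's actual dictionary (drawn from Amazon's skill permission list and
# developer services agreement) is not reproduced here.
DATA_VERBS = {"access", "collect", "gather", "store", "share"}
DATA_NOUNS = {"address", "email", "location", "name", "phone", "birthday"}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences on ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions_any(words: set[str], vocab: set[str]) -> bool:
    """Crude stemming: a word matches if it starts with a vocabulary term,
    so "collects" and "collected" both match "collect"."""
    return any(word.startswith(term) for word in words for term in vocab)


def flag_data_practice_phrases(text: str) -> list[str]:
    """Return sentences containing both a data-practice verb and a data noun.

    Flagged sentences are candidates only; they still need manual review.
    """
    flagged = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if mentions_any(words, DATA_VERBS) and mentions_any(words, DATA_NOUNS):
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    policy = ("We collect your email address to send updates. "
              "This skill plays rain sounds. Your location is never shared.")
    for phrase in flag_data_practice_phrases(policy):
        print(phrase)
```

In this example the first and third sentences are flagged. The third is flagged even though it says the location is never shared, which illustrates why a pure keyword match needs a human pass afterward.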

Across a total of 64,720 unique Alexa skills and 2,201 Google Assistant actions (every skill and action scrapeable via the study's approach), the researchers sought to identify three types of problematic policies:

Policies that don't outline data practices.
Incomplete policies (i.e., apps that mention data collection in their descriptions but whose policies don't elaborate).
Missing policies.
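A rule for sorting apps into the three categories above might look like the sketch below. It is an illustrative assumption, not the study's procedure: the term list is hypothetical, `None` stands in for a missing or unreachable policy, and "doesn't elaborate" is approximated as "the description mentions a data term the policy never mentions."

```python
import re
from typing import Optional

# Hypothetical sample of data-related terms; not the study's dictionary.
DATA_TERMS = {"address", "birthday", "email", "location", "name", "phone"}


def data_terms(text: str) -> set[str]:
    """Return the data terms mentioned in text (prefix match, so "emails" counts)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {term for term in DATA_TERMS if any(w.startswith(term) for w in words)}


def classify_policy(description: str, policy: Optional[str]) -> str:
    """Assign an app to one of the three problem categories, if any."""
    if policy is None:
        return "missing policy"
    policy_terms = data_terms(policy)
    if not policy_terms:
        return "no data practices"
    # Data mentioned in the description but never covered by the policy.
    if data_terms(description) - policy_terms:
        return "incomplete policy"
    return "no problem detected"
```

The checks run in order, so an app whose policy mentions no data terms at all is labeled "no data practices" even if its description also mentions data collection.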

The researchers report that 46,768 (72%) of the Alexa skills and 234 (11%) of the Google Assistant actions don't include links to policies, and that 1,755 skills and 80 actions have broken policy links. (Nearly 700 links lead to unrelated webpages with advertisements, and 17 lead to Google Docs documents that aren't publicly viewable.) The disparity is partially attributable to Amazon's more lenient policy, which, unlike Google's, doesn't require developers to provide a policy if their skills don't collect personal information. But the researchers point out that skills that collect information often bypass the requirement by not declaring it during Amazon's automated certification process.

Many skills and actions share a privacy policy link with other apps (10,124 skills and 239 actions), and 3,205 skills share just the top three duplicate links. Publishers with multiple voice apps account for this, but the practice becomes problematic if a shared link breaks. The researchers found 217 skills using the same broken link, as well as actions that link to a generic policy listing a company's name and address but not the action's name, which Google requires policies to include.

Damningly, the researchers accuse Google and Amazon of violating their own requirements regarding app policies. One official Alexa weather skill asks for users' locations but doesn't provide a privacy policy, while 101 Google-developed actions lack links to privacy policies. Moreover, nine Google-developed actions point to two different general privacy policies, disregarding Google's rule that Google Assistant actions have app-specific policies.

When reached for comment, an Amazon spokesperson provided this statement via email to VentureBeat: "We require developers of skills that collect personal information to provide a privacy policy, which we display on the skill's detail page, and to collect and use that information in compliance with their privacy policy and applicable law. We are closely reviewing the paper, and we will continue to engage with the authors to understand more about their work. We appreciate the work of independent researchers who help bring potential issues to our attention."

A Google spokesperson denied that Google's own actions violate its policies and said third-party actions with broken policy links have been removed as the company "continually" enhances its processes and technologies. "We've been in touch with a researcher from Clemson University and appreciate their commitment to protecting consumers. All actions ... are required to follow our developer policies, and we enforce against any action that violates these policies."

Privacy policy content and readability

In their survey of voice app privacy policy content, the researchers found that most policies didn't clearly define what data the apps were capable of collecting. Only the policies of 3,233 Alexa skills and 1,038 Google Assistant actions explicitly mention the skill or action name, respectively, and some privacy policies for kids' skills mention that the skills could collect personal information. In fact, 137 skills in Alexa's kids category disclose that data collection could occur but provide only a general policy, running afoul of Amazon's Alexa privacy requirements for kids' skills.

More troubling still, the researchers identified 50 Alexa skills that don't tell users what happens to information such as email addresses, account passwords, names, birthdays, locations, phone numbers, health data, and gender, or with whom that information is shared. Other skills potentially violate regulations including the Children's Online Privacy Protection Act (COPPA), Health Insurance Portability and Accountability Act (HIPAA), and California Online Privacy Protection Act (CalOPPA) by collecting personal information without providing a policy.

Beyond the absence of policies, the researchers take issue with the length and format of the policies that are linked. More than half (58%) of skill and action policies are longer than 1,500 words, and none are available through Alexa or Google Assistant themselves; instead, they must be viewed on a store webpage or in a smartphone companion app.

"Amazon Alexa and Google Assistant not explicitly requiring app-specific privacy policies results in developers providing the same document that explains data practices of all their services. This leads to uncertainties and confusion among end users ... Available documents do not give a proper understanding of the capabilities of the skill to end users," the coauthors wrote. "In some cases, even if the developer writes the privacy policy with proper intention and care, there can be some discrepancies between the policy and the actual code. Updates made to the skill might not be reflected in the privacy policy."

The researchers propose a solution: a built-in intent that scans a voice app's interaction model for data collection capabilities and generates a response notifying users that the skill has those specific capabilities. The intent could be invoked when the app is first enabled, they say, so that the brief privacy notice could be read aloud to users. It could also advise users to consult the detailed policy provided by the developer.
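The scanning step of such an intent might be sketched as below. This is a hypothetical illustration rather than the researchers' design: it walks a simplified version of the JSON layout Alexa uses for interaction models (`interactionModel` → `languageModel` → `intents` → `slots`), and both the slot-type mapping and the slot-name keywords are assumptions chosen for the example. It builds only the notice text; it does not implement an Alexa intent handler.

```python
# Hypothetical mapping from built-in slot types to the personal data they capture.
SLOT_TYPE_DATA = {
    "AMAZON.FirstName": "first name",
    "AMAZON.PhoneNumber": "phone number",
    "AMAZON.PostalAddress": "postal address",
}
# Hypothetical keywords matched against developer-chosen slot names.
SLOT_NAME_KEYWORDS = {
    "email": "email address",
    "birth": "birthday",
    "location": "location",
}


def scan_capabilities(model: dict) -> list[str]:
    """List the kinds of personal data an interaction model's slots can collect."""
    found: list[str] = []
    language_model = model.get("interactionModel", {}).get("languageModel", {})
    for intent in language_model.get("intents", []):
        for slot in intent.get("slots", []):
            label = SLOT_TYPE_DATA.get(slot.get("type", ""))
            if label is None:
                # Fall back to keywords in the slot's own name.
                slot_name = slot.get("name", "").lower()
                label = next(
                    (data for key, data in SLOT_NAME_KEYWORDS.items() if key in slot_name),
                    None,
                )
            if label is not None and label not in found:
                found.append(label)
    return found


def privacy_notice(skill_name: str, model: dict) -> str:
    """Build the short notice that could be read aloud when a skill is enabled."""
    caps = scan_capabilities(model)
    if not caps:
        return f"{skill_name} does not appear to ask for personal information."
    listed = caps[0] if len(caps) == 1 else ", ".join(caps[:-1]) + " and " + caps[-1]
    return (f"{skill_name} can ask for your {listed}. "
            "For details, see the developer's full privacy policy.")


# Hypothetical skill used for illustration.
EXAMPLE_MODEL = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "daily horoscope",
            "intents": [
                {"name": "GetHoroscopeIntent",
                 "slots": [{"name": "birthDate", "type": "AMAZON.DATE"}],
                 "samples": ["my horoscope for {birthDate}"]},
                {"name": "SubscribeIntent",
                 "slots": [{"name": "phone", "type": "AMAZON.PhoneNumber"}],
                 "samples": ["text me at {phone}"]},
            ],
        }
    }
}

if __name__ == "__main__":
    print(privacy_notice("Daily Horoscope", EXAMPLE_MODEL))
```

For the example model, the date slot is caught by its name rather than its type, and the printed notice reads "Daily Horoscope can ask for your birthday and phone number. For details, see the developer's full privacy policy." Because the notice is derived from the model itself, it would stay in sync with code updates, which addresses the drift between policy and code that the coauthors describe.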

"This will give the user a better understanding of what the skill he/she just enabled is capable of collecting and using. The users can also ask to invoke this intent later to get a brief version of the privacy policy," the coauthors continued. "As our future work, we plan to extend this approach to help developers automatically generate privacy policies for their voice-apps."