Amazon is poorly vetting Alexa's user-submitted answers

Alexa, Google Assistant, Siri, and Cortana can answer all sorts of questions that pop into users' heads, and they're improving every day. But what happens when a company like Amazon decides to crowdsource answers to fill gaps in its platform's knowledge? The result can range from amusing and perplexing to concerning.

Alexa Answers allows any Amazon customer to submit responses to unanswered questions. When the web service launched in general availability a few weeks ago, Amazon gave assurances that submissions would be policed through a combination of automatic and manual review. But an analysis of public Alexa Answers data conducted by VentureBeat shows that untrue, potentially sponsored, and offensive questions and answers are accepted and served to the more than 100 million Alexa-enabled devices sold to date.

Bad info

Alexa Answers employs a points-based system to ensure quality answers float to the top -- at least in theory. When a submitted answer goes "live" -- that is, when Amazon begins serving it to Echo and third-party Alexa device owners -- positive feedback from those users (as well as the number of times it's shared) increments its overall score. (After responding to a question with an Alexa Answers submission, Alexa asks "Did that answer your question?") Conversely, negative feedback decrements the score. Along with an average star rating (out of five) assigned by members of the Alexa Answers community, the feedback score determines whether an answer is served to Alexa users. Answers below a certain score threshold aren't shared, while higher-rated answers to questions with multiple submissions are shared more often.
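To make the scoring mechanics concrete, here is a minimal Python sketch of a feedback score with a serving threshold. Amazon has not published its weights, thresholds, or the way it combines the feedback score with the star rating, so every name and number below (`Answer`, `SCORE_THRESHOLD`, `MIN_STARS`, the one-point increments) is an assumption for illustration only.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: Amazon has not disclosed the real values.
SCORE_THRESHOLD = 0   # minimum feedback score for an answer to be served
MIN_STARS = 3.0       # minimum community star rating (out of five)


@dataclass
class Answer:
    text: str
    score: int = 0             # running feedback score
    star_rating: float = 0.0   # average community rating, 0 to 5

    def record_feedback(self, helpful: bool) -> None:
        """Apply a reply to 'Did that answer your question?'."""
        self.score += 1 if helpful else -1

    def record_share(self) -> None:
        """Each share also raises the score (assumed to be worth one point)."""
        self.score += 1


def is_servable(answer: Answer,
                min_score: int = SCORE_THRESHOLD,
                min_stars: float = MIN_STARS) -> bool:
    """Assumed rule: both the feedback score and the star rating must clear a bar."""
    return answer.score >= min_score and answer.star_rating >= min_stars
```

Under these assumed rules, an answer with two helpful votes, one unhelpful vote, one share, and a four-star rating would clear the bar, while a one-star answer with a single unhelpful vote would not.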

Alexa Answers lets Amazon account holders browse and answer questions asked by Alexa users across topics like animals, climate, film and TV, food, geography, history, literature, music, science, sports, and video games. They're also able to sort questions by recency, popularity among Alexa Answers or Alexa users, or status (i.e., whether they've been answered).

Answers sourced from Alexa Answers are appended with an "According to an Amazon customer" disclaimer. That's true no matter which Alexa-enabled device answers the question, whether an Echo smart speaker or display; a Fire Tablet; an Android tablet preloaded with Alexa Smart Screen; or the Alexa app for Android, iOS, and Windows.

Consider the question: "Why are cows bad for the environment?" There are a few things happening here, so let's unpack them individually. During our testing, the Alexa app for Android and iOS in Canada initially relayed Alexa Answers content without the disclaimer. It was fixed roughly a week after we brought the bug to Amazon's attention, but we'll have to take the company at its word that the disclaimer works in every supported region and language. Your mileage may vary.

Next, notice that Alexa is providing two different answers. That's because this particular question has two submissions on Alexa Answers. Amazon tells us that for questions with two or more contributor-submitted answers, Alexa might rotate among them until a clear winner emerges. Sometimes getting a different answer is as simple as asking Alexa again -- or on mobile, restarting the app and then asking again.
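Amazon hasn't said how the rotation works or what counts as a "clear winner." One plausible reading, sketched below in Python, is a score-weighted random pick that switches to always serving the top answer once it leads the runner-up by some margin. The function name `pick_answer`, the `WINNER_MARGIN` value, and the weighting scheme are all assumptions, not Amazon's implementation.

```python
import random

# Hypothetical lead required before one answer is treated as the clear winner.
WINNER_MARGIN = 10


def pick_answer(scores: dict[str, int], rng=random) -> str:
    """Choose which submitted answer to serve for one question.

    `scores` maps answer text to its feedback score.
    """
    if not scores:
        raise ValueError("no submitted answers")
    ranked = sorted(scores, key=scores.get, reverse=True)
    # A single answer, or a large enough lead, means no rotation.
    if len(ranked) == 1 or scores[ranked[0]] - scores[ranked[1]] >= WINNER_MARGIN:
        return ranked[0]
    # Otherwise rotate, favoring higher-scored answers.
    # Shift scores so the lowest-scored answer still has weight 1.
    low = min(scores.values())
    weights = [scores[a] - low + 1 for a in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]
```

With two closely scored answers, repeated calls return either one, which matches the behavior of getting a different answer simply by asking again.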

To be fair, "Why are cows bad for the environment?" is a leading question, but it underlines a serious flaw of Alexa Answers. Slightly rephrased questions can yield different answers, or occasionally no answer at all. For instance, if you ask Alexa "Are cows bad for the environment?," it will answer along the lines of "Sorry, I don't know that one."

Questionable questions

The questions in Alexa Answers come from Alexa customers who ask questions to which the assistant doesn't have an answer. Once a question has been asked a certain number of times -- Amazon declined to say how many -- it makes its way onto the Alexa Answers portal, where it's fair game for anyone with an Amazon account. As a result, different answers to nearly identical questions emerge often.

Alexa succinctly responds to the prompt "What wine goes with chili?" with "Red wine." But if you ask "What wine goes well with chili?," Alexa gives a more detailed answer: "Most sommeliers agree that light reds like Pinot Noir and Beaujolais served lightly chilled go very well with chili. The fruitiness and body play nicely with the layers of spices found in most chili recipes."
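The promotion step, and the way near-duplicates like the two wine questions end up as separate entries, can be illustrated with a short Python sketch. It assumes questions are keyed by their transcribed text after only trivial normalization. The class name `UnansweredQueue` and the threshold of 3 are hypothetical, since Amazon declined to give the real number.

```python
from collections import Counter

# Hypothetical: Amazon declined to say how many asks are required.
ASK_THRESHOLD = 3


class UnansweredQueue:
    """Counts unanswered questions and reports when one qualifies for the portal."""

    def __init__(self, threshold: int = ASK_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, question: str) -> bool:
        """Count one unanswered ask; return True once the threshold is reached."""
        # Only case and surrounding whitespace are normalized (an assumption),
        # so slightly rephrased questions are tracked as different entries.
        key = question.strip().lower()
        self.counts[key] += 1
        return self.counts[key] >= self.threshold
```

Because "What wine goes with chili?" and "What wine goes well with chili?" produce different keys in this sketch, each would reach the portal on its own and collect its own answers.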

It's easy to see how this can be abused. For instance, the question "What plants are bad for cats?" has two answers of varying helpfulness: "poison ivy and my moms cooking" and "Poinsettias." The former is likely meant in jest, but it stands in contrast to the detailed list Google Assistant provides from PetMD.com.

Questions in Alexa Answers are transcribed using Alexa's imperfect speech-to-text engine, which leaves answerers in the position of making best guesses as to the questioner's meaning. For example, one user assumed that "sat siki sauce" in the question "What is sat siki sauce?" was intended to be "tzatziki sauce," and that "mick" in "How much is mick romney worth?" was a butchering of "Mitt" (referring to former Massachusetts governor Mitt Romney). But they don't always get it right. One user responded to the question "How do dolphins breed?" with "Dolphins are mammals and breathe with the lungs," presumably assuming "breed" was meant to be "breathe."

Questionable answers

Amazon says that questions submitted to Alexa Answers might be rejected by a combination of automated and manual filters if they fall into any of these categories:

Inappropriate (subjective, advice, vulgar, profane, insulting, or offensive)
Incomprehensible
Incorrect or irrelevant
Threatening
Defamatory
Invasive of privacy
Infringing of intellectual property rights (including publicity rights)

Alexa Answers also allows members to flag answers they believe are in violation of the terms of service. Flagged answers aren't visible on the Alexa Answers site, and they aren't shared with Alexa customers, but they can be rewritten and resubmitted to address the reason for flagging.

Problematically, the Alexa devices we tested responded to all of these questions at the time of publishing. It's the responsibility of users to retroactively flag problematic answers, which exposes Alexa users to them in the meantime.

Controversial and offensive answers

Here are more questions that are troublesome from the start, some with equally troublesome answers:

How big do money trees get?
Is climate change a hoax?
How do you breed villagers? (this is likely a Minecraft reference)

Inaccurate answers

It's not unusual to come across factual inaccuracies on Alexa Answers.

Consider the question "What is the hottest flame color?" Reference.com reports that violet and white are the hottest on the color spectrum and visible spectrum, respectively, but an Alexa Answers contributor wrote "orange."

The human body has 22 pressure points, but you wouldn't know it from Alexa's answer to the question "How many pressure points does a human body have?" Inexplicably, the only submitted answer in Alexa Answers is "420." And dogs aren't jerks -- at least not intentionally -- but that's the explanation one user gave for the question "Why do dogs chew things up?"

Are AA batteries safe to eat? One would assume not, but that's contrary to what one Alexa user heard when he asked his Echo device about it. The answer he received -- which assumed he was referring to a product within Amazon's low-cost brand family, Amazon Basics -- was: "Yes, but don't eat too many."

Another Alexa user asked "Are eggshells good for the soil?" An Alexa Answers user responded "no don't use the eggshells use the yolk the plants will be happier," which isn't entirely accurate. Yolks contain animal proteins that must be broken down before plants can use them, meaning they need to rot. And in truth, eggshells tilled into soil provide plants with a source of calcium.

Some questions submitted to Alexa Answers have conflicting answers. One contributor on the question "Has India landed on the moon?" noted that India located but wasn't able to establish contact with its latest moon lander, Chandrayaan-2. Meanwhile, a second user made a joke. The fact is that while the lander deviated from its intended trajectory and lost communication, suggesting a crash, it technically landed on the moon (albeit probably not intact).

Answers to the question "Who discovered San Francisco bay?" are similarly indecisive, with one asserting the Ohlone Indian tribe should be credited with the discovery rather than Spanish explorer Gaspar de Portolà. Ideally, Alexa would provide both answers instead of one at random. (Google Assistant sidesteps this awkwardness by noting in its answer, which it draws from Wikipedia, that Gaspar de Portolà is the first known European discoverer of San Francisco Bay.)

Questions about up-to-the-minute statistics predictably become outdated quickly. Take this one, for example: "How many subscribers does [YouTuber Alex] wassabi have?" Three months ago, the answer was 9.5 million subscribers. That number has since grown to 11.5 million.

Asinine answers

Alexa Answers users have a sense of humor.

Ridiculous questions warrant ridiculous answers in the eyes of some Alexa Answers users. Case in point: "How do you catch an elephant?" yields "In order to catch an elephant, you need cakes, raisins, a telescope, and a pair of tweezers" (an excerpt from Amy Schwartz's children's novel "How to Catch an Elephant"). Alternatively, it yields the decidedly dirtier reply: "First you dig a hole, fill it with ashes, and cover it with peas and when the elephant comes to take a pea, you kick it in the ash hole."

Silly answers abound to open-ended questions like "Name some aquatic animals?" The only response is "Whale, shark, whale shark, Blue Whale, dolphin, orca, crocodile, turtle, alligator, sponge, eel, great white shark, baby shark, momma shark, daddy shark." Asking "what are some aquatic animals?" surfaces content from Reference.com: "Some aquatic animals are sea turtles, jellyfish, clownfish, and blue whales."

Protected process

We've asked Amazon for more information about how Alexa Answers works, but the company has so far been cagey about the details. It's unclear why some questions designated "live" on the Alexa Answers dashboard are served to Alexa users while others aren't. In our testing, most questions and answers worked when we tried them, regardless of their status. Some questions and answers we submitted to Amazon during our investigation have since been removed, and we expect that at least some of the examples above will be, too.

"High quality answers are important to us, and this is something we take seriously -- we will continue to evolve Alexa Answers," an Amazon spokesperson told VentureBeat when contacted for comment.

Alexa Answers suffers from the shortcomings of the question-and-answer platforms that came before it, perhaps most famously Yahoo Answers, WikiAnswers, and Stack Exchange. It's incumbent upon users to answer questions thoroughly and in good faith, and to self-police beyond the behind-the-scenes automated filtering. The majority of Amazon customers participating in Alexa Answers stick to the rules, but some flout them. And unfortunately, it's unclear if malicious actors face punishment other than having their flagged answers removed.

That's bad news for Alexa users who are served answers from Alexa Answers. They aren't given any reputational information about the answers' contributors. By lumping together contributors under the label "Amazon customers," Amazon runs the risk of elevating or conferring authority on people with poor track records on Alexa Answers -- a dangerous notion for Alexa device owners with kids who consider Alexa a reliable source of knowledge.

There's also a unique and problematic structural difference between the way users interface with information on answer websites versus virtual assistants. If you're looking for the best Yahoo Answer, ideally the site's system has chosen a best answer for you based on upvotes and other factors. But either way, you can see the math, as it were -- all the other answers, good and bad, serious and joking. If there's not a clear best answer, you can scan what's there and use your own judgment to suss out how helpful it is. And if an answer is a clear joke, prank, or spam, it's usually pretty easy to spot it and immediately scroll on. But with a voice assistant, you get none of that opportunity; you're getting one and only one answer in response to a question, and part of the convenience of using such an assistant is that, presumably, its back end does all of the work for you.

There's no silver bullet, but Amazon would do well to more closely scrutinize the answers submitted to Alexa Answers, perhaps with enhanced automated screening and human moderation. It might consider allowing people to prevent unrated submissions from Alexa Answers from reaching a child's Alexa-enabled device -- or their own devices.