It’s no great secret that Apple’s voice assistant has plenty of room for improvement. The Cupertino company is aware of this: in June, it debuted an improved neural text-to-speech model capable of delivering a more natural-sounding voice without the use of samples. And in a newly published research paper on the preprint server Arxiv.org, a team of Apple scientists describes an approach for selecting training data for Siri’s domain classifier, the component that chooses whether a person’s command relates to, say, their calendar rather than their alarms. The approach leads to a substantial error reduction with only a small percentage of examples.
As the researchers explain, Siri processes speech to suss out the intended domain with a classifier called the Domain Chooser, which helps identify a given user’s intent. (For instance, it might pair “What is the weather today?” with the domain “Weather” and “What time does Starbucks close” with “LocalBusiness.”) Once an utterance is matched to one of the over 60 defined domains, a component called the Statistical Parser assigns a parse label to each part of the utterance, after which the domain and parse labels predicted by the Domain Chooser and Statistical Parser are mapped into an intent representation that kicks off the appropriate action.
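To make the three stages concrete, here is a minimal Python sketch of the flow from utterance to domain to parse labels to intent. It is only an illustration: the keyword rules, the `topic`/`other` labels, and the `Intent` structure are hypothetical stand-ins, not Apple’s components, which rely on trained models.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """Hypothetical intent representation: a domain plus per-token parse labels."""
    domain: str
    parse_labels: list  # list of (token, label) pairs


def choose_domain(utterance):
    """Toy stand-in for the Domain Chooser: keyword lookup, not a neural model."""
    text = utterance.lower()
    if "weather" in text:
        return "Weather"
    if "close" in text or "open" in text:
        return "LocalBusiness"
    return "Unknown"


def parse(utterance, domain):
    """Toy stand-in for the Statistical Parser: tags each token with a made-up label."""
    labels = []
    for token in utterance.rstrip("?").split():
        label = "topic" if token.lower() == domain.lower() else "other"
        labels.append((token, label))
    return labels


def to_intent(utterance):
    """Chain the two stages and package the result as an intent."""
    domain = choose_domain(utterance)
    return Intent(domain=domain, parse_labels=parse(utterance, domain))


intent = to_intent("What is the weather today?")
print(intent.domain, intent.parse_labels)
```

The point of the sketch is the ordering: the domain is fixed first, and the parse labels are assigned with that domain already known.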
The Domain Chooser is a multi-class system consisting of seven bidirectional long short-term memory (bi-LSTM) networks, a type of AI model able to learn long-term dependencies. “Teaching” these networks to recognize utterances from new domains, or to classify known domains more accurately, requires high-quality training data, which in turn requires slow and expensive human annotation.
The paper’s coauthors instead advocate what they describe as an active learning technique. Using an ensemble of classifiers, they identify incorrectly labeled data samples near the decision boundary learned by the existing classifier (the region of a problem space in which the output label is ambiguous), so that adding correctly labeled versions of those samples sharpens the boundary the retrained classifier learns.
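One common way to find samples near a decision boundary with an ensemble is to look at how strongly its members agree. The sketch below assumes that approach. The article does not spell out the paper’s selection criterion, so the agreement threshold, the function names, and the "Dictionary" domain are all hypothetical.

```python
from collections import Counter


def vote_agreement(votes):
    """Return the majority label and the fraction of members that voted for it."""
    counts = Counter(votes)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(votes)


def flag_suspect_labels(samples, threshold=0.75):
    """Flag samples whose current label looks unreliable.

    A sample is flagged when the ensemble is split (agreement below the
    threshold, suggesting it sits near the decision boundary) or when the
    majority vote contradicts the current label.
    """
    flagged = []
    for utterance, current_label, votes in samples:
        majority, agreement = vote_agreement(votes)
        if agreement < threshold or majority != current_label:
            flagged.append((utterance, current_label, majority, agreement))
    return flagged


# Each sample: (utterance, current label, one predicted label per ensemble member).
samples = [
    ("What is the weather today?", "Weather", ["Weather"] * 4),
    ("Spell 'volume'", "Settings", ["Settings", "Dictionary", "Dictionary", "Settings"]),
    ("What time does Starbucks close", "LocalBusiness", ["LocalBusiness"] * 4),
]

suspects = flag_suspect_labels(samples)
for utterance, label, majority, agreement in suspects:
    print(f"{utterance!r}: labeled {label}, agreement {agreement:.2f}")
```

Only the flagged utterances would be sent to human annotators, which is where the savings in annotation time would come from.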
Siri already allows for the discovery of prediction errors from bug reports, quality assurance team testing, and user actions that imply requests weren’t correctly interpreted (e.g., quickly pausing a song or quitting an app launched by Siri), but the researchers note that these errors are relatively few in number. To expand the set of hypothesized prediction errors, they propose finding examples in a pool of unlabeled data similar to the confirmed prediction errors and having human annotators assign the correct labels and add the results to the original training data. In practice, an error in which the utterance “Spell ‘volume’” was matched with the Settings domain would surface examples of utterances similarly misclassified as Settings, like “Increase volume” and “What is the volume.”
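The expansion step can be pictured as a similarity search seeded with the confirmed errors. The sketch below uses simple word overlap (Jaccard similarity) as a stand-in, since the article does not say which similarity measure the researchers use. The threshold and function names are likewise hypothetical.

```python
def tokens(utterance):
    """Lowercase the utterance, drop simple punctuation, and split into a set of words."""
    cleaned = utterance.lower().replace("?", "").replace("'", "")
    return set(cleaned.split())


def jaccard(a, b):
    """Word-overlap similarity between two utterances, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidates_for_annotation(confirmed_errors, unlabeled_pool, min_similarity=0.15):
    """Pick unlabeled utterances that resemble at least one confirmed error."""
    if not confirmed_errors:
        return []
    picked = []
    for utterance in unlabeled_pool:
        best = max(jaccard(utterance, err) for err in confirmed_errors)
        if best >= min_similarity:
            picked.append((utterance, best))
    # Most similar first, so annotators see the likeliest errors early.
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


confirmed_errors = ["Spell 'volume'"]
unlabeled_pool = [
    "Increase volume",
    "What is the volume",
    "What is the weather today?",
    "Set an alarm for 7",
]

to_annotate = candidates_for_annotation(confirmed_errors, unlabeled_pool)
for utterance, score in to_annotate:
    print(f"{score:.2f}  {utterance}")
```

A production system would more plausibly compare learned representations than raw words, but the selection logic would look much the same: score each unlabeled utterance against the known errors and send the closest matches to annotators.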
In the first of two experiments, the paper’s coauthors compiled a corpus consisting of 850,000 randomly selected utterances from a development set previously used to debug Siri, plus 20,000 utterances of labeled training data obtained with their method. They used this corpus to train the Domain Chooser, which after 11 tests showed an 8.88% error rate reduction with only 2.3% of the original data swapped out for the new examples.
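The 2.3% figure is consistent with the 20,000 selected utterances taken as a share of the combined 870,000-utterance corpus. The short calculation below checks that reading, which is an assumption rather than something the article states outright. It also defines relative error reduction, presumably the sense in which the 8.88% figure is meant. The absolute error rates are not given, so none are filled in here.

```python
random_utterances = 850_000
selected_utterances = 20_000

# Share of the combined corpus contributed by the newly selected examples.
share = selected_utterances / (random_utterances + selected_utterances)
share_percent = round(share * 100, 1)
print(f"Selected examples: {share_percent}% of the corpus")


def relative_error_reduction(baseline_error, new_error):
    """Fraction by which the error rate dropped relative to the baseline."""
    return (baseline_error - new_error) / baseline_error
```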
“In this paper, we have proposed a simple but effective method for efficient discovery of useful training data for a domain chooser classifier, as part of [Siri],” the researchers wrote. “The method produces … better quality data … [which] reduces the time taken for human annotation … Although developed and tested in the setting of a commercial intelligence assistant, the technique is widely applicable.”
After a lengthy period of stagnation, Apple has redoubled efforts to expand Siri’s capabilities while improving the assistant’s overall performance. A contractor speaking to the Guardian this week said the company is working on future updates that will enable Siri to “have a back-and-forth conversation about health problems” and offer integrated machine translation functionality. Separately, Apple is in the process of expanding Siri’s functionality in iOS 13, giving the Apple Watch more robust access to the digital assistant’s Shazam, Find My, and App Store-hunting features and enabling a HomePod smart speaker to begin playback based on a verbal command initiated on another Siri device.