While the year has been dominated by stories of Siri’s problems, one of Siri’s successes, its ability to start listening when invoked with “Hey Siri,” is the subject of Apple’s latest Machine Learning Journal entry, which also hints at future improvements for the feature. The post explains how iOS devices now continuously listen for the prompt and points to AI advances that could eliminate Siri’s need for initial training.
In “Personalized Hey Siri,” the Siri team notes that among the feature’s biggest challenges are accidental activations and missed activations. Currently, Apple tries to prevent these problems by tuning Siri to the device owner’s voice: users briefly train Siri with five utterances, which create a user profile stored on the device. Siri then quietly adds the user’s next 35 “accepted” utterances to improve the profile.
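The enrollment flow described above, five explicit setup utterances followed by up to 35 implicitly accepted ones, can be sketched as a small bounded profile. This is a minimal illustration, not Apple’s code: the `SpeakerProfile` class, its method names, and the idea of storing one opaque entry per utterance are hypothetical; only the 5 and 35 counts come from the article.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical sketch; only the counts below come from the article.
SETUP_UTTERANCES = 5    # explicit "Hey Siri" phrases spoken during setup
MAX_PROFILE_SIZE = 40   # 5 setup utterances + 35 implicitly accepted ones


@dataclass
class SpeakerProfile:
    """Hypothetical device-stored profile: one entry per enrolled utterance."""
    entries: List[Any] = field(default_factory=list)

    def enroll(self, setup_samples: List[Any]) -> None:
        """Seed the profile from the guided setup step."""
        if len(setup_samples) != SETUP_UTTERANCES:
            raise ValueError(f"expected {SETUP_UTTERANCES} setup utterances")
        self.entries = list(setup_samples)

    def add_accepted(self, sample: Any) -> bool:
        """Quietly append an accepted trigger; returns False once the profile is full."""
        if len(self.entries) < SETUP_UTTERANCES:
            raise RuntimeError("run enroll() before adding accepted utterances")
        if len(self.entries) >= MAX_PROFILE_SIZE:
            return False
        self.entries.append(sample)
        return True
```

Enrolling five samples and then offering 50 accepted triggers leaves the profile at 40 entries: the first 35 are added and the remaining 15 are ignored.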
What’s interesting about the “Hey Siri” trigger is that the profile isn’t just trying to match a single voiceprint against later repetitions of the same phrase. Instead, the profile eventually holds a baseline version of the phrase plus 40 mathematically scored variations, and Siri wakes only if a new “Hey Siri” utterance scores at or above the pool’s average. The device also stores recordings of those 40 “Hey Siri” requests, so the profile can be rebuilt without the user having to retrain it whenever Apple improves the system in a software update.
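The scoring described above can be illustrated with a minimal sketch, assuming each utterance has already been reduced to a fixed-length speaker vector by some embedding model (not shown). It averages the cosine similarity between a new utterance and every stored vector, then wakes only if that average clears a threshold. Note that this swaps in a fixed threshold as a simplification of the pool-average comparison described in the paragraph; the function names and the 0.7 threshold are hypothetical, not Apple’s. Because the raw recordings are kept, `rebuild_profile` shows how an updated embedding model could regenerate every vector without asking the user to re-enroll.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length speaker vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def speaker_score(candidate: Vector, profile: List[Vector]) -> float:
    """Average similarity between a new utterance and every vector in the profile."""
    if not profile:
        raise ValueError("profile is empty")
    return sum(cosine(candidate, ref) for ref in profile) / len(profile)


def should_wake(candidate: Vector, profile: List[Vector], threshold: float = 0.7) -> bool:
    # Hypothetical placeholder threshold, not a value published by Apple.
    return speaker_score(candidate, profile) >= threshold


def rebuild_profile(recordings: List[bytes], embed: Callable[[bytes], Vector]) -> List[Vector]:
    """Re-derive every profile vector from stored audio with a (possibly updated) model."""
    return [embed(rec) for rec in recordings]
```

A close match to the stored vectors scores near 1.0 and wakes the device, while an utterance pointing in a different direction in the vector space averages well below the threshold and is ignored.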
In the future, Apple expects that Siri won’t need up-front training: the user profile will start empty and, with the user’s permission, grow and update itself organically as additional requests come in. The company is also working on ways to screen out “false accepts,” in which Siri is triggered by something other than the user. Using a deep neural network, Apple believes it could cut the false accept rate by 50 percent, the false reject rate (when the user says “Hey Siri” but the device doesn’t respond) by roughly 40 percent, and the rate of activation by another person by nearly 75 percent.
The other challenge is getting Siri to perform better in large, reverberant rooms and in noisy environments such as cars or windy outdoor settings. Apple’s team says it is still researching ways to handle environments that are severely mismatched with the “Hey Siri” recordings already in the user’s profile, but it has had success with so-called “multi-style training, in which a subset of the training data is augmented with different types of noise and reverberation.”
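Apple hasn’t published its augmentation code, but the multi-style idea quoted above can be sketched under a few assumptions: raw mono waveforms stored as lists of floats, white Gaussian noise mixed in at a random signal-to-noise ratio, and reverberation simulated by convolving with a room impulse response. The function names, the 0 to 20 dB SNR range, and the three style choices are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Signal = List[float]

# Hypothetical helpers illustrating multi-style augmentation; not Apple's implementation.


def add_noise(signal: Sequence[float], snr_db: float, rng: random.Random) -> Signal:
    """Mix in white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    if not signal:
        return []
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def add_reverb(signal: Sequence[float], impulse_response: Sequence[float]) -> Signal:
    """Simulate a room by convolving with an impulse response, truncated to the input length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if k > i:
                break
            acc += h * signal[i - k]
        out.append(acc)
    return out


def multi_style(
    dataset: List[Signal],
    fraction: float,
    impulse_responses: List[Signal],
    snr_range_db: Tuple[float, float] = (0.0, 20.0),
    seed: int = 0,
) -> List[Signal]:
    """Copy the dataset, augmenting a random subset of roughly `fraction` of the examples."""
    rng = random.Random(seed)
    out = []
    for signal in dataset:
        if rng.random() < fraction:
            # Pick one augmentation style per selected example.
            style = rng.choice(["noise", "reverb", "both"])
            if style in ("reverb", "both"):
                signal = add_reverb(signal, rng.choice(impulse_responses))
            if style in ("noise", "both"):
                signal = add_noise(signal, rng.uniform(*snr_range_db), rng)
        out.append(list(signal))
    return out
```

Leaving most examples clean while distorting a random subset is what keeps the model accurate in quiet rooms while exposing it to harsher conditions. Real pipelines typically draw on recorded noise (car cabins, wind) and measured room impulse responses rather than synthetic white noise, but the structure stays the same.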
Because Apple continuously changes Siri and doesn’t necessarily flag all of the tweaks, it’s unclear when the improvements spotlighted above will be implemented. That said, the company frequently highlights major changes to Siri at each June’s Worldwide Developers Conference, and given the amount of recent discussion of Siri’s shortcomings, we’d expect to hear some major announcements in the not-too-distant future.