Keyloggers aren’t the only way malicious hackers can get at your phone or tablet’s passcode. In a preprint paper (“A new acoustic side channel on smartphones”) published this week, researchers describe a novel attack that recovers characters typed on a virtual keyboard from sounds generated by finger taps.

“We found the device’s microphone(s) can recover this wave and ‘hear’ the finger’s touch, and the wave’s distortions are characteristic of the tap’s location on the screen,” the paper’s coauthors wrote. “Hence, by recording audio through the built-in microphone(s), a malicious app can infer text as the user enters it on their device.”

Acoustic attacks targeting keyboards aren’t new, the researchers note — previous studies have investigated the use of mics to identify physical keys by their unique physical characteristics or defects. But soft keyboards naturally make for more difficult targets, because each tap happens on the same surface.

The team’s approach employs an app that records the sounds of taps and correlates them with keystrokes, using a machine learning algorithm that’s trained offline and tuned to a particular smartphone or tablet model. Architecting the algorithm required overcoming a significant engineering challenge: It had to account for the interfering vibrations produced by tapping fingertips. In the end, the researchers cross-correlated the feedback sound to disambiguate it from the vibration feedback, and subtracted out the vibration data.
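To make the cross-correlate-and-subtract idea concrete, here is a minimal sketch in plain Python. It is not the researchers’ code: the function names, the toy waveforms, and the assumption that the vibration burst is known in advance as a fixed template are all illustrative.

```python
def cross_correlate(signal, template):
    """Sliding dot product of template against signal (fully overlapping lags only)."""
    n, m = len(signal), len(template)
    return [
        sum(signal[lag + i] * template[i] for i in range(m))
        for lag in range(n - m + 1)
    ]


def remove_template(signal, template):
    """Find where template best matches signal, then subtract it from a copy."""
    scores = cross_correlate(signal, template)
    lag = max(range(len(scores)), key=scores.__getitem__)
    cleaned = list(signal)
    for i, value in enumerate(template):
        cleaned[lag + i] -= value
    return lag, cleaned


# Toy data (hypothetical): a weak tap sound plus a stronger vibration burst.
vibration = [0.0, 1.0, -1.0, 0.5, 0.0]
tap = [0.0] * 3 + [0.2, -0.1] + [0.0] * 7
recording = list(tap)
for i, v in enumerate(vibration):
    recording[4 + i] += v  # vibration starts at sample 4

lag, cleaned = remove_template(recording, vibration)
print("vibration located at sample", lag)
```

On real audio the template would have to be estimated per device, and the subtraction would be imperfect. The sketch only shows why a correlation peak is a convenient way to line the two signals up first.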

With a model in hand, they set about calculating the time difference between the reception of the sound signals on the dual-mic devices they tested: LG’s Nexus 5 and HTC’s Nexus 9. Roughly 70 percent of the recorded taps, which fell in the frequency ranges 60-70 Hz, 1,300-1,700 Hz, 4,000-4,400 Hz, and 8,000-8,500 Hz, were fed into a machine learning classifier, while the remaining 30 percent were reserved for testing.
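As a rough illustration of those two steps, the sketch below estimates the arrival-time difference between two microphone channels by searching for the lag with the highest cross-correlation, then splits a set of samples 70/30. The function names, the synthetic pulse, and the fixed shuffle seed are assumptions made for the example, not details from the paper.

```python
import random


def estimate_delay(mic_a, mic_b, max_lag):
    """Lag in samples that best aligns mic_b with mic_a.

    A positive result means the sound reached mic_b later than mic_a.
    """
    def score(lag):
        total = 0.0
        for i, a in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                total += a * mic_b[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


def split_70_30(samples, seed=0):
    """Shuffle samples and return (training 70 percent, testing 30 percent)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


# Synthetic tap pulse (hypothetical) arriving 3 samples later at the second mic.
pulse = [0.0, 0.3, 1.0, -0.8, 0.2, 0.0]
mic_a = [0.0] * 5 + pulse + [0.0] * 9
mic_b = [0.0] * 8 + pulse + [0.0] * 6

delay = estimate_delay(mic_a, mic_b, max_lag=6)
train, test = split_70_30(range(100))
print("delay in samples:", delay, "| train:", len(train), "| test:", len(test))
```

Dividing the delay by the sampling rate would give the time difference in seconds. A tap closer to one microphone shifts that value, which is what makes it informative about where on the screen the finger landed.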

To validate their approach, the researchers developed an Android app that had users enter letters, words, and digits into fields while it collected audio through the on-device microphones. About 45 test subjects used it in environments with a fair amount of ambient noise, including a common room, a reading room, and a library.

Ten participants were asked to press each of nine digits (1 to 9) 10 times in random order, and 10 others were told to type 200 unique four-digit PINs. A third group was instructed to type letters (also randomly ordered), and a fourth was told to type five-character words from an open source data set.

The researchers report that, with two microphones, the model correctly predicted single digits three times better than a random guess in the worst case and 100 percent of digits in the best case. Moreover, it recovered 54 percent of PINs after 10 attempts and 91 out of 150 four-digit PINs in 20 attempts. Where letters and words were concerned, it outperformed a random guess by a factor of three with a single microphone. More alarmingly, out of 27 passwords, it managed to recover seven on the Nexus 5 and 19 on the Nexus 9 within 10 attempts.
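For a sense of scale, the short calculation below compares the reported 91-of-150 figure with blind guessing. The baseline is an assumption made for illustration: it treats all 10,000 four-digit PINs as equally likely and each guess as distinct, which may not match the paper’s own baseline.

```python
from fractions import Fraction


def random_guess_rate(attempts, keyspace):
    """Chance that `attempts` distinct uniform guesses include the right PIN."""
    return Fraction(min(attempts, keyspace), keyspace)


reported = Fraction(91, 150)             # from the article: 91 of 150 PINs in 20 attempts
baseline = random_guess_rate(20, 10**4)  # assumption: 10,000 equally likely PINs
advantage = reported / baseline

print(f"reported {float(reported):.1%}, "
      f"baseline {float(baseline):.2%}, "
      f"ratio about {float(advantage):.0f}x")
```

Under that assumption, the reported recovery rate is a few hundred times what guessing alone would achieve in the same number of attempts.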

That’s despite the fact that the mic configurations aren’t identical — the Nexus 5’s primary mic is located on the bottom, opposite the second one on the top, while the Nexus 9’s second microphone is on the right side.

“This illustrates the hazards of reasoning about smartphone sandboxing given the complexity of modern platforms, as well as the need for a more realistic threat model for modern hardware,” the paper’s authors wrote.

They list a number of ways the attack might be mitigated — for instance, with physical switches that allow users to switch off the microphones, mics that have lower sampling frequencies, and additional glass layers on top of screens that could absorb most finger tap noise — but concede that the most obvious solutions have design and usability drawbacks. They instead posit (1) a mechanism that reports which sensors are active, and (2) “a secure attention sequence” for passwords or other sensitive text entry that blocks all sensors temporarily.

“Mobile devices may need a richer capability model, a more user-friendly notification system for sensor usage and a more thorough evaluation of the information leaked by the underlying hardware,” they concluded. “Until these (or other mitigations) are implemented in the platform, app developers should consider the use of tactical jamming if PIN theft via side channels is ever deployed at scale.”