Google has improved handwriting recognition in Gboard, its virtual keyboard for iOS and Android devices, with a faster AI system that makes 20 to 40 percent fewer mistakes than the machine learning models it replaces. That’s according to researchers at Google AI, who describe their work in a blog post published this afternoon.
“Progress in machine learning has enabled new model architectures and training methodologies, allowing us to revise our initial approach [and] instead build a single … model that operates on the whole input,” senior software engineers Sandro Feuz and Pedro Gonnet wrote. “We launched those new models for all Latin-script based languages in Gboard at the beginning of the year.”
As Feuz and Gonnet explain, most handwriting recognizers use touch points to get a handle on sketched-out Latin characters. Drawn inputs are represented as a sequence of strokes, and these strokes in turn comprise sequences of time-stamped points. Gboard first normalizes the touch-point coordinates to ensure they remain consistent across devices with different sampling rates and accuracies, and then converts them into a sequence of cubic Bézier curves — parametric curves commonly used in computer graphics.
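The normalize-then-fit step can be sketched in NumPy. This is an illustrative least-squares fit of one cubic Bézier to a stroke, not Gboard’s implementation; the `fit_cubic_bezier` helper and its normalization choices (translate to the origin, scale by stroke height, parameterize by arc length) are assumptions:

```python
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares fit of a single cubic Bezier curve to a stroke's (x, y) points.

    Returns the four control points as a (4, 2) array.
    """
    pts = np.asarray(points, dtype=float)
    # Normalize: translate to the origin and scale by the stroke's height,
    # so coordinates are comparable across devices with different resolutions.
    pts = pts - pts.min(axis=0)
    scale = pts[:, 1].max() or 1.0
    pts = pts / scale
    # Parameterize samples by cumulative arc length, mapped to t in [0, 1].
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = d / d[-1]
    # Bernstein basis of a cubic Bezier: B(t) = sum_i C(3,i) t^i (1-t)^(3-i) P_i
    B = np.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t), t ** 3], axis=1)
    # Solve B @ P = pts for the four control points P in the least-squares sense.
    P, *_ = np.linalg.lstsq(B, pts, rcond=None)
    return P
```

A real pipeline would also split strokes and fit several curves per stroke; this sketch fits one curve per call.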
The chief advantage of these sequences, Feuz and Gonnet say, is that they’re more compact than the underlying sequence of input points. Concretely, each curve is represented by a polynomial (an expression of variables and coefficients) defined by start points, endpoints, and control points. The word “go,” for example, might contain 186 such points, represented by a sequence of four cubic Bézier curves for the letter “G” (and two control points) and three curves for the letter “O.”
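The compactness claim can be made concrete with the article’s own counts. The `bezier_point` helper below, which evaluates the cubic polynomial from its four control points, is illustrative rather than Gboard’s representation:

```python
def bezier_point(ctrl, t):
    """Evaluate a cubic Bezier curve (start, two control points, end) at t in [0, 1]."""
    p0, p1, p2, p3 = ctrl
    s = 1.0 - t
    return tuple(s**3 * a + 3 * t * s**2 * b + 3 * t**2 * s * c + t**3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

# Compactness, using the article's counts for the word "go":
raw_floats = 186 * 2            # 186 time-stamped (x, y) touch points
curve_floats = (4 + 3) * 4 * 2  # 7 cubic curves x 4 points each x (x, y)
```

Even without sharing endpoints between consecutive curves, the curve form stores far fewer numbers than the raw touch points.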
These sequences feed into a recurrent neural network trained to recognize the character being written — specifically, a bidirectional version of quasi-recurrent neural networks (QRNN), a network that parallelizes efficiently while maintaining good predictive performance. Importantly, QRNNs also keep the number of weights — the strength of the connections between the mathematical functions, or nodes, that make up the network — relatively small, reducing file size.
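The property that makes QRNNs fast — gate values computed for all timesteps at once by a convolution, leaving only a cheap elementwise recurrence sequential — can be sketched in NumPy. This is a toy single-layer version under assumptions (the f-pooling variant, a window of 2, and the function names are all invented for illustration), not the production recognizer:

```python
import numpy as np

def qrnn_layer(x, Wz, Wf, window=2):
    """Minimal single-direction QRNN layer (f-pooling variant), a sketch.

    x: (T, d) input sequence; Wz, Wf: (window * d, hidden) weight matrices.
    """
    T, d = x.shape
    # Causal window: row t stacks x[t-1] and x[t] (zero-padded at the start).
    pad = np.vstack([np.zeros((window - 1, d)), x])
    ctx = np.hstack([pad[t:t + T] for t in range(window)])  # (T, window * d)
    z = np.tanh(ctx @ Wz)                  # candidate vectors, parallel over T
    f = 1.0 / (1.0 + np.exp(-(ctx @ Wf)))  # forget gates, parallel over T
    # Only this elementwise recurrence is sequential.
    h = np.zeros_like(z)
    prev = np.zeros(z.shape[1])
    for t in range(T):
        prev = f[t] * prev + (1.0 - f[t]) * z[t]
        h[t] = prev
    return h

def bidir_qrnn(x, Wz, Wf):
    """Bidirectional wrapper: run forward and reversed passes, concatenate."""
    fwd = qrnn_layer(x, Wz, Wf)
    bwd = qrnn_layer(x[::-1], Wz, Wf)[::-1]
    return np.concatenate([fwd, bwd], axis=1)
```

The sequential loop touches only elementwise products, which is why QRNNs run much faster than LSTMs of similar size on parallel hardware.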
So how does the AI model make sense of the curves? It produces a matrix of columns and rows, where each column corresponds to one input curve and each row corresponds to a letter in the alphabet: given the sequence of curves — itself far shorter than the original sequence of touch points — the QRNN-based recognizer spits out a sequence of character probabilities. The outputs of the network are then combined with a character-based language model that awards bonuses to character sequences common in a language and penalties to uncommon sequences.
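The way a character language model reshapes the recognizer’s raw scores can be illustrated with a toy beam search. The `decode` helper, the bigram table, and the weights here are invented for illustration and don’t reflect Gboard’s actual decoder:

```python
def decode(logp_columns, alphabet, lm_bigram, lm_weight=0.5, beam=3):
    """Toy beam search over a score matrix (one column of character
    log-probabilities per input curve), combined with a character bigram
    language model: a bonus for common pairs, a penalty for rare ones."""
    beams = [("", 0.0)]                    # (prefix, total score)
    for col in logp_columns:               # one column per input curve
        candidates = []
        for prefix, score in beams:
            for ch, lp in zip(alphabet, col):
                prev = prefix[-1] if prefix else "^"   # "^" = start of word
                bonus = lm_weight * lm_bigram.get((prev, ch), -2.0)
                candidates.append((prefix + ch, score + lp + bonus))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam]
    return beams[0][0]
```

With a bigram table that favors "g" at word start and "o" after "g", the model picks "go" even when the per-curve scores alone are ambiguous.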
Gboard’s handwriting recognition stack runs on-device, a feat the team achieved by converting the recognition models (which were trained in Google’s TensorFlow machine learning framework) to TensorFlow Lite models. This enabled them not only to lower inference times compared with a full TensorFlow implementation, but also to reduce Gboard’s storage footprint. “We will continue to push the envelope beyond improving the latin-script language recognizers,” Feuz and Gonnet wrote. “The handwriting team is already hard at work launching new models for all our supported handwriting languages in Gboard.”