Google's on-device text classification AI achieves 86.7% accuracy

Deep neural networks -- layers of mathematical functions that mimic the behavior of neurons in the human brain -- are at the heart of state-of-the-art machine translation and object recognition systems. They're what help translate one language into another and extract addresses from business cards. The problem is, they're frequently hardware-constrained on smartphones, wearables, and other mobile devices -- particularly when it comes to memory and computation.

There's hope yet for performant offline algorithms, though. In a paper presented this week at the Conference on Empirical Methods in Natural Language Processing in Brussels, Belgium, Google researchers described offline, on-device AI systems -- Self-Governing Neural Networks (SGNNs) -- that achieve state-of-the-art results in specific dialog-related tasks.

"The main challenges with developing and deploying deep neural network models on-device are (1) the tiny memory footprint, (2) inference latency and (3) significantly low computational capacity compared to high-performance computing systems, such as CPUs, GPUs, and TPUs on the cloud," the team wrote. "[SGNNs] allow us to compute a projection for an incoming text very fast, on-the-fly, with a small memory footprint on the device, since we do not need to store the incoming text and word embeddings."

As the paper's authors explained, there are myriad ways to design a lightweight, on-device text classification model. One is to combine a model with graph learning, the approach used in Google's Smart Reply, which automatically generates short email responses. But most of these approaches either don't scale well or result in large models.

By contrast, SGNN employs a modified version of locality-sensitive hashing (LSH), a technique that reduces the number of dimensions in data by hashing, or mapping, input items so that similar items land in the same "buckets" with high probability. As the name implies, it's self-governing -- it can learn a model without having to initialize, load, or store any feature or vocabulary weights, because it dynamically transforms inputs into low-dimensional representations with projection functions. Moreover, as it trains on data, it learns to choose and apply the specific operations that are more predictive for a given task.
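To make the "similar items, same buckets" idea concrete, here is a minimal sketch of one common LSH family, random-hyperplane hashing. It is an illustration only, not the modified scheme the Google team describes; the function names, the 16-bit signature length, and the toy vectors are all assumptions chosen for the example.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (assumed Gaussian) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def hamming(a, b):
    """Number of positions where two signatures disagree."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical toy data: a vector, a slightly perturbed copy, and its opposite.
planes = make_hyperplanes(num_bits=16, dim=5)
base = [1.0, 0.5, -0.2, 0.8, 0.1]
near = [1.0, 0.5, -0.2, 0.8, 0.15]
far = [-x for x in base]

sig_base = lsh_signature(base, planes)
sig_near = lsh_signature(near, planes)
sig_far = lsh_signature(far, planes)

print("base vs near:", hamming(sig_base, sig_near), "bits differ")
print("base vs far: ", hamming(sig_base, sig_far), "bits differ")
```

Vectors that point in nearly the same direction fall on the same side of most hyperplanes, so their signatures differ in few bits, while the opposite vector flips every bit. The signature itself can serve as the bucket key.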

This reduces the input dimension from millions of unique words to short, fixed-length sequences of bits, the team wrote, and obviates the need to store text and word embeddings (vectors that represent words and phrases). In experiments, SGNNs used a fixed 1120-dimensional vector, regardless of the input data's vocabulary or feature size, compared with word embedding methods whose storage requirements exceed hundreds of thousands of dimensions.
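The sketch below shows how text of any length can be turned into a fixed-length bit vector on the fly, with no stored vocabulary or embedding table. It is a SimHash-style approximation written for illustration, not the projection function from the paper: the character-trigram features, the use of SHAKE-256 as the hash, and the helper names are assumptions. Only the 1120-bit output size comes from the figures reported above.

```python
import hashlib

NUM_BITS = 1120  # output size reported for the experiments

def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (an assumed feature set)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def project(text, num_bits=NUM_BITS):
    """Map text to num_bits bits by letting each feature vote +1 or -1 per bit.

    The votes are derived from a hash of the feature itself, so nothing
    about the vocabulary has to be stored on the device.
    """
    totals = [0] * num_bits
    num_bytes = (num_bits + 7) // 8
    for feature in char_ngrams(text):
        digest = hashlib.shake_256(feature.encode("utf-8")).digest(num_bytes)
        for i in range(num_bits):
            bit = (digest[i // 8] >> (i % 8)) & 1
            totals[i] += 1 if bit else -1
    # Keep only the sign of each running total.
    return [1 if total > 0 else 0 for total in totals]

def hamming(a, b):
    """Number of positions where two bit vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

short_vec = project("see you tomorrow")
near_vec = project("see you tomorrow!")
other_vec = project("what is the weather like in Brussels this week")

print(len(short_vec), len(other_vec))  # both equal NUM_BITS
print("near:", hamming(short_vec, near_vec), "other:", hamming(short_vec, other_vec))
```

Whatever the input length or vocabulary, `project` returns exactly 1120 bits, which is the property the paragraph above describes. In the system the article covers, a vector like this would feed the network's fully connected layers; that part is omitted here.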

The researchers used two dialog act benchmark datasets to evaluate SGNN: the Switchboard Dialog Act Corpus (SWDA), which contains two-speaker conversations labeled with 42 dialog acts, and the ICSI Meeting Recorder Dialog Act Corpus (MRDA), a dialog corpus of multiparty meetings.

SGNN outperformed both baseline AI systems by 12 percent to 35 percent without preprocessing, tagging, parsing, or pretraining embeddings. On the SWDA and MRDA datasets, it achieved accuracies of 83.1 percent and 86.7 percent, respectively -- higher than the state-of-the-art convolutional neural networks and recurrent neural networks it was benchmarked against -- and 73 percent accuracy on Japanese, close to the best-performing systems.

In future work, the researchers plan to investigate the use of SGNNs in other natural language tasks.

"Our study also shows that the proposed method is very effective for such natural language tasks compared to more complex neural network architectures, such as deep CNN and RNN variants," the researchers wrote. "We believe that the compression techniques, like locality sensitive projections jointly coupled with non-linear functions, are effective at capturing low-dimensional semantic text representations that are useful for text classification applications."

At-the-edge AI systems have advanced by leaps and bounds in recent years.

In September, Dublin startup Voysis announced WaveNet-based tech that can run not only offline but also on smartphones and other devices with mobile processors. In August, researchers at Google developed offline AI with record facial recognition and object detection speed. And in May, Qualcomm claimed that its on-device voice recognition systems were 95 percent accurate.