Intel previews AI advances in software testing, sequence models, and explainability

This week marks the kickoff of Neural Information Processing Systems (NeurIPS), one of the largest AI and machine learning conferences globally. NeurIPS 2017 and NeurIPS 2018 received 3,240 and 4,854 research paper submissions, respectively, and this year's event -- which takes place from December 8 to December 14 in Vancouver -- is on track to handily break those records. (Submissions this year overwhelmed NeurIPS' website, which crashed minutes before the official deadline.)

Researchers from Intel will be in attendance, as will those from tech giants like Google, Facebook, Apple, Uber, Alibaba, Baidu, and countless others. For its part, the Santa Clara, California-based chipmaker said it intends to host three dozen conference, workshop, and spotlight sessions covering topics like deep equilibrium models, imitation learning, machine programming, and more.

"Intel is making significant strides in advancing and scaling neural network technologies to handle increasingly complex and dynamic workloads -- from tackling challenges with memory to researching new adaptive learning techniques," wrote Dr. Rich Uhlig, senior fellow and managing director of Intel Labs, in a blog post. "The developments we're showcasing at NeurIPS will help reduce memory footprints, better measure how neural networks process information, and reshape how machines learn in real time, opening up the potential for new deep learning applications that can change everything from manufacturing to health care."

Automating software testing

One area of research Intel plans to highlight is software development automated with machine learning. In a recent paper titled "A Zero-Positive Learning Approach for Diagnosing Software Performance Regression," a team from Intel Labs, MIT, and Texas A&M University proposed AutoPerf, an approach to automating regression testing (that is, testing for errors introduced by new code check-ins) in high-performance computing code. Leveraging only nominal training data and hardware performance counters (HWPCs), or sets of special-purpose registers built into processors that store metrics from a wide range of hardware-related activities, the paper's authors demonstrated that AutoPerf can detect some of the most complex performance bugs found in parallel programming.

AutoPerf's secret sauce is zero-positive learning (ZPL), a semi-supervised machine learning technique initially developed for anomaly detection. The system trains autoencoders (i.e., types of machine learning models that learn representations from sets of data) only on the expected behavior of a given piece of software, and performance defects are treated as irregularities that deviate from that behavior. As for the HWPCs, which track things like processor cycles elapsed, cache hits and misses, branch predictions, and instructions executed, they serve as a mechanism to collect information on software activity without modifying source code or introducing performance-impacting overhead.
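To make the zero-positive idea concrete, here is a minimal sketch in Python. It is not AutoPerf's implementation: a per-counter mean and standard deviation stand in for the autoencoder, a z-score distance stands in for reconstruction error, and the counter values, function names, and three-sigma margin are all assumptions made for illustration. What it shares with ZPL is that only nominal runs are used to fit the model and to set the alarm threshold.

```python
import math
import random
import statistics


def fit_nominal_profile(nominal_runs):
    """Learn per-counter mean and spread from nominal (bug-free) runs only."""
    columns = list(zip(*nominal_runs))
    means = [statistics.fmean(col) for col in columns]
    # Guard against a zero spread so the division below is always defined.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def deviation(run, means, stds):
    """Distance of one run from the nominal profile (stand-in for reconstruction error)."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(run, means, stds)))


def fit_threshold(nominal_runs, means, stds, k=3.0):
    """Set the alarm threshold from nominal deviations alone (k is an assumed margin)."""
    errors = [deviation(run, means, stds) for run in nominal_runs]
    return statistics.fmean(errors) + k * statistics.pstdev(errors)


# Hypothetical counter readings per run: (cycles, cache misses).
random.seed(0)
nominal_runs = [(random.gauss(1000.0, 20.0), random.gauss(50.0, 5.0)) for _ in range(200)]

means, stds = fit_nominal_profile(nominal_runs)
threshold = fit_threshold(nominal_runs, means, stds)

typical_run = tuple(means)       # sits exactly on the nominal profile
regressed_run = (1010.0, 500.0)  # cache misses roughly 10x the nominal level

typical_flagged = deviation(typical_run, means, stds) > threshold
regressed_flagged = deviation(regressed_run, means, stds) > threshold
```

In this toy setup the regressed run is flagged and the typical one is not, even though no example of a performance bug was ever shown to the model during fitting.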

The researchers report that in experiments involving three types of regressions across 10 bugs in seven benchmarks and programs, AutoPerf didn't produce a single false negative. On average, it exhibited 4% profiling overhead and accurately diagnosed more performance issues than prior state-of-the-art approaches, including issues missed by expert programmers.

Goal-conditioned imitation learning

Might robot algorithms learn quickly and easily from human demonstrations? Researchers at Intel, alongside contributors from the University of California, Berkeley and startup Covariant.ai, assert that they can in a paper titled "Goal-Conditioned Imitation Learning." The team proposes a novel model -- goalGAIL -- that's able to learn better than an expert demonstrator and that can even perform in situations with non-expert actions. They believe it could broaden robotic applications across industrial settings where algorithms might need to adapt quickly to new parts, as well as in personalized robotics, where models must adapt through demonstration to personal preference.

As the coauthors of the paper explain, a versatile form of self-supervision for robotics involves learning how to reach any previously observed state on demand. An AI model can be trained to seek to have its observation exactly match a predefined goal. However, the reward is rarely observed in practice, because it's extremely rare to encounter the exact sensory input twice. A technique called Hindsight Experience Replay (HER) solves this by relabeling a collected trajectory, replacing its goal with a state actually visited during that trajectory. But it's not perfect -- in cases where a specific state needs to be reached before a whole new area of the space is discovered, learning takes an inordinate amount of time. (Think navigating narrow corridors between rooms, or picking and placing an object with a mechanized arm.) That's where goalGAIL comes in. By leveraging imitation learning, it's able to use demonstrations from an expert to craft a reward.
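The relabeling step that HER relies on can be illustrated with a short Python sketch. This is not the paper's code: the grid-world states, the `Transition` type, the choice to relabel with the final state of the trajectory, and the 0/1 reward convention are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[int, int]


class Transition(NamedTuple):
    state: State
    action: str
    next_state: State
    goal: State
    reward: float


def sparse_reward(next_state: State, goal: State) -> float:
    """Assumed reward convention: 1.0 only on an exact match, otherwise 0.0."""
    return 1.0 if next_state == goal else 0.0


def relabel_with_final_state(trajectory: List[Transition]) -> List[Transition]:
    """Replace the goal with the last state the trajectory actually reached."""
    achieved = trajectory[-1].next_state
    return [
        t._replace(goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in trajectory
    ]


# A hypothetical grid-world rollout that aimed for (5, 5) but ended at (2, 1).
intended_goal = (5, 5)
steps = [((0, 0), "right", (1, 0)), ((1, 0), "right", (2, 0)), ((2, 0), "up", (2, 1))]
trajectory = [
    Transition(s, a, s2, intended_goal, sparse_reward(s2, intended_goal))
    for s, a, s2 in steps
]

relabeled = relabel_with_final_state(trajectory)
original_rewards = [t.reward for t in trajectory]   # no step ever hits (5, 5)
relabeled_rewards = [t.reward for t in relabeled]   # the last step hits (2, 1)
```

The original rollout carries no reward at all, while the relabeled copy ends with a successful step, which gives the learner a usable signal from an otherwise failed attempt.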

The researchers report that goalGAIL converges faster than HER and achieves a better final performance.

New approach to sequence models

A team of Intel researchers will discuss a new approach to machine learning on sequence data, or data where the order matters. In a typical model, neurons -- mathematical functions -- are arranged in interconnected layers that transmit "signals" from input data and slowly adjust the synaptic strength (weights) of each connection. That's how they extract features and learn to make predictions. The authors of "Deep Equilibrium Models," who hail from Carnegie Mellon and Intel, say they've managed to replace popular multi-layered architectures with a single-layer architecture that can reach state-of-the-art performance on language benchmarks while reducing the memory footprint by 88%.

This novel construction, which the team calls a deep equilibrium model (DEQ), directly finds the points toward which multi-layer networks converge via root-finding, a class of algorithms for finding zeroes of continuous functions. On the open source WikiText corpus, which contains over 100 million words, it's on par in terms of performance with Google's Transformer-based architecture, a multi-layer model that selectively adjusts the weightings between its elements. That's an impressive feat -- Transformer models have massive memory requirements (in this case up to 9GB versus the DEQ's maximum of 3.7GB) and can only be trained quickly on specialized hardware.
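The equilibrium idea can be sketched on a toy scalar "layer" in Python. Everything here is an assumption made for illustration: the `tanh` layer, the weight of 0.5, and the use of Newton's method as the root-finder (the sketch makes no claim about which solver the paper uses). It only shows that stacking the same layer many times and solving for the root of `layer(z, x) - z` land on the same point; it says nothing about training or memory use.

```python
import math


def layer(z: float, x: float, w: float = 0.5) -> float:
    """One weight-tied layer: the same function is applied at every depth."""
    return math.tanh(w * z + x)


def iterate_layers(x: float, depth: int = 100) -> float:
    """Conventional view: stack the layer `depth` times and keep the last output."""
    z = 0.0
    for _ in range(depth):
        z = layer(z, x)
    return z


def solve_equilibrium(x: float, w: float = 0.5, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Equilibrium view: find the root of g(z) = layer(z, x) - z with Newton's method."""
    z = 0.0
    for _ in range(max_iter):
        out = layer(z, x, w)
        g = out - z
        if abs(g) < tol:
            break
        g_prime = w * (1.0 - out * out) - 1.0  # derivative of tanh(w*z + x) - z
        z -= g / g_prime
    return z


x = 1.0
z_stacked = iterate_layers(x)        # 100 applications of the layer
z_equilibrium = solve_equilibrium(x)  # a handful of Newton steps
```

Both routes arrive at the same fixed point, but the solver gets there without walking through a deep stack of layers.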

4-bit quantization without retraining

A convolutional network -- a type of AI model most commonly applied to analyzing visual imagery -- requires substantial computing resources, memory bandwidth, and storage capacity to ingest and process data. One method of speeding the analysis is quantization, or the process of approximating a model with a "compressed" version of that model. But recovering the accuracy lost as a result often requires full data sets and time-consuming fine-tuning.

That's why an Intel team partnered with a researcher from Technion in Israel to develop a quantization approach that doesn't involve either fine-tuning or the use of a large corpus. Instead, their 4-bit post-training quantization targets models' weights and their activations (the outputs that activation functions produce given an input or set of inputs). It proposes three complementary methods -- analytical clipping for integer quantization, per-channel bit allocation, and bias correction -- for minimizing quantization error. This enables it to achieve accuracy that's just a few percent lower than the state-of-the-art baseline across a range of convolutional models, the team reports in a coauthored paper titled "Post Training 4-Bit Quantization of Convolutional Networks for Rapid-Deployment."
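Why clipping matters at 4 bits can be seen in a small Python sketch. It is only a rough illustration under stated assumptions: the weights are synthetic, the quantizer is a plain symmetric uniform one, and the clipping value of 1.0 is picked by hand rather than derived analytically as the paper describes. Per-channel bit allocation and bias correction are not shown.

```python
def quantize(values, clip, bits=4):
    """Uniformly quantize values to 2**bits levels spanning [-clip, clip]."""
    steps = 2 ** bits - 1
    step = 2.0 * clip / steps
    quantized = []
    for v in values:
        clamped = max(-clip, min(clip, v))      # saturate anything outside the range
        index = round((clamped + clip) / step)  # integer code in 0..steps
        quantized.append(index * step - clip)   # map the code back to a real value
    return quantized


def mean_squared_error(values, approx):
    """Average squared difference between the original and quantized values."""
    return sum((v - a) ** 2 for v, a in zip(values, approx)) / len(values)


# Hypothetical weights: 1,001 values spread evenly over [-1, 1] plus one outlier.
weights = [i / 500.0 - 1.0 for i in range(1001)] + [4.0]

full_range = max(abs(w) for w in weights)  # 4.0: no clipping, but coarse steps
error_full = mean_squared_error(weights, quantize(weights, full_range))
error_clipped = mean_squared_error(weights, quantize(weights, 1.0))
```

With only 16 levels available, stretching the range to cover the single outlier makes every step coarse. In this toy data, saturating the outlier at 1.0 costs less than the precision gained on the other 1,001 weights, so `error_clipped` comes out below `error_full`.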

"[Our work] introduces the first practical 4-bit post training quantization approach," wrote the researchers. "Neural network quantization has significant benefits in reducing the amount of intermediate results ... Our main findings in this paper suggest that with just a few percent accuracy degradation, retraining [convolutional] models may be unnecessary for 4-bit quantization."

Understanding neural networks

Explainability with respect to AI and machine learning is an area of intense study at Intel, and researchers there believe they (in partnership with scientists from MIT) have developed measures to better understand at least one category of model: speech models. In a paper accepted at NeurIPS 2019 ("Untangling in Invariant Speech Recognition"), they explain how they employed a recently developed statistical mechanical theory that connects the properties of AI model representations and the separability of classes to probe how data is untangled within the models.

According to the team, this statistical approach enabled them to observe that speaker-specific nuisance variations are discarded by the model's hierarchy, whereas task-relevant properties such as words and phonemes (units of sound that distinguish one word from another in a particular language) are untangled in later layers. Higher-level concepts such as parts-of-speech and context dependence also emerge in the later layers of the network, while deep representations carry out significant temporal untangling by efficiently extracting task-relevant features at each time step of the computation.

"Both a [convolutional] architecture and [an] end-to-end [automatic speech recognition] model converge on remarkably similar behavior, despite being trained for different tasks and built with different computational blocks. They both learn to discard nuisance acoustic variations and exhibit untangling for task relevant information," wrote the researchers. "Taken together, these findings shed light on how deep auditory models process time-dependent input signals to achieve invariant speech recognition, and show how different concepts emerge through the layers of the [model]."