Google researchers investigate how transfer learning works

Transfer learning's ability to store knowledge gained while solving a problem and apply it to a related problem has attracted considerable attention. But despite recent breakthroughs, no one fully understands what enables a successful transfer and which parts of algorithms are responsible for it.

That's why Google researchers sought to develop analysis techniques tailored to explainability challenges in transfer learning. In a new paper, they say their contributions help clear up a few of the mysteries around why machine learning models transfer successfully -- or fail to.

During the first of several experiments in the study, the researchers sourced images from a medical imaging data set of chest X-rays (CheXpert) and sketches, clip art, and paintings from the open source DomainNet corpus. The team partitioned each image into equal-sized blocks and shuffled the blocks randomly, disrupting the images' visual features. They then compared the agreements and disagreements between models fine-tuned from pretrained weights and models trained from scratch.
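The block-shuffling step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the researchers' code: the function name `shuffle_blocks`, the `block_size` parameter, and the assumption that the image is an H x W x C array whose height and width divide evenly by the block size are all hypothetical choices made for the example.

```python
import numpy as np

def shuffle_blocks(image, block_size, rng):
    """Cut an H x W x C image into square tiles and permute the tiles at random.

    Hypothetical helper for illustration; assumes H and W are multiples
    of block_size.
    """
    h, w, c = image.shape
    if h % block_size or w % block_size:
        raise ValueError("image height and width must be multiples of block_size")
    rows, cols = h // block_size, w // block_size

    # (rows, block, cols, block, C) -> (rows, cols, block, block, C)
    tiles = image.reshape(rows, block_size, cols, block_size, c).swapaxes(1, 2)
    # Flatten the grid of tiles into a list so the tiles can be permuted.
    tiles = tiles.reshape(rows * cols, block_size, block_size, c)
    tiles = tiles[rng.permutation(rows * cols)]

    # Put the permuted tiles back on the grid and restore the image layout.
    tiles = tiles.reshape(rows, cols, block_size, block_size, c).swapaxes(1, 2)
    return tiles.reshape(h, w, c)

# Usage: shuffle a random stand-in for a 224 x 224 RGB image in 32-pixel blocks.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
shuffled = shuffle_blocks(image, block_size=32, rng=rng)
```

Smaller blocks destroy more of the visual structure while leaving the overall pixel statistics of the image unchanged, which is what makes this kind of perturbation useful for separating feature reuse from low-level statistics.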

The researchers found the reuse of features -- the individual measurable properties of a phenomenon being observed -- is an important factor in successful transfers, but not the only one. Low-level statistics of the data that survive perturbations such as shuffling the image blocks also play a role. Moreover, any two instances of models trained from pretrained weights made similar mistakes, suggesting these models capture features in common.
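One simple way to quantify whether two models "make similar mistakes" is to count the test examples that both misclassify. The sketch below is an assumption-laden illustration rather than the paper's metric: the function name `mistake_overlap` and the use of a Jaccard ratio over the two error sets are choices made here for clarity.

```python
import numpy as np

def mistake_overlap(preds_a, preds_b, labels):
    """Compare which examples two models get wrong (hypothetical helper)."""
    preds_a, preds_b, labels = map(np.asarray, (preds_a, preds_b, labels))
    wrong_a = preds_a != labels
    wrong_b = preds_b != labels

    both = int(np.sum(wrong_a & wrong_b))
    either = int(np.sum(wrong_a | wrong_b))
    return {
        "both_wrong": both,
        "only_a_wrong": int(np.sum(wrong_a & ~wrong_b)),
        "only_b_wrong": int(np.sum(~wrong_a & wrong_b)),
        # Share of all mistakes that the two models have in common;
        # defined as 1.0 when neither model makes any mistake.
        "jaccard": both / either if either else 1.0,
    }

# Usage with toy predictions from two models on four labeled examples.
labels = [0, 1, 2, 3]
stats = mistake_overlap([0, 1, 0, 0], [0, 2, 0, 3], labels)
```

A high overlap between two fine-tuned models, compared with the overlap between two models trained from scratch, would be consistent with the article's point that pretrained initializations lead models toward shared features.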

Working from this knowledge, the researchers attempted to pinpoint where feature reuse occurs within models. They observed that features become more specialized in the deeper layers of a model and that feature reuse is more prevalent in layers closer to the input. (Deep learning models contain mathematical functions arranged in layers that transmit signals from input data.) The researchers also found it's possible to begin fine-tuning on a target task from an earlier checkpoint of the pretrained model than originally assumed -- without sacrificing accuracy.

"Our observation of low-level data statistics improving training speed could lead to better network initialization methods," the researchers wrote. "Using these findings to improve transfer learning is of interest for future work."

A better understanding of transfer learning could yield substantial algorithmic performance gains. Google is using transfer learning in Google Translate so insights gleaned through training on high-resource languages -- including French, German, and Spanish (which have billions of parallel examples) -- can be applied to the translation of low-resource languages like Yoruba, Sindhi, and Hawaiian (which have only tens of thousands of examples). Another Google team has applied transfer learning techniques to enable robot control algorithms to learn how to manipulate objects faster with less data.