Microsoft Research's Lin Xiao earns Test of Time award at NeurIPS

At NeurIPS this week in Vancouver, Canada, more than 1,400 pieces of AI research are being examined for their novel approaches or breakthroughs, but one of these papers is unlike all the rest.

Microsoft Research's Lin Xiao was named winner of the Test of Time award this week, a title granted to AI research that's made important and lasting contributions to the AI field over the last 10 years.

A specially convened committee looks back at papers published at NeurIPS 10 years ago and narrows the list down to 18 papers that have had a lasting influence on machine learning, measured in part by which papers garnered the most citations in the past decade. To date, Xiao's paper has been cited more than 600 times by other researchers.

NeurIPS organizers announced Xiao's work as the winner Sunday, and he detailed the results and progress made since then in a conference hall with 1,000 of the conference's 13,000 attendees.

"Ten years ago the conference was much smaller, but I felt it was just as exciting as a relatively young researcher," Xiao said onstage today. "Several of the very exciting topics at that time clashed together to create the motivation for this work."

The paper, titled "Dual Averaging Method for Regularized Stochastic Learning and Online Optimization," was published in 2009 and proposed a new online algorithm called Regularized Dual Averaging, or RDA.

RDA builds on stochastic gradient descent, drawing on work on the subject that Robbins and Monro published in 1951 and on Yurii Nesterov's paper "Primal-dual subgradient methods for convex problems."

"I would like to acknowledge the influence and inspiration of Professor Yurii Nesterov on this paper, and pretty much everything in my research," Xiao said. "This work is a simple extension of his paper."

Last year's Test of Time award, which honored work by Facebook AI Research's Leon Bottou and Google AI's Olivier Bousquet, also went to research focused on stochastic gradient descent for large-scale machine learning.

To improve the performance of models trained with RDA, Xiao's work combines regularization, which encourages learning algorithms to favor simpler models, with online learning. Sparse regularization is used to set some weights in the model to zero, a way to make models trained with stochastic gradient descent easier to understand.
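For readers who want to see how weights end up at exactly zero, here is a minimal sketch of an L1-regularized dual averaging update. It is an illustration under stated assumptions, not code from the paper: it uses the commonly cited closed-form step in which the running average of past gradients is soft-thresholded, and the squared-error loss, the function names, and the `lam` and `gamma` values are all hypothetical choices made for this example.

```python
import math


def l1_rda_step(avg_grad, t, lam, gamma):
    """One L1-RDA style update from the running average gradient.

    Coordinates whose average gradient has magnitude at most `lam`
    are set to exactly zero; the rest are shrunk by `lam` and scaled
    by sqrt(t) / gamma. `lam` and `gamma` are illustrative
    hyperparameters, not values taken from the paper.
    """
    scale = math.sqrt(t) / gamma
    weights = []
    for g in avg_grad:
        if abs(g) <= lam:
            weights.append(0.0)  # truncated to zero: this is the sparsity
        else:
            weights.append(-scale * (g - lam * math.copysign(1.0, g)))
    return weights


def train_l1_rda(samples, dim, lam=0.1, gamma=1.0):
    """Online training loop over (features, target) pairs.

    Assumes a squared-error loss on a linear model, chosen only to
    keep the example short.
    """
    weights = [0.0] * dim
    avg_grad = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - y
        grad = [err * xi for xi in x]  # gradient of 0.5 * err ** 2
        # Incremental update of the average of all gradients seen so far.
        avg_grad = [a + (g - a) / t for a, g in zip(avg_grad, grad)]
        weights = l1_rda_step(avg_grad, t, lam, gamma)
    return weights
```

Calling `train_l1_rda(samples, dim)` on a stream of examples returns a weight list in which any feature whose average gradient stays below `lam` in magnitude has a weight of exactly 0.0. Because the threshold is applied to the average of all past gradients rather than to a single noisy one, the zeros tend to be more stable than with plain stochastic gradient descent plus an L1 penalty.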

"I believe the motivations for RDA remain valid today, because on one side we know that a possibility online algorithms are on the main stage of machine learning because of the amount of data it processes. On the other hand, I believe sparsity is essential to getting us to larger and larger models. Someway or somehow, sparsity tends to be an effective part," Xiao said.

Earlier this week, NeurIPS conference organizers awarded top honors to new AI research as well, including Outstanding Paper for work on distributed learning and Outstanding New Direction honors for a paper that argues uniform convergence may not explain generalization in deep learning. More on the research that earned top honors is available in a NeurIPS Medium post.