In 2019, the number of published papers related to AI and machine learning was nearly 25,000 in the U.S. alone, up from roughly 10,000 in 2015. And NeurIPS 2019, one of the world’s largest machine learning and computational neuroscience conferences, featured close to 2,000 accepted papers and drew thousands of attendees.

There’s no question that the momentum reflects an uptick in publicity and funding — and correspondingly, competition — within the AI research community. But some academics suggest the relentless push for progress might be causing more harm than good.

In a recent tweet, Zachary Lipton, an assistant professor at Carnegie Mellon University, jointly appointed in the Tepper School of Business and the machine learning department, proposed a one-year moratorium on papers for the entire community, which he said might encourage “thinking” without “sprinting/hustling/spamming” toward deadlines.

“The paper avalanche is actually hurting people who don’t have [high citation counts and nice academic positions],” he said. “The noise level of the field is pushing things to a point where serious people no longer take ‘having papers’ as meaningful at all … [The] mere fact of having papers has become a useless signal because the noise level is so high, even among accepted papers.”

Timnit Gebru, the technical colead of the ethical artificial intelligence team at Google, echoed that sentiment in a tweet ahead of the AAAI Conference on Artificial Intelligence in New York City earlier this month. “I am involved in too many conference- and service-related things right now — I can’t even keep up with everything. Besides reviewing and area chairing, there’s logistics … organizing, etc.,” she said. “People in academia say that you have more time to do research in the industry, but that has not been the case for me at all … Reading, coding, and trying to understand feels like an activity I do in my spare time rather than my main responsibility.”

There’s preliminary evidence to suggest the crunch has resulted in research that could mislead the public and stymie future work. In a 2018 meta-analysis undertaken by Lipton and Jacob Steinhardt, a member of the statistics faculty at the University of California, Berkeley and the Berkeley Artificial Intelligence Research Lab, the two assert that troubling trends have emerged in machine learning scholarship, including:

  • A failure to distinguish between explanation and speculation and to identify the sources of empirical gains
  • The use of mathematics that obfuscates or impresses rather than clarifies
  • The misuse of language, for example by overloading established technical terms

They attribute these trends in part to the rapid expansion of the community and the consequent thinness of the reviewer pool. The “often-misaligned” incentives between scholarship and short-term measures of success, like inclusion at a leading academic conference, are also likely to blame, they say.

“In other fields, an unchecked decline in scholarship has led to crisis,” wrote Lipton and Steinhardt. “Greater rigor in exposition, science, and theory are essential for both scientific progress and fostering a productive discourse with the broader public. Moreover, as practitioners apply [machine learning] in critical domains such as health, law, and autonomous driving, a calibrated awareness of the abilities and limits of [machine learning] systems will help us to deploy [machine learning] responsibly.”

Indeed, a preprint paper by Google AI researchers demonstrated a system that could outperform human experts at finding cancers on mammograms. But as a recent Wired editorial pointed out, mammogram screenings are considered by some to be a flawed medical intervention. AI systems like the one Google promised could improve outcomes but at the same time worsen problems around overtesting, overdiagnosis, and overtreatment.

In a separate instance, Microsoft Research Asia and Beihang University researchers developed an AI model that could read and comment on news articles in a humanlike way, but the paper describing the model made no mention of its possible misuse. This failure to address ethical ramifications sparked a backlash that prompted the research team to upload an updated paper addressing the concerns.

“As the impact of machine learning widens, and the audience for research papers increasingly includes students, journalists, and policy-makers, these considerations apply to this wider audience as well,” wrote Lipton and Steinhardt. “By communicating more precise information with greater clarity, better [machine learning] scholarship could accelerate the pace of research, reduce the on-boarding time for new researchers, and play a more constructive role in public discourse.”

In their coauthored report, Lipton and Steinhardt outline several suggestions that might help correct the current trend. They say researchers and publishers should set better incentives by asking questions like “Might I have accepted this paper if the authors had done a worse job?” and by emphasizing meta-surveys that strip out exaggerated claims. On the authorship side, they recommend homing in on the “how” and “why” of an approach, as opposed to its performance, and conducting error analysis, ablation studies, and robustness checks in the course of research.

Thanks for reading,

Kyle Wiggers

AI Staff Writer