Date: 2019-06-10

Reproducibility puts the science in the computer science of AI. It’s how researchers can prove their AI systems are robust and reliable. To support reproducibility for AI models, Facebook today released PyTorch Hub in beta, an API and workflow for publishing and reusing pretrained models.

With PyTorch Hub, researchers can publish pretrained models from a GitHub repository by adding a hubconf.py file, then submit them for listing through a GitHub pull request. PyTorch Hub also supports running models in Google Colab and integrates with PapersWithCode.
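As a rough illustration of the publishing side, a hubconf.py might look like the sketch below. The `dependencies` list and the one-function-per-model layout follow the convention described in the PyTorch Hub documentation; the `tiny_mlp` entrypoint, its `hidden` parameter, and the layer sizes are hypothetical and exist only for this example.

```python
# hubconf.py: lives at the root of the GitHub repository being published.

# Packages that must be installed before any entrypoint below can be loaded.
dependencies = ["torch"]


def tiny_mlp(pretrained=False, hidden=16):
    """Hypothetical entrypoint: build a small two-layer network.

    Each public function in hubconf.py is exposed as a loadable model name.
    """
    # Imported inside the function so the file can be inspected without PyTorch.
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(4, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 2),
    )
    if pretrained:
        # A real repository would download weights here, for example with
        # torch.hub.load_state_dict_from_url(<checkpoint URL>).
        # No checkpoint exists for this illustrative model.
        raise NotImplementedError("no pretrained weights are published for tiny_mlp")
    return model
```

Keeping the weight download behind the `pretrained` flag lets users build the architecture alone, which is the pattern the launch models appear to follow.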

“Our goal is to curate high-quality, easily-reproducible, maximally-beneficial models for research reproducibility. Hence, we may work with you to refine your pull request and in some cases reject some low-quality models to be published,” the PyTorch team said in a blog post today. “With the continued growth in the number of research publications, including tens of thousands of papers now hosted on arXiv and submissions to conferences at an all time high, research reproducibility is more important than ever.”

Accepted models will be shared on the PyTorch Hub website.

At launch, PyTorch Hub comes with access to roughly 20 pretrained versions of Google’s BERT, WaveGlow and Tacotron 2 from Nvidia, and Hugging Face’s implementation of OpenAI’s Generative Pre-Training (GPT) model for language understanding. There are also a number of audio and generative models, as well as computer vision models trained on the ImageNet database.
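For readers who want to try one of these models, the consumer side might be sketched as follows. `torch.hub.list` and `torch.hub.load` are part of the PyTorch Hub API; the `load_from_hub` helper is hypothetical, and the `pytorch/vision` repository and `resnet18` entrypoint are assumed here as a representative ImageNet-trained vision model rather than taken from this article.

```python
def load_from_hub(repo="pytorch/vision", entrypoint="resnet18", **kwargs):
    """Hypothetical helper: fetch a Hub repository and build one of its models.

    Extra keyword arguments are forwarded to the entrypoint function defined
    in the repository's hubconf.py.
    """
    # Imported lazily so this file can be read without PyTorch installed.
    import torch

    # torch.hub.list returns the entrypoint names exposed by the repo's hubconf.py.
    available = torch.hub.list(repo)
    if entrypoint not in available:
        raise ValueError(f"{entrypoint!r} not found in {repo}; available: {available}")

    # torch.hub.load downloads the repo (or reuses a cached copy) and calls the entrypoint.
    return torch.hub.load(repo, entrypoint, **kwargs)


# Example call (requires PyTorch and network access, so it is left commented out):
# model = load_from_hub(pretrained=True)
# model.eval()
```

The `pretrained=True` argument matches the style used around the time of the launch; newer torchvision releases prefer a `weights` argument, so the exact keyword may need adjusting.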

Also today, the team behind another popular machine learning framework, TensorFlow, introduced TensorFlow.Text, a library for preprocessing text for language understanding models that builds on the recently introduced RaggedTensor.

The news comes at the start of the International Conference on Machine Learning (ICML) in Long Beach, California. For the first time this year, ICML encouraged researchers to submit code alongside their research in order to prove results. As a result, about 36% of submitted papers and 67% of accepted papers shared their code.

Researchers affiliated with universities were far more likely to share code than those affiliated with companies: 90% of work submitted by academia included code, compared with 27.4% from industry.

“We hope future program chairs will continue and improve on the process, and the community will move towards a culture of timely code release and improved reproducibility,” according to a Medium post by Kamalika Chaudhuri and Ruslan Salakhutdinov that shares the results of the ICML Code-at-Submit-Time Experiment.

The 2018 AI Index report found that ICML was one of the most highly attended annual AI conferences, amid steady growth of the number of research papers created by government, academic, and corporate researchers.

In other PyTorch news, last month PyTorch 1.1 was released with TensorBoard support for machine learning training visualizations and an improved JIT compiler.