Reproducibility puts the “science” in the computer science of AI: it’s how researchers can prove their AI systems are robust and reliable. To support reproducibility for AI models, Facebook today announced the beta release of PyTorch Hub, an API and workflow for research reproducibility and support.
PyTorch Hub lets researchers quickly publish pretrained models to a GitHub repository by adding a hubconf.py file and submitting the models through a GitHub pull request. PyTorch Hub comes with support for models in Google Colab and PapersWithCode.
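To make the publishing step concrete, here is a minimal sketch of what a hubconf.py file might look like. It is an illustration only, not code from the PyTorch team: the entrypoint name `tiny_linear`, its parameters, and the checkpoint URL are all hypothetical, and the sketch assumes the usual convention of a `dependencies` list plus one function per published model.

```python
# hubconf.py: placed at the root of the GitHub repository being published.

# Packages that must be installed before an entrypoint can be loaded.
dependencies = ["torch"]

# Hypothetical checkpoint location, used here for illustration only.
CHECKPOINT_URL = "https://example.com/checkpoints/tiny_linear.pth"


def tiny_linear(pretrained=False, in_features=4, out_features=2):
    """Hypothetical entrypoint that returns a single linear layer.

    pretrained (bool): if True, load weights from CHECKPOINT_URL.
    """
    # Imported inside the function so the file can be inspected without torch.
    import torch

    model = torch.nn.Linear(in_features, out_features)
    if pretrained:
        # Download (and cache) the saved weights, then copy them into the model.
        state_dict = torch.hub.load_state_dict_from_url(CHECKPOINT_URL, progress=False)
        model.load_state_dict(state_dict)
    return model
```

With a file like this in a public repository, a user could presumably fetch the model with `torch.hub.load("<owner>/<repo>", "tiny_linear")`, where the owner and repository names are placeholders.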
“Our goal is to curate high-quality, easily reproducible, maximally beneficial models for research reproducibility. Hence, we may work with you to refine your pull request and in some cases reject some low-quality models to be published,” the PyTorch team said in a blog post today. “With the continued growth in the number of research publications, including tens of thousands of papers now hosted on arXiv, and submissions to conferences at an all-time high, research reproducibility is more important than ever.”
Accepted models will be shared on the PyTorch Hub website.
At launch, PyTorch Hub comes with access to roughly 20 pretrained versions of Google’s BERT, WaveGlow and Tacotron 2 from Nvidia, and the Generative Pre-Training (GPT) model for language understanding from Hugging Face. A number of audio and generative models are also included, as well as computer vision models trained using the ImageNet database.
Also today, popular machine learning framework TensorFlow introduced TensorFlow.Text, a library for preprocessing text for language understanding AI models, built on the recently introduced RaggedTensor.
The news comes at the start of the International Conference on Machine Learning (ICML) in Long Beach, California. For the first time this year, ICML encouraged researchers to submit code alongside their research in order to prove results. As a result, about 36% of submitted papers and 67% of accepted papers included code.
Researchers affiliated with universities were far more likely to share code than those working for a corporation or business. Ninety percent of work submitted by academic researchers included code, compared with only 27.4% of work from the tech industry.
“We hope future program chairs will continue and improve on the process and the community will move toward a culture of timely code release and improved reproducibility,” Kamalika Chaudhuri and Ruslan Salakhutdinov wrote in a Medium post that shares the results of the ICML code-at-submit-time experiment.
The 2018 AI Index report found that ICML was one of the most highly attended annual AI conferences, amid steady growth in the number of research papers created by government, academic, and corporate researchers.
In other news, last month PyTorch 1.1 was released with TensorBoard support for ML training visualizations and an improved JIT compiler.