Google's Cloud TPUs now better support PyTorch

In 2018, Google introduced accelerated linear algebra (XLA), an optimizing compiler that speeds up machine learning models' operations by combining what used to be multiple kernels into one. (In this context, "kernels" refer to classes of algorithms for pattern analysis.) While XLA supports processor and graphics card hardware, it also runs on Google's proprietary tensor processing units (TPUs) and was instrumental in bringing TPU support to Facebook's PyTorch AI and machine learning framework. As of today, PyTorch/XLA support for Cloud TPUs -- Google's managed TPU service -- is now generally available, enabling PyTorch users to take advantage of TPUs using first-party integrations.

Google's TPUs are application-specific integrated circuits (ASICs) developed specifically to accelerate AI. They're liquid-cooled and designed to slot into server racks; deliver up to 100 petaflops of compute; and power Google products like Google Search, Google Photos, Google Translate, Google Assistant, Gmail, and Google Cloud AI APIs. Google announced the third generation at its annual I/O developer conference in 2018 and in July took the wraps off its successor, which is in the research stage.

Google and Facebook say PyTorch/XLA -- a Python package that uses XLA to connect PyTorch and TPUs -- represents two years of work. According to the companies, PyTorch/XLA runs most standard PyTorch programs with minimal modifications, falling back to processors to execute operations unsupported by TPUs. With the help of the reports PyTorch/XLA generates, PyTorch developers can find bottlenecks and adapt programs to run on Cloud TPUs.

Google says the Allen Institute for AI recently used PyTorch/XLA on Cloud TPUs across several projects, including one exploring how to add a visual component to language models to improve their understanding capabilities.

Alongside PyTorch/XLA, Google and Facebook today debuted tools to facilitate continuous AI model testing, which they say they've helped the PyTorch Lightning and Hugging Face teams use with Cloud TPUs. Google and Facebook also released a new image -- Deep Learning VM -- that has PyTorch/XLA preinstalled, along with PyTorch 1.6.

PyTorch, which Facebook publicly released in October 2016, is an open source library based on Torch, a scientific computing framework and script language that is in turn based on the Lua programming language. While TensorFlow has been around slightly longer (since November 2015), PyTorch continues to see rapid uptake in the data science and developer communities. Facebook recently revealed that in 2019 the number of contributors to the platform grew more than 50% year-over-year to nearly 1,200. Analysis conducted by the Gradient found that every major AI conference in 2019 has had a majority of papers implemented in PyTorch. And O'Reilly noted that PyTorch citations in papers grew by more than 194% in the first half of 2019 alone.

Unsurprisingly, a number of leading machine learning software projects are built on top of PyTorch, including Uber's Pyro and HuggingFace's Transformers. Software developer Preferred Networks joined the ranks recently with a pledge to move from AI framework Chainer to PyTorch in the near future.

More