Netflix open-sources Polynote to simplify data science and machine learning workflows

Machine learning and data science development isn’t exactly a walk in the park, but Netflix hopes to streamline the arduous bits with a new freely available platform. The tech giant today announced that it has open-sourced Polynote, a multi-language programming notebook environment that integrates with Apache Spark and offers robust support for Scala, Python, and SQL.

In a blog post, Netflix said that Polynote — which has seen “substantial” adoption among its personalization and recommendation teams — was designed to enable data scientists and AI researchers to integrate Netflix’s JVM-based machine learning framework with Python machine learning and visualization libraries. It’s freely available as of today from Polynote.org and from GitHub.

“On the Netflix personalization infrastructure team, our job is to accelerate machine learning innovation by building tools that can remove pain points and allow researchers to focus on research. Polynote originated from a frustration with the shortcomings of existing notebook tools, especially with respect to their support of Scala,” said the company. “At Netflix, we have always felt strongly about sharing with the open source community and believe that Polynote has a great potential to address similar needs outside of Netflix.”

Above: Polynote’s primary interface.

Image Credit: Netflix

For the uninitiated, a notebook is an ordered collection of cells, each of which can hold code or text and be modified and executed independently. A notebook execution, in turn, is a record of a particular piece of code run at a particular point in time and in a particular environment. Cells can be rearranged, inserted, and deleted, and they often depend on the output of other cells in the notebook.

Polynote’s novel reproducibility feature takes cells’ positions in the notebook into account before executing them, helping prevent bad practices that make notebooks difficult to rerun from the top. It also packs features akin to those of an integrated development environment, including interactive autocomplete, parameter hints, and in-line error highlighting, along with a rich text editor that supports LaTeX typesetting.
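To make the position-aware idea concrete, here is a minimal Python sketch. It is not Polynote's implementation: the `Cell` and `ToyNotebook` names are hypothetical, and the sketch simply assumes that a cell may read only values produced by cells positioned above it, no matter which cells happened to run first.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Cell:
    source: str
    outputs: Dict[str, Any] = field(default_factory=dict)


class ToyNotebook:
    """Hypothetical stand-in for a notebook; not Polynote's actual API."""

    def __init__(self, sources: List[str]) -> None:
        self.cells = [Cell(source) for source in sources]

    def run_cell(self, index: int) -> Dict[str, Any]:
        # Build the input scope only from cells positioned above this one.
        scope: Dict[str, Any] = {}
        for cell in self.cells[:index]:
            scope.update(cell.outputs)
        inherited = dict(scope)

        exec(self.cells[index].source, scope)

        # Record only the names this cell defined or rebound.
        self.cells[index].outputs = {
            name: value
            for name, value in scope.items()
            if name != "__builtins__"
            and (name not in inherited or inherited[name] is not value)
        }
        return self.cells[index].outputs


notebook = ToyNotebook(["x = 2", "y = x + late", "late = 5"])
notebook.run_cell(0)
notebook.run_cell(2)  # run out of order, as notebooks allow
try:
    notebook.run_cell(1)
    leaked = True
except NameError:
    leaked = False  # "late" lives below cell 1, so it is not visible
```

In a notebook with a single shared global state, the second cell would have quietly picked up `late` from the cell below it and then failed on a clean rerun from the top. In this sketch it fails immediately instead, which is the kind of bad practice the feature is described as preventing.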

As its name implies, Polynote is a polyglot system, which means each cell in a notebook can be written in a different language, with variables shared between them. (The kernel, or the computational engine that executes the code, provides the available typed input values to the cell’s language interpreter, which in turn provides the resulting typed output values back to the kernel.) Furthermore, Polynote saves configuration and dependency setup within the notebook itself, and it supports data exploration with Matplotlib (a Python 2D plotting library) and the visualization grammar Vega.
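The kernel-and-interpreter handoff described above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than Polynote's code: `TypedValue`, `ToyKernel`, and both interpreters are hypothetical, and the second "language" is a made-up one-line template syntax standing in for a real language such as Scala or SQL.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class TypedValue:
    type_name: str
    value: Any


# An interpreter receives typed inputs and returns typed outputs.
Interpreter = Callable[[str, Dict[str, TypedValue]], Dict[str, TypedValue]]


def python_interpreter(source: str, inputs: Dict[str, TypedValue]) -> Dict[str, TypedValue]:
    scope = {name: typed.value for name, typed in inputs.items()}
    exec(source, scope)
    # Hand back only the names that are new or were rebound by this cell.
    return {
        name: TypedValue(type(value).__name__, value)
        for name, value in scope.items()
        if name != "__builtins__"
        and (name not in inputs or inputs[name].value is not value)
    }


def template_interpreter(source: str, inputs: Dict[str, TypedValue]) -> Dict[str, TypedValue]:
    # Made-up toy language: "target = text with {placeholders}".
    target, _, template = source.partition("=")
    values = {name: typed.value for name, typed in inputs.items()}
    return {target.strip(): TypedValue("str", template.strip().format_map(values))}


class ToyKernel:
    """Hypothetical kernel that owns the symbol table shared by all cells."""

    def __init__(self, interpreters: Dict[str, Interpreter]) -> None:
        self.interpreters = interpreters
        self.symbols: Dict[str, TypedValue] = {}

    def run(self, language: str, source: str) -> Dict[str, TypedValue]:
        outputs = self.interpreters[language](source, dict(self.symbols))
        self.symbols.update(outputs)
        return outputs


kernel = ToyKernel({"python": python_interpreter, "template": template_interpreter})
kernel.run("python", "count = 3 * 4")
result = kernel.run("template", "message = count is {count}")
```

The point of the design is that the interpreters never talk to each other directly. Each one only sees the typed values the kernel passes in, so a value defined in one language's cell becomes available to a cell written in another.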

Above: A Vega cell generated by Polynote’s plot constructor.

Image Credit: Netflix

A symbol table within Polynote provides insight into the notebook’s internal state, and a separate status area shows critical information about the execution status of the kernel. A handy configuration section lets users set dependencies for each notebook, which Polynote automatically fetches and loads locally or from repositories.

“Plenty of exciting work lies ahead,” wrote Netflix. “We are very optimistic about the potential of Polynote, and we hope to learn from the community just as much as we hope they will find value from Polynote.”

The open-sourcing of Polynote comes after the release of open source machine learning tools like Uber’s Ludwig, a toolbox built on top of Google’s TensorFlow machine learning framework. Facebook, for its part, recently made available Pythia, a deep learning framework for image and language models. Around the same time, Google launched TensorFlow.Text, a library for language AI models.