Machine learning groups form Consortium for Python Data API Standards to reduce fragmentation

Deep learning framework Apache MXNet and Open Neural Network Exchange (ONNX) today launched the Consortium for Python Data API Standards to improve interoperability for machine learning practitioners and data scientists using any framework, library, or tool from the Python ecosystem.

ONNX itself was formed by Facebook and Microsoft in 2017 to encourage interoperability between frameworks and tools. Today, ONNX includes nearly 40 organizations with influence in AI and data science, including AWS, Baidu, and IBM, along with hardware makers like Arm, Intel, and Qualcomm.

The new consortium, which will develop standards for dataframes and arrays or tensors, hopes to address the fragmentation that has affected the data ecosystem in recent years. Python dataframe libraries include Pandas, PySpark, and Apache Arrow, while major array and tensor libraries include TensorFlow, PyTorch, and NumPy. The consortium will not include PyTorch, one of the most popular machine learning frameworks in use today, a Facebook company spokesperson told VentureBeat in an interview.

"Currently, array and dataframe libraries all have similar APIs, but with enough differences that using them interchangeably isn't really possible," consortium members said in a blog post today. "We aim to grow this consortium into an organization where cross-project and cross-ecosystem alignment on APIs, data exchange mechanisms, and other such topics happens. These topics require coordination and communication to a much larger extent than they require technical innovation. We aim to facilitate the former while leaving the innovating to current and future individual libraries."

The consortium's first priority is establishing a working group and developing an initial standard. The group will then request feedback from array and dataframe library maintainers and iterate as needed before making a version of the standard available for use. The group is also releasing tools for comparing the APIs of array or tensor libraries and for tracking some of the primary functions of a dataframe library. The first feedback session begins next month.
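
To give a rough sense of what comparing library APIs can involve, here is a minimal sketch in Python. It is not the consortium's tooling, which the announcement does not describe in detail. The helper names `public_callables` and `compare_api_surface` are hypothetical, and the standard-library modules `math` and `cmath` stand in for two array libraries whose function names overlap only partly.

```python
import cmath
import inspect
import math


def public_callables(module):
    """Return the names of a module's public callables (hypothetical helper)."""
    return {
        name
        for name, _ in inspect.getmembers(module, callable)
        if not name.startswith("_")
    }


def compare_api_surface(lib_a, lib_b):
    """Split two libraries' public function names into shared and unique sets."""
    a = public_callables(lib_a)
    b = public_callables(lib_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# math and cmath are stand-ins here, not real array or dataframe libraries.
report = compare_api_surface(math, cmath)
print("shared:", sorted(report["shared"]))
print("only in math:", sorted(report["only_a"]))
print("only in cmath:", sorted(report["only_b"]))
```

A name-level comparison like this only shows where two libraries overlap. It says nothing about whether shared functions take the same arguments or return the same results, which is the kind of difference the consortium members describe in their blog post.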
