Yahoo is announcing today that it’s open-sourcing TensorFlowOnSpark, software it built so that TensorFlow, the Google-initiated open-source framework for deep learning, can work with data sets stored in Spark clusters, which some organizations maintain to process many different kinds of data. The code is available now on GitHub under an Apache 2.0 license.
Deep learning — which typically involves training artificial neural networks on lots of data, like photos, and then directing the neural networks to make their best guesses about new data — continues to be popular at web companies and at startups as well.
Almost a year ago, Yahoo open-sourced CaffeOnSpark, which brought Spark support to the Caffe open-source deep learning framework. Now Yahoo is doing the same thing but with a different framework, and one from a company it has long competed with in web search.
Today’s move comes a few months after TensorFlow gained support for the Hadoop Distributed File System (HDFS), which can serve as a data source for Spark, as Lee Yang, Jun Shi, Bobbie Chern, and Andy Feng of Yahoo’s Big ML team point out in a blog post.
The team evaluated alternatives such as SparkNet and TensorFrames but ultimately chose to build their own. Their software works with Spark tools like SparkSQL, MLlib, and Python notebooks connected to Spark clusters, and it also works with Hadoop, Yang, Shi, Chern, and Feng wrote.
Migrating an existing TensorFlow program to TensorFlowOnSpark requires changing fewer than 10 lines of code. “Many developers at Yahoo who use TensorFlow have easily migrated TensorFlow programs for execution with TensorFlowOnSpark,” they wrote.