MissingLink.ai simplifies AI data management and experimentation

The amount of data generated each day boggles the mind. IDC forecasts the total will grow to 5.2 zettabytes in 2025, and it's accelerating exponentially -- 90 percent of the world's data was generated in the past two years alone. (For point of reference, a zettabyte is the equivalent of 250 billion DVDs.)

It's a lot for anyone to wrap their head around -- particularly data scientists tasked with leveraging that data to train, validate, and test machine learning systems. To make wrangling it a little easier, software engineer Yosi Taguri teamed up two years ago with three colleagues -- Shay Erlichmen, Joe Salomon, and Rahav Lussato -- to found MissingLink.ai. Today, it launched publicly.

"We're at an incredible tipping point with all the data we need to solve really important problems, like saving lives through cancer detection and providing safer, smarter driving on the streets," Taguri said. "But wading through all that data to find the meaning from it is tough and requires too much manpower. MissingLink allows every engineer to build complex AI machines in a way that wasn't possible before."

To that end, MissingLink.ai offers end-to-end management and deployment tools that simplify coding and model training processes. It supports popular machine learning frameworks such as Google's TensorFlow, Facebook's Caffe2, PyTorch, and Keras, and instantly syncs changes to data, obviating the need to copy files manually. As for experiments, which the system automatically delegates to available compute resources and runs in parallel, they take just three lines of code to set up.

"We're taking away a lot of the grunt work so that they can focus on the bigger picture issues," Taguri said.

Among the highlights of the suite is a robust data management engine that Taguri characterized as "version aware." In essence, it maps changes in databases over time, allowing data engineers to run queries against specific versions for model training and comparison. It additionally streams data and caches it locally, using the CPU to copy while experiments run on the GPU.
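To make the "version aware" idea concrete, here is a minimal sketch in Python. It is not MissingLink.ai's API, which this article does not show; the `VersionedDataset` class and its `commit` and `query` methods are hypothetical names invented for illustration. The toy store keeps a snapshot per commit, so a query can be pinned to the exact version of the data a model was trained on.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDataset:
    """Hypothetical toy store: every commit keeps a full snapshot of the records."""
    _snapshots: list = field(default_factory=list)

    def commit(self, records):
        """Save a new version and return its version number (starting at 1)."""
        self._snapshots.append(tuple(records))
        return len(self._snapshots)

    def query(self, version, predicate=lambda record: True):
        """Return the records in the given version that satisfy the predicate."""
        if not 1 <= version <= len(self._snapshots):
            raise KeyError(f"unknown version: {version}")
        return [r for r in self._snapshots[version - 1] if predicate(r)]


ds = VersionedDataset()
v1 = ds.commit([{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}])
v2 = ds.commit([
    {"id": 1, "label": "cat"},
    {"id": 2, "label": "dog"},
    {"id": 3, "label": "cat"},
])

# The same query run against two versions of the data.
cats_v1 = ds.query(v1, lambda r: r["label"] == "cat")
cats_v2 = ds.query(v2, lambda r: r["label"] == "cat")
print(len(cats_v1), len(cats_v2))
```

Storing full snapshots keeps the sketch short. A production system would more plausibly store only the changes between versions, but the article does not say how MissingLink.ai handles this.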

Another of MissingLink.ai's headliners is its visual dashboards, which collate ongoing experiments in a list view containing start times, the machines or cloud instances on which each test is running, total runtime and progress, and other useful metrics. Once an experiment is finished, the results -- including the source code, visualizations, and resources -- are recorded automatically for posterity.

"Deep learning costs a lot of money," Taguri said. "Companies learn that it's not that easy. They're basically stuck doing DevOps [work] -- moving data, tracking experiments, and trying to get machines and GPUs up and running."

Taguri claims that one of the company's customers saw a 20-fold boost in productivity.

MissingLink.ai offers a free plan with 1GB of managed data, one managed resource, and one managed organization. Its least expensive paid plan costs $120 per month and raises the storage limit to 100GB and the number of managed resources to five.

"[One of the] core principles we keep in mind is that we shouldn't have to educate data scientists about how to do deep learning -- we should seamlessly integrate into [their] workflow," Taguri said.
