FeatureByte launched by DataRobot vets to advance AI feature engineering

Artificial intelligence (AI) holds a lot of promise for enterprises looking to optimize processes and improve operational efficiency. The challenge for many, though, is getting their data into the right shape, with the right processes in place, to actually benefit from AI.

That's the challenge the two cofounders of FeatureByte, Razi Raziuddin and Xavier Conort, ran into time and again while working at enterprise AI platform vendor DataRobot. Raziuddin spent more than five years at DataRobot, including a stint as senior VP of AI services, while Conort served as DataRobot's chief data scientist for more than six years.

"One of the challenges that we've seen is that AI is not just about building models, which is really the focus of not just DataRobot, but pretty much the entire AI and ML [machine learning] tooling space," Raziuddin told VentureBeat. "The key challenge that still remains, and we call it the weakest link in AI development, is just the management, preparation and deployment of data in production."

Borrowing data prep from data analytics to improve AI development

Raziuddin explained that feature engineering combines several activities that optimize, organize and monitor data so it can be used effectively to build features for an AI model. It includes data preparation: making sure the data is in the correct format and structure for machine learning.

In the data analytics world, data preparation isn't a new discipline. ETL (extract, transform and load) tools take data from an operational system and load it into a data warehouse, where analysis is performed. However, according to Raziuddin, that same approach hasn't been available for AI workloads. He said that data preparation for AI requires a purpose-built approach to help automate a machine learning pipeline.

Raziuddin said that doing feature engineering and feature management well requires a combination of critical skills. The first is data science: the ability to understand the structure and format of data. The second is an understanding of the domain in which the data is collected. Different domains and industry use cases bring different data preparation concerns; for example, data collected for a healthcare deployment will be very different from data used by a retail business.

With a thorough understanding of the data, it's possible to build AI features that make the best use of it.

Automating feature engineering for AI

Getting data into the right shape for AI has often required a data engineering team in addition to one or more data scientists.

FeatureByte aims to solve that pain point with a streamlined process that makes data pipelines available to data scientists building features for their AI models. Raziuddin said his company is all about removing friction from the process and making sure data scientists can do as much as possible within a single tool, without having to rely on a data engineering team.

The company's technology is still in development, though FeatureByte has clear goals for what it should be able to do. Today, the company announced that it has raised $5.7 million in a seed round of funding. Raziuddin said the funding will help the company build a platform that embeds domain knowledge and data engineering expertise to accelerate the process of feature engineering.

FeatureByte's platform will be cloud-based and will be able to leverage existing data resources, including cloud data warehouses and data lake technologies such as Snowflake and Databricks.

"With the number of AI models increasing, the number of data sources that are available to build these models is going up at a faster pace than most teams are able to handle," Raziuddin said. "So unless there is tooling and unless that process is automated and streamlined, it's not something that companies are going to be able to keep up with."

The seed funding was led by Glasswing Ventures and Tola Capital.