At its Google Cloud Next conference in San Francisco back in March, Google unveiled Cloud Dataprep, a service that lets companies clean their structured and unstructured datasets for analysis in, for example, Google’s BigQuery, or even for use in training machine learning models.
Over the past six months, Cloud Dataprep has been in private beta, but Google is now officially graduating the service to public beta for anyone to use.
Some reports indicate that analysts and data scientists can spend up to 80 percent of their time cleaning and preparing raw data for analysis. This is where Dataprep comes into play: It can automatically detect data types and schemas, and flag mismatched or missing values.
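To make that kind of detection concrete, here is a minimal Python sketch of how a column of raw text values might be profiled. It is an illustration only and not Dataprep's or Trifacta's actual implementation: the function names (`classify`, `profile_column`), the list of missing-value markers, and the majority-vote rule for inferring a column's type are all assumptions made for this example.

```python
from collections import Counter

# Assumed markers for "no value"; real tools use richer, configurable lists.
MISSING_MARKERS = {"", "null", "none", "na", "n/a", "nan"}


def classify(value):
    """Return a coarse type label for one raw string cell."""
    text = value.strip()
    if text.lower() in MISSING_MARKERS:
        return "missing"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "string"


def profile_column(values):
    """Infer a column type by majority vote and flag problem rows."""
    labels = [classify(v) for v in values]
    present = [label for label in labels if label != "missing"]
    # The most common label among non-missing cells is taken as the column type.
    inferred = Counter(present).most_common(1)[0][0] if present else "unknown"
    return {
        "inferred_type": inferred,
        "missing_rows": [i for i, label in enumerate(labels) if label == "missing"],
        "mismatched_rows": [
            i for i, label in enumerate(labels)
            if label not in ("missing", inferred)
        ],
    }


if __name__ == "__main__":
    # Hypothetical sample column: mostly integers, one blank, one stray word.
    print(profile_column(["12", "7", "", "abc", "3"]))
```

In this sketch, a column that is mostly integers is labeled as an integer column, the blank cell is reported as missing, and the stray word is reported as a mismatch. A production tool would handle many more types (dates, currencies, and so on) and would likely sample large datasets rather than scan every row.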
A key facet of Dataprep is the visual layout, which makes it easier for people who aren’t data engineers to alter or add to their datasets.
The software is actually an embedded version of the Wrangler enterprise app from Trifacta, a well-funded startup that offers software for cleaning up messy data. Indeed, Dataprep was built in collaboration with Trifacta.
“Cloud Dataprep also has intelligence built-in for understanding and automatically operationalizing your particular usage patterns, making data preparation even faster and less prone to user error,” noted Google product manager Eric Anderson. “The overall result is more productive, efficient, and powerful data analytics pipelines, leading to faster time-to-insight.”
In addition to BigQuery, Cloud Dataprep integrates with other Google Cloud services, including Cloud Storage, Cloud Dataflow, and the Cloud Machine Learning Engine.
Cloud Dataprep is available for anyone to use from today.