Informatica today announced it has integrated its Cloud Data Integration engine, based on the Apache Spark in-memory computing framework, with graphics processing units (GPUs) from Nvidia.
The alliance makes the processing horsepower of GPUs accessible via a set of visual tools to a wide range of subject matter experts and data scientists without requiring them to have coding skills, said Rik Tamm-Daniels, Informatica’s vice president of strategic ecosystems and technology.
While GPUs are much faster at processing data, no one wants to write code just to invoke a GPU, Tamm-Daniels noted. Yet the number of individuals who need to build datasets keeps increasing: according to Gartner, on average 41% of employees outside of IT are customizing datasets in some way.
Informatica is employing Nvidia's RAPIDS Accelerator software for Apache Spark to integrate its Cloud Data Integration engine with Nvidia GPUs. That effort makes it possible to leverage the inherent parallel processing capabilities of an Nvidia GPU to process data as much as 5 times faster than an x86 processor, Tamm-Daniels claimed.
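For context, the RAPIDS Accelerator typically attaches to Spark as a plugin so that supported SQL and DataFrame operations run on the GPU without code changes. A minimal sketch of what enabling it on a Spark job looks like follows; the jar filename, version, and resource settings here are illustrative assumptions, not details from Informatica's service:

```shell
# Sketch: enabling the RAPIDS Accelerator plugin on a spark-submit invocation.
# Jar path/version and the resource amounts below are illustrative assumptions.
spark-submit \
  --jars rapids-4-spark_2.12-22.06.0.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  your_etl_job.py
```

The appeal of this plugin model is exactly what Tamm-Daniels describes: the same Spark job can fall back to CPUs when no GPU is present, so the tooling above the engine does not need GPU-specific code.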
The Cloud Data Integration engine automatically provides all the mappings needed to invoke either class of processor running on Amazon Web Services (AWS). The service will soon be available on Microsoft Azure and Google Cloud Platform (GCP), Tamm-Daniels added. In many cases, even data scientists who have the coding skills would prefer to take advantage of a serverless approach.
Ultimately, data processing and management is being democratized, thanks to the rise of serverless computing frameworks based on event-driven computing architectures, Tamm-Daniels added. Like other cloud-native technologies, this approach is designed to scale up and down as required. An organization can lower the total cost of data-intensive workloads by as much as 72% because there is no need to provision IT infrastructure until it's required, Tamm-Daniels noted.
As a consequence, serverless computing frameworks will soon become the primary way large amounts of data are processed on demand, Tamm-Daniels added. “Serverless is preferred,” he said.
There may even come a day when cloud service providers running the Cloud Data Integration engine find themselves undercutting each other to provide the lowest cost for processing various datasets at varying times of day.
Thus far, the biggest driver of GPU adoption in the enterprise has been AI workloads. However, a wide range of data analytics applications involving, for example, clinical trials, would benefit from reducing the amount of time it takes to process massive amounts of data.
Longer term, it's not clear to what degree internal IT teams need, or even want, to be involved in managing data processing jobs beyond helping initially provision the IT environment. There is no shortage of other tasks that IT professionals could focus on if they spent less time writing code to enable data to be processed on one platform versus another. At the same time, knowledge workers of all skill levels are, for better or worse, finding ways to process data that don't require the direct intervention of an internal IT team.
Regardless of who processes that data, how it is processed is about to fundamentally change. The days when both IT administrators and subject matter experts, who are among the most highly paid employees in any organization, spend hours waiting for a job to finish are coming to an end as organizations start to appreciate how much more costly labor is compared to compute engines in the cloud.