Cloudera and Nvidia announced a collaboration that will allow organizations to use GPUs in more areas across the AI development lifecycle.

Cloudera will integrate its Cloudera Data Platform with Nvidia's accelerated Apache Spark 3.0 libraries. The integration will make it easier to add machine learning workflows to existing data processes and to build GPU-accelerated architectures without custom GPU integration work. Enterprises will be able to change their data science workflows without also having to update the Nvidia integration manually.

GPUs have shown tremendous promise in accelerating the data science side of AI development, letting enterprises move some types of workloads onto GPUs. However, analytics often involves processes that span multiple teams, forcing enterprises to invest in custom GPU integrations for those use cases.

Gartner has predicted that creating new architecture patterns that help operationalize data science and ML pipelines will be one of the major trends in 2021.

Benefits of GPU acceleration

The partnership will allow enterprises to use GPUs across modern data workflows that span data preparation, data science, and analytics tasks. A typical workflow includes many steps: data ingestion, data curation, data pipeline automation, data science exploration, model development, testing, deployment, model monitoring and retraining, and delivery into the business. Over the last year, Cloudera has worked to make these processes, and the handoffs between them, much easier.

The Apache Spark 3.0 libraries are accelerated by Nvidia's RAPIDS platform, which can dramatically speed up much of the tedious prep work required to bring new machine learning models into production. For example, the US Internal Revenue Service is already seeing a threefold improvement in data science workflows for fraud detection, said Joe Ansaldi, IRS technical branch chief for the Research Applied Analytics & Statistics Division, in a statement.

Speeding up data preparation and model training will also save on infrastructure costs. GPU-accelerated Apache Spark 3 runs natively on CDP and can plug into high-performance compute tools, Cloudera said.
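For context, outside a managed platform the open-source RAPIDS Accelerator for Apache Spark is usually switched on through ordinary Spark configuration properties, which is the kind of manual setup the CDP integration is meant to spare users. The sketch below is illustrative only: the helper name `rapids_submit_args` and the jar path are hypothetical, the property names follow the open-source RAPIDS Accelerator and Spark 3 resource-scheduling documentation, and cluster-specific settings (such as a GPU discovery script) are deliberately left out. It is not how Cloudera configures CDP.

```python
def rapids_submit_args(jar_path: str, gpus_per_executor: int = 1,
                       tasks_per_gpu: int = 4) -> list[str]:
    """Hypothetical helper: build spark-submit arguments enabling RAPIDS.

    jar_path is a placeholder for wherever the RAPIDS Accelerator jar lives.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("GPU and task counts must be positive")

    conf = {
        # Load the RAPIDS SQL plugin into the Spark driver and executors.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Let supported SQL/DataFrame operations run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Spark 3 resource scheduling: whole GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU each task claims, so up to tasks_per_gpu
        # tasks can share one GPU concurrently.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }

    # Ship the plugin jar, then add one --conf flag per property.
    args = ["--jars", jar_path]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder path; substitute the real location of the jar.
    print(" ".join(rapids_submit_args("/path/to/rapids-4-spark.jar")))
```

The returned list would be placed ahead of the application file in a `spark-submit` invocation. Sharing one GPU among several tasks is a common starting point because each task typically leaves the GPU idle while it reads or shuffles data.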

Above: Comparing CPU- and GPU-powered workflows. (Image credit: Cloudera)

Cloudera’s data portfolio

Cloudera was a trailblazer in the development of data lakes built on top of the Hadoop platform. Cloudera merged with Hortonworks, another Hadoop vendor, in 2018 and combined the two companies' technologies into a modern architecture called the Cloudera Data Platform (CDP). At the time, many speculated this spelled the end of Hadoop data warehouses, but Cloudera has continued to innovate and extend Hadoop into more nimble workflows.

Earlier this year, Cloudera added Applied ML Prototypes (AMPs), a framework for packaging AI and ML models for data scientists, to CDP. AMPs take the guesswork out of ML projects by providing prebuilt business application templates for specific use cases, and they often run on Nvidia GPU hardware. Cloudera Data Engineering (CDE) streamlines the data engineering and prep work at the start of a project. It addresses common problems data engineers face, such as scheduling and orchestrating complex data pipelines, troubleshooting and tuning the performance of data flows, and collaborating with analytics and data science teams.

The RAPIDS Accelerator for Apache Spark will be available in CDP Private Cloud this summer. Nvidia and Cloudera will roll out additional accelerated offerings in CDP over time, starting with Accelerated Deep Learning and Machine Learning in CDP Public Cloud in May. “This means that no matter where customers require these GPUs (from on-prem to public cloud, to hybrid cloud and beyond), they’ll be able to leverage best-in-class GPUs out of the box,” said Santiago Giraldo, Cloudera director of product marketing for data engineering and machine learning.

VentureBeat

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact. Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:
  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more
Become a member