Software company Cloudera today announced a slew of machine learning product updates at the Strata Data London conference: Cloudera Data Science Workbench 1.4, Cloudera Altus Data Engineering on Microsoft Azure, and Cloudera Enterprise 6.0. All three are focused on facilitating collaboration among data teams, CEO Tom Reilly said.
“We believe data can make what is impossible today possible tomorrow. With enhanced capabilities in machine learning, analytics, and cloud, the new software products and cloud services we are announcing will enable our customers to more rapidly gain competitive advantages in the data economy,” Reilly said in a statement. “These enhancements demonstrate Cloudera’s commitment to market-leading innovations that empower enterprises to securely transform complex data into clear and actionable insights to propel their digital transformation.”
Cloudera Altus Data Engineering on Azure went live yesterday with support for Apache Spark, Apache Hive, Hive on Spark, and MapReduce 2. Cloudera Enterprise 6.0 and Altus Analytic DB are available in beta today, while Data Science Workbench 1.4 is expected to launch this summer.
Data Science Workbench allows data science teams to build, run, train, compare, and implement machine learning models on a single platform. Version 1.4 features an improved toolkit for running and tracking experiments and a one-click tool that allows users to deploy models as Representational State Transfer (REST) APIs for networked applications.
Cloudera Altus is a bit more cloud-centric; Cloudera claims it is the first “multi-cloud, multi-function” platform as a service. The products under its umbrella include Data Engineering for Azure, which grants processing jobs read and write access to Microsoft Azure Data Lake Store (ADLS), and Altus Analytic DB, a “data warehouse” service that delivers database analytics in SQL, Python, R, and other languages via Altus SDX. That is in addition to the Cloudera Altus software development kit (SDK), which allows programmatic access from Java, and an automated workload performance monitor that flags potential problems.
Last, but not least, is Cloudera Enterprise, a platform for machine learning and analytics applications. The newest iteration (version 6.0) introduces GPU support and Apache Hive data warehouse optimizations that “significantly accelerate machine learning and data engineering applications” compared to the previous release. It also offers Apache Solr 7.0 (with support for nested data types and JSON facets), Kafka 1.0, and Spark 2.2 as fully native components. Cloudera claims that even with as many as 2,500 nodes in a single cluster managed through the Cloudera Manager 6.0 interface, machine learning on the platform has the potential to be up to 10 times faster. Analytics workloads leveraging Apache Hive 2.0 can expect up to 80 percent better performance.
“We’re thrilled to be launching new capabilities in Cloudera Data Science Workbench that accelerate everyday workflows for data scientists, including experiment management and model deployment, with a seamless experience that also keeps data secure and under governance,” Hilary Mason, general manager of machine learning at Cloudera, said in a statement.