Kubernetes: The key ingredient IT needs to accelerate today’s data science

Presented by Domino Data Lab

At organizations everywhere, there has never been a greater need to draw meaningful insights from data. This need has catapulted data scientists into mission-critical roles, and it is even empowering “citizen data scientists” -- people at the departmental level who may not have formal data science training -- to work with new data-intensive tools. The good news is that platform technology that supports effective data science is easier than ever for IT departments to deploy and manage. Meanwhile, many IT departments are realizing that they need to provide Kubernetes-native platforms for this purpose.

Kubernetes’ strength as an open source container orchestration system is well documented. The project began at Google and is now maintained by the Cloud Native Computing Foundation. It has become a key solution for organizations that want to compete in the cloud-native arena, is essential for innovation in multi-cloud and hybrid cloud environments, and is increasingly part of the fabric that data scientists rely on to run and scale their applications. Now, more than ever, CIOs and IT leaders need to know about Kubernetes, as well as the ecosystem of data science tools and applications surrounding it.

Data science’s scope has expanded

Not long ago, data scientists worked with a small handful of common applications that were nowhere near as powerful as today’s. They typically ran these applications in standard environments on standard hardware. Now, though, data scientists need applications deployed in many different types of environments, they need to use containerized applications, and they need to scale those applications more than ever. In these areas, Kubernetes shines, and it is particularly effective when integrated at the platform level.

Today’s IT leaders are challenged to centralize data science infrastructure in a way that will increase governance without constraining data scientists’ freedom and flexibility. Failure to do so encourages a “wild west” environment featuring siloed, inconsistent technologies sprinkled across the enterprise, operating beyond IT’s purview and hindering the business’s opportunity to drive value from its data science investment. In many cases, these “shadow implementations” are spun up in inefficient ways or ways in which applications cannot scale for the benefit of others. Kubernetes at the platform level can weave the tools that data scientists need together with optimized technology and cloud infrastructure, all under the purview of IT.

With Kubernetes integrated at the platform level, data scientists can benefit from a self-service environment, allowing them to:

easily use provisioned infrastructure that they need for today’s powerful applications
spin up workspaces using tools and applications that work for them
experiment with new tools and applications across elastic compute resources

More toys in the attic than ever: IT must facilitate experimentation

The data scientist’s need for effective experimentation is worth focusing on. To keep up with the latest techniques and applications, data scientists actively experiment, and the pace of their experimentation is increasing as the open source ecosystem of applications expands. Just as IT departments have historically kept a close eye on open source platforms and applications that become entrenched at organizations, they should focus on the ones that data scientists are increasingly adopting -- and provide platforms well suited to running these applications.

Moreover, data science is not the only field on a pronounced collision course with machine learning and artificial intelligence; Kubernetes is too. Kubernetes has a tremendous amount to offer data scientists who want to explore deriving business insights with the help of machine learning and AI. It can fluidly orchestrate applications that bridge the gaps between these fields and data science. In particular, IT departments are focused on delivering reproducible, scalable, inexpensive compute solutions for enabling machine learning and AI. Many of them should understand that Kubernetes running natively at the platform level gives them an advantage in this effort.

An example: Kubernetes serves Spark for an open source bridge

Consider this specific example of how a platform running Kubernetes natively can make all the difference for a data scientist: Let’s say the scientist is closely monitoring the emergence of open source data science tools, and wants to leverage Apache Spark as an open source analytics engine for big data tasks. After all, it features built-in modules for streaming, SQL, machine learning and graph processing, and is a free, open resource. A Kubernetes-native platform can provide access to Spark’s distributed compute paradigm without the need to set up dedicated, complex clusters. In this particular use case, an open orchestration tool (Kubernetes) provides the on-ramp to one of the open source darlings of data science (Spark), and the facilitator is the IT department that provided a Kubernetes-native platform.
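To make the Spark example a little more concrete, here is a minimal sketch of how a platform might assemble a spark-submit command that runs a job directly on a Kubernetes cluster rather than on a dedicated Spark cluster. The helper name build_spark_submit_args, the API server address, the container image and the application path are all hypothetical placeholders rather than anything a particular platform prescribes; the flags and configuration keys are the ones Spark uses for its Kubernetes scheduler.

```python
from typing import List


def build_spark_submit_args(
    api_server: str,
    container_image: str,
    app_path: str,
    executor_instances: int = 2,
    namespace: str = "default",
    app_name: str = "example-job",
) -> List[str]:
    """Assemble a spark-submit command line that targets a Kubernetes cluster.

    Hypothetical helper for illustration; every argument value is supplied
    by the caller and nothing here contacts a real cluster.
    """
    if executor_instances < 1:
        raise ValueError("executor_instances must be at least 1")
    return [
        "spark-submit",
        # The k8s:// prefix tells Spark to schedule work through Kubernetes.
        "--master", f"k8s://{api_server}",
        # In cluster mode the Spark driver itself runs in a pod.
        "--deploy-mode", "cluster",
        "--name", app_name,
        # Executors are launched as pods, so this controls how many are requested.
        "--conf", f"spark.executor.instances={executor_instances}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={container_image}",
        app_path,
    ]


if __name__ == "__main__":
    # Placeholder values: substitute a real API server, image and application.
    args = build_spark_submit_args(
        api_server="https://kubernetes.example.com:6443",
        container_image="registry.example.com/spark:latest",
        app_path="local:///opt/spark/app/job.py",
        executor_instances=4,
    )
    print(" ".join(args))
```

In a self-service platform, a step like this would typically happen behind the scenes, so the data scientist only chooses the job and the amount of compute while IT retains control of the cluster, namespace and images.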

In this fashion, data science-aware IT leaders will increasingly help their organizations move data science from the periphery of business to its core, providing unfettered extensibility across the latest applications and platforms. At the center of all this lies Kubernetes, a powerful orchestration framework that can bridge the gaps that have existed between data scientists and IT.

Nick Elprin is CEO and Co-founder of Domino Data Lab.
