Google today announced the beta launch of Cloud AI Platform Pipelines, a service designed to deploy robust, repeatable AI pipelines along with monitoring, auditing, version tracking, and reproducibility in the cloud. Google’s pitching it as a way to deliver an “easy to install” secure execution environment for machine learning workflows, which could reduce the amount of time enterprises spend bringing products to production.
“When you’re just prototyping a machine learning model in a notebook, it can seem fairly straightforward. But when you need to start paying attention to the other pieces required to make a [machine learning] workflow sustainable and scalable, things become more complex,” wrote Google product manager Anusha Ramesh and staff developer advocate Amy Unruh in a blog post. “A machine learning workflow can involve many steps with dependencies on each other, from data preparation and analysis, to training, to evaluation, to deployment, and more. It’s hard to compose and track these processes in an ad-hoc manner — for example, in a set of notebooks or scripts — and things like auditing and reproducibility become increasingly problematic.”
AI Platform Pipelines has two major parts: (1) the infrastructure for deploying and running structured AI workflows that are integrated with Google Cloud Platform services and (2) the pipeline tools for building, debugging, and sharing pipelines and components. The service runs on a Google Kubernetes cluster that’s automatically created as a part of the installation process, and it’s accessible via the Cloud AI Platform dashboard. With AI Platform Pipelines, developers specify a pipeline using the Kubeflow Pipelines software development kit (SDK), or by customizing the TensorFlow Extended (TFX) Pipeline template with the TFX SDK. This SDK compiles the pipeline and submits it to the Pipelines REST API server, which stores and schedules the pipeline for execution.
AI Pipelines uses the open source Argo workflow engine to run the pipeline and has additional microservices to record metadata, handle components IO, and schedule pipeline runs. Pipeline steps are executed as individual isolated pods in a cluster and each component can leverage Google Cloud services such as Dataflow, AI Platform Training and Prediction, BigQuery, and others. Meanwhile, the pipelines can contain steps that perform graphics card and tensor processing unit computation in the cluster, directly leveraging features like autoscaling and node auto-provisioning.
AI Platform Pipeline runs include automatic metadata tracking using ML Metadata, a library for recording and retrieving metadata associated with machine learning developer and data scientist workflows. Automatic metadata tracking logs the artifacts used in each pipeline step, pipeline parameters, and the linkage across the input/output artifacts, as well as the pipeline steps that created and consumed them.
In addition, AI Platform Pipelines supports pipeline versioning, which allows developers to upload multiple versions of the same pipeline and group them in the UI, as well as automatic artifact and lineage tracking. Native artifact tracking enables the tracking of things like models, data statistics, model evaluation metrics, and many more. And lineage tracking shows the history and versions of your models, data, and more.
Google says that in the near future, AI Platform Pipelines will gain multi-user isolation, which will let each person accessing the Pipelines cluster control who can access their pipelines and other resources. Other forthcoming features include workload identity to support transparent access to Google Cloud Services; a UI-based setup of off-cluster storage of backend data, including metadata, server data, job history, and metrics; simpler cluster upgrades; and more templates for authoring workflows.