Analysts and engineers use workflows to automate manual processes, saving time and reducing the possibility of errors. These workflows and the mechanisms that run them, often critical pieces of infrastructure, range from ad-hoc scripts to full-featured frameworks, the management of which can be time-intensive and error-prone.
The Google Cloud team wants to solve this problem with a single managed solution at the platform level: Cloud Composer, a managed workflow orchestration service built on Apache Airflow. Cloud Composer and Airflow currently support BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Datastore, Cloud Storage, and Cloud Pub/Sub. Pricing for Cloud Composer is consumption-based, so you pay for what you use, as measured by vCPU/hour, GB/month, and GB transferred/month. There are multiple pricing units because Cloud Composer uses several GCP products as building blocks.
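To make the three billing units concrete, here is a minimal sketch of how a consumption-based estimate adds up. The `Rates` and `Usage` types, the `estimate_monthly_cost` function, and every number in the example are assumptions made up for illustration; they are not Google's prices or any Google API, so substitute figures from the official pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price per billing unit. The caller supplies the values."""
    per_vcpu_hour: float
    per_gb_month: float
    per_gb_transferred: float


@dataclass(frozen=True)
class Usage:
    """One month of metered consumption."""
    vcpu_hours: float
    storage_gb_months: float
    gb_transferred: float


def estimate_monthly_cost(usage: Usage, rates: Rates) -> dict:
    """Return the cost per billing unit plus the total, rounded to cents."""
    parts = {
        "compute": usage.vcpu_hours * rates.per_vcpu_hour,
        "storage": usage.storage_gb_months * rates.per_gb_month,
        "transfer": usage.gb_transferred * rates.per_gb_transferred,
    }
    parts["total"] = sum(parts.values())
    return {name: round(cost, 2) for name, cost in parts.items()}


# PLACEHOLDER rates, invented for illustration; these are not Google's prices.
example_rates = Rates(per_vcpu_hour=0.10, per_gb_month=0.20, per_gb_transferred=0.05)

# Hypothetical month: 2 vCPUs running for 730 hours, 50 GB stored, 100 GB transferred.
example_usage = Usage(vcpu_hours=2 * 730, storage_gb_months=50, gb_transferred=100)

estimate = estimate_monthly_cost(example_usage, example_rates)
```

Keeping the three components separate in the result mirrors the way the bill is metered, which makes it easier to see whether compute, storage, or network transfer dominates the cost.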
Here is Google’s justification for Cloud Composer:
When creating this workflow, did the author use standard tools and save time by reusing previously developed code from other workflows? Do other people on the team or in the organization know this workflow exists and how it works? Is it easy for everyone to understand the state of this workflow and to investigate any problems when they occur? Will workflow authors all easily or immediately know the APIs needed to create rich workflows? Without a common workflow language and system, the answer to these questions is most frequently “no.”
Considering these workflows can be mission-critical, we believe it should be easy to answer “yes” to all of these questions. Anyone from an analyst to an experienced software developer should be able to author and manage workflows in a way that saves time and reduces risk.
Google chose Apache Airflow as the base of Cloud Composer because it is an open source project. Additionally, Airflow has an active and diverse developer community, is based on Python with support for custom plugins, includes operators for many clouds and common technologies, features a web user interface and command-line tooling, provides support for multi-cloud and hybrid cloud orchestration, and has been used in production settings by companies large and small.
Cloud Composer is meant to leverage the Google Cloud Platform and offer the best of Airflow without the overhead of installing and managing Airflow yourself. This initial beta release includes the following:
- Client tooling, including the Google Developers Console and Cloud SDK
- Easy and controlled access to the Airflow web UI through Cloud Identity-Aware Proxy
- Streamlined Airflow runtime and environment configuration, such as plugin support
- Stackdriver logging and monitoring
- Identity and Access Management (IAM)
- Simplified DAG (workflow) management
- Python (PyPI) package management
As for upcoming features, Google listed support for additional Google Cloud regions, Airflow and Python version selection, and autoscaling.
If you are new to Apache Airflow, Google recommends starting with the Airflow DAG tutorial and keeping the Airflow API reference and Airflow GCP documentation at hand. You may also want to check out the Cloud Composer documentation and release notes as the service evolves.
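For readers who have not met the term, a DAG (directed acyclic graph) describes a workflow as a set of tasks plus the rule that each task runs only after its upstream tasks have finished. The sketch below illustrates that idea in plain Python using the standard library's `graphlib` module (Python 3.9 or later). It is not Airflow code and does not use the Airflow API; `run_workflow` and the task names are hypothetical, chosen only for this example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Iterable[str]],
) -> List[str]:
    """Run every task after all of its upstream tasks; return the run order.

    `upstream` maps a task name to the names of the tasks it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    # Reject dependencies that mention a task that was never defined.
    mentioned = set(upstream) | {dep for deps in upstream.values() for dep in deps}
    unknown = mentioned - set(tasks)
    if unknown:
        raise KeyError(f"unknown tasks: {sorted(unknown)}")

    # Include tasks without dependencies so they are scheduled as well.
    graph = {name: set(upstream.get(name, ())) for name in tasks}

    # Resolve the full order first, so a cycle is detected before anything runs.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order


# A three-step workflow: extract, then transform, then load.
log: List[str] = []
example_tasks = {
    "load": lambda: log.append("load"),
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
}
example_upstream = {"transform": ["extract"], "load": ["transform"]}

executed = run_workflow(example_tasks, example_upstream)
```

An orchestrator such as Airflow adds scheduling, retries, logging, and a web UI on top of this ordering guarantee; the Airflow DAG tutorial shows how the same kind of dependency is declared with real operators.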