

What is data-warehouse-as-a-service (DWaaS)?

With the volume of enterprise data growing at a breathtaking pace (IDC, for example, projects a 23% CAGR to 175 zettabytes by 2025), adopting modern data infrastructure has become unavoidable. Companies of all sizes and in all sectors are turning to more effective data solutions.

These organizations need to consolidate business data from multiple source systems for historical and trend analysis. This is where data warehouses come in: they let firms keep clean, organized business data in an aggregated, summary form (primarily “structured data” that fits into rows and columns).
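To make “aggregated, summary form” concrete, here is a minimal Python sketch. It assumes two hypothetical source systems (a web shop and a point-of-sale system) whose order records are merged and summarized by region and month. The record fields and the `consolidate` helper are illustrative and not part of any warehouse product.

```python
from collections import defaultdict

# Hypothetical raw records from two source systems.
web_orders = [
    {"order_id": "W1", "region": "EU", "month": "2022-01", "amount": 120.0},
    {"order_id": "W2", "region": "US", "month": "2022-01", "amount": 80.0},
]
store_orders = [
    {"order_id": "S1", "region": "EU", "month": "2022-01", "amount": 30.0},
    {"order_id": "S2", "region": "EU", "month": "2022-02", "amount": 50.0},
]

def consolidate(*sources):
    """Merge rows from several sources and total revenue per (region, month)."""
    summary = defaultdict(float)
    for source in sources:
        for row in source:
            summary[(row["region"], row["month"])] += row["amount"]
    # Return sorted (region, month, total) rows: the structured form a warehouse table holds.
    return [(region, month, total) for (region, month), total in sorted(summary.items())]

for row in consolidate(web_orders, store_orders):
    print(row)
```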

When the requirement is to handle structured data for a predefined business purpose, a data warehouse is the go-to choice. However, building and maintaining one is a demanding task. As data volumes keep growing, organizations must scale the storage and compute of their on-premises warehouse accordingly. This requires considerable investment and also creates administrative overhead, with a team constantly monitoring the whole infrastructure to keep it running while ensuring security and compliance.

This challenge, a major roadblock for small companies, is being addressed by the cloud-based data-warehouse-as-a-service (DWaaS) model. Under this model, a service provider is responsible for setting up, maintaining, securing and upgrading a data warehouse, including all the associated software and hardware stacks. The customer only has to plug in the data sources it wants to connect to the warehouse and pay for the managed service.

Key functions of a DWaaS offering

When an enterprise opts for a data-warehouse-as-a-service offering, it receives a few key services from the provider, and it may opt for additional ones as well. The basic services include the following:

Data warehouse design and development

A DWaaS provider first designs a custom data warehouse architecture for the customer based on its unique business requirements, existing data management strategy, data sources and data quality practices. Once the custom framework is ready and future-proofed (for aspects such as scalability), the provider implements it by selecting the most suitable hardware, software systems and processes.
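Warehouse designs are often modeled as a star schema, with a central fact table joined to dimension tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a real warehouse engine. The table names (`dim_customer`, `dim_date`, `fact_sales`) and sample rows are hypothetical; a provider's actual design would depend on the customer's requirements.

```python
import sqlite3

# A minimal star schema: one fact table referencing two dimension tables.
# All table and column names are hypothetical.
SCHEMA = """
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    segment      TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,  -- e.g. 20220115
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    amount       REAL    NOT NULL
);
"""

def build_warehouse():
    """Create the schema in memory and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Acme", "enterprise"), (2, "Bolt", "smb")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20220115, 2022, 1), (20220210, 2022, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 20220115, 100.0), (2, 20220115, 40.0), (1, 20220210, 60.0)])
    return conn

def revenue_by_segment(conn):
    """Typical analytical query: join facts to a dimension and aggregate."""
    return conn.execute("""
        SELECT c.segment, SUM(f.amount)
        FROM fact_sales f JOIN dim_customer c USING (customer_key)
        GROUP BY c.segment
        ORDER BY c.segment
    """).fetchall()

print(revenue_by_segment(build_warehouse()))
```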

Integration with sources

After configuring the custom data warehouse, the provider integrates it with all existing data sources, such as the customer's transactional systems. Depending on the case, the vendor may use leading pipeline technologies or custom code to ensure high-integrity transfer of data to the warehouse. Some providers also integrate the warehouse with the customer's existing analytics solutions for in-house analysis.
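One common way to move data out of a transactional source with integrity is incremental extraction against a watermark, combined with an idempotent upsert on the target. This is a minimal sketch under that assumption. `SourceRow`, `extract_incremental` and `load_upsert` are hypothetical names, and real pipelines would also need to handle deletes and late-arriving rows.

```python
from dataclasses import dataclass

@dataclass
class SourceRow:
    id: int
    updated_at: int  # hypothetical change timestamp set by the source system
    payload: str

def extract_incremental(rows, watermark):
    """Return rows changed after `watermark`, oldest first, plus the new watermark."""
    changed = sorted((r for r in rows if r.updated_at > watermark),
                     key=lambda r: r.updated_at)
    new_watermark = changed[-1].updated_at if changed else watermark
    return changed, new_watermark

def load_upsert(target, rows):
    """Insert or overwrite rows by primary key, so replaying a batch is idempotent."""
    for r in rows:
        target[r.id] = r.payload

# Usage: only rows updated after timestamp 150 are extracted and loaded.
source = [SourceRow(1, 100, "a"), SourceRow(2, 200, "b"), SourceRow(3, 300, "c")]
warehouse = {}
batch, watermark = extract_incremental(source, 150)
load_upsert(warehouse, batch)
load_upsert(warehouse, batch)  # replaying the same batch leaves the target unchanged
print(warehouse, watermark)
```

The caller would persist the returned watermark only after the load succeeds, so a failed run can simply be retried from the previous watermark.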

Data cleaning and migration

Once integrated, data from the connected sources is merged, cleansed, enriched and regularly tested for accuracy, completeness and compliance with the core data model. The cleansed data is then moved to the cloud platform chosen by the customer, although some providers also support hybrid strategies in which some data stays on the customer's premises and the rest in the cloud.
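The merge, cleanse and test steps can be sketched as follows, assuming a hypothetical core data model with three required fields. The helpers `cleanse`, `validate` and `merge_and_dedupe` are illustrative only. They check completeness (missing values) and model compliance (expected types), and they deduplicate on `customer_id`, keeping the first record seen.

```python
from datetime import date

# Hypothetical core data model: required fields and their expected types.
REQUIRED_FIELDS = {"customer_id": int, "email": str, "signup_date": date}

def cleanse(record):
    """Trim whitespace from string values and lowercase the email address."""
    cleaned = {k: (v.strip() if isinstance(v, str) else v) for k, v in record.items()}
    if isinstance(cleaned.get("email"), str):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned

def validate(record):
    """List completeness problems (missing values) and model violations (wrong types)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing {field}")
        elif not isinstance(value, expected_type):
            problems.append(f"{field} is not {expected_type.__name__}")
    return problems

def merge_and_dedupe(*sources):
    """Cleanse and merge records from several sources, keeping the first per customer_id."""
    merged = {}
    for source in sources:
        for raw in source:
            record = cleanse(raw)
            merged.setdefault(record.get("customer_id"), record)
    return list(merged.values())

crm = [{"customer_id": 1, "email": " Ann@Example.com ", "signup_date": date(2022, 1, 5)}]
billing = [
    {"customer_id": 1, "email": "ann@example.com", "signup_date": date(2022, 1, 5)},
    {"customer_id": 2, "email": "", "signup_date": "2022-02-01"},
]
for record in merge_and_dedupe(crm, billing):
    print(record["customer_id"], validate(record))
```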

Support

Once the warehouse is up and running, the service provider handles the housekeeping: maintaining data quality, adding and removing sources, and periodically checking performance and the correctness of extract, transform and load (ETL) processes. The provider ensures that the entire service, from the data model to the infrastructure, is built in compliance with privacy, security and governance standards.
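One simple form of ETL correctness check is to reconcile the source extract against the loaded table by row count and an order-independent checksum. This sketch is an assumption about how such a check might look, not any provider's tooling. It hashes each row with SHA-256 and sums the leading 64 bits modulo 2^64, so row order does not matter, but altered or duplicated rows change the result.

```python
import hashlib

def table_fingerprint(rows):
    """Row count plus an order-independent checksum (sum of per-row hashes mod 2**64)."""
    rows = list(rows)
    checksum = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % 2**64
    return len(rows), checksum

def reconcile(source_rows, target_rows):
    """Return discrepancies between the source extract and the loaded table."""
    src_count, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_sum = table_fingerprint(target_rows)
    issues = []
    if src_count != tgt_count:
        issues.append(f"row count mismatch: {src_count} vs {tgt_count}")
    if src_sum != tgt_sum:
        issues.append("checksum mismatch")
    return issues

source = [(1, "a", 10.0), (2, "b", 20.0)]
print(reconcile(source, list(reversed(source))))            # same rows, different order
print(reconcile(source, [(1, "a", 10.0), (2, "b", 21.0)]))  # one value altered
```

Because rows are hashed through `repr`, values on both sides must be normalized to the same types (for example, `10` versus `10.0`) before comparison.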

Continuous evolution

While maintaining the data warehouse, the provider keeps an eye on changing business needs and data sources to make sure the entire data environment receives regular upgrades, whether in software, compute or storage.

Top data-warehouse-as-a-service solution providers in 2022

Many vendors now offer DWaaS solutions that provide the benefits of data warehousing without requiring customers to bear the burden of setup and maintenance. However, according to customer feedback on G2 and Gartner, only a few have made a strong enough mark to be categorized as leaders.

Snowflake Data Cloud

Operating across multiple clouds, including AWS and Azure, the Snowflake Data Cloud provides warehousing capabilities with full relational database support for both structured and semi-structured data. It separates storage, compute and cloud services into different layers, allowing each to change and scale independently. It also automates key maintenance tasks such as query caching, planning, parsing and optimization, as well as update processing. Globally, more than 5,000 companies use Snowflake Data Cloud to mobilize their data for artificial intelligence (AI) and analytics.

According to customer ratings, the platform meets user requirements and stands out in all categories, ranging from ease of deployment, administration and use to quality of support, scalability, integrations and pricing flexibility.

Amazon Redshift

As an AWS product, Amazon Redshift provides a fully managed and scalable cloud data warehouse that allows enterprises to run complex analytical queries on terabytes to petabytes of data stored in S3 buckets. It operates by provisioning clusters of nodes, with each node providing CPU, RAM and storage for one or more databases. As warehousing needs evolve, clusters can be provisioned or de-provisioned manually in Redshift to scale up or down accordingly.
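Manual scaling in a node-based design like Redshift's comes down to choosing how many nodes a cluster needs. The arithmetic below is a hypothetical sizing rule that assumes a fixed storage capacity per node (`per_node_tb`) and a target maximum utilization. It does not use real Redshift node types or capacities, which should be taken from AWS documentation.

```python
import math

def nodes_needed(data_tb, per_node_tb, max_utilization=0.75, min_nodes=1):
    """Smallest node count that keeps storage use at or below max_utilization.

    per_node_tb and max_utilization are hypothetical inputs, not real node specs.
    """
    if per_node_tb <= 0 or not 0 < max_utilization <= 1:
        raise ValueError("per_node_tb must be positive and max_utilization in (0, 1]")
    return max(min_nodes, math.ceil(data_tb / (per_node_tb * max_utilization)))

# Scaling up as the warehouse grows from 10 TB to 30 TB (hypothetical 2 TB nodes).
for data_tb in (10, 30):
    print(data_tb, "TB ->", nodes_needed(data_tb, per_node_tb=2), "nodes")
```

With 2 TB per node and a 75% ceiling, 10 TB of data needs 7 nodes. If the data grows to 30 TB, the same rule calls for 20, which is the kind of resize an administrator would otherwise carry out by hand.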

Redshift is almost on par with Snowflake but falls behind in areas such as the quality of end-user training and the availability of third-party resources, according to user feedback on Gartner.

Google BigQuery

BigQuery is the fully managed data warehouse offering from Google. It has a serverless architecture with automatic provisioning, plus built-in features such as streaming data support, machine learning and geospatial analysis. According to Google, BigQuery separates compute and storage for greater flexibility in scaling and lets developers use client libraries in familiar programming languages, including Python, Java, JavaScript and Go, to transform and manage data.

The solution also enables centralized management of data and compute resources, with tools for identity and access management. According to G2 ratings, however, BigQuery customers reported problems with the deployment, use and support aspects of the solution.

IBM Db2

Like Google, IBM also provides a fully managed, elastic cloud data warehouse that delivers independent scaling of storage and compute with its IBM Db2 solution. The offering includes a highly optimized columnar data store, actionable compression and in-memory processing to accelerate analytics and machine learning. Plus, it automates maintenance tasks such as monitoring, uptime checks and backups.
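Columnar storage helps compression because values within a single column tend to repeat. The sketch below shows run-length encoding, one generic columnar compression technique, and counts matching rows directly on the encoded runs without decompressing them. It only illustrates the general idea; it is not how IBM implements Db2's compression, and the helper names are hypothetical.

```python
def to_columns(rows, names):
    """Pivot row-oriented tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode a column as [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]

def count_equal(runs, target):
    """Count rows equal to target by scanning the runs, without decompressing."""
    return sum(length for value, length in runs if value == target)

rows = [("EU", 10), ("EU", 20), ("EU", 5), ("US", 7), ("US", 3)]
columns = to_columns(rows, ["region", "amount"])
region_runs = rle_encode(columns["region"])
print(region_runs, count_equal(region_runs, "EU"))
```

Run-length encoding pays off most when a column is sorted or clustered, as `region` is here; on a column whose values change every row it saves nothing.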

The solution's problem areas are also similar to those of Google's BigQuery: users reported issues with setup, deployment, use and the quality of support provided.

Microsoft Azure Synapse Analytics

Azure Synapse Analytics brings together data integration, warehousing and analytics capabilities to provide enterprises with a unified workspace to ingest, prepare, manage and serve big data for AI and business intelligence (BI) use cases. 

The solution gives data professionals the freedom to query data using either serverless or manually provisioned resources. It is also one of the leading players in the space thanks to near-limitless scaling of storage and compute resources, a deeply integrated SQL engine, native integrations with Power BI and Azure ML, and advanced data access controls.

Leading enterprises such as Walgreens, Co-op, Marks & Spencer and GE Aviation currently use Azure Synapse Analytics. According to Gartner ratings, its problem areas have been pricing models and customization.

Other notable players in the category are SAP, Oracle, Yellowbrick, Cloudera and Teradata. Overall, the market for DWaaS solutions is expected to grow at a compound annual rate of about 20%, from $1.44 billion in 2020 to $4.3 billion by 2026.
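The growth figure can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1, using the 2020 and 2026 market sizes quoted for the DWaaS market.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Market sizes in billions of US dollars, 2020 and 2026.
rate = cagr(1.44, 4.3, 2026 - 2020)
print(f"{rate:.1%}")
```

Six years of growth from $1.44 billion to $4.3 billion works out to roughly 20% per year, so the 20% figure describes an annual rate rather than total growth over the period.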

According to Mordor Intelligence, the surge will be driven primarily by companies' growing interest in understanding the available information about their business processes, products, customers and services in order to seize new business opportunities.
