To further strengthen our commitment to providing industry-leading coverage of data technology, VentureBeat is excited to welcome Andrew Brust and Tony Baer as regular contributors. Watch for their articles in the Data Pipeline.
Last week, Snowflake made a number of announcements aimed at expanding its footprint. It’s embracing the data lakehouse with new support for Apache Iceberg as an alternative to the Snowflake native table format. It’s adding transactions, and it’s seeking to make developers first-class citizens in its marketplace. And with new support in Snowpark, Snowflake is showing that it’s serious about accommodating Python developers through deep integration with the Anaconda portfolio of curated Python libraries.
But for this post, we want to focus on the question of whether two fast-growing cloud database providers are, in fact, going to eventually run into each other. We’ll talk about Python and data lakehouses in an upcoming post.
Snowflake vs. MongoDB
At first glance, this question is positively absurd. In the database world, one could hardly imagine two more diametric opposites: a SQL-based analytics platform vs. an operational database that eschews SQL. Or, more to the point, one provider positions its offering as “the data cloud,” while the other puts applications, rather than data, first.
But under the covers both are inching into each other’s territory. Admittedly, it’s from opposing starting points and constituencies, but the meeting point will be operational analytics.
As we laid out in our commentary about MongoDB, there is compelling rationale for bringing operational data and analytics together. The guiding notion is that there is significant value for folding analytics into transaction processing, resulting in “smart transactions.” The use cases are pretty familiar. A customer goes onto an ecommerce website and while placing an order, the website responds with recommendations aimed at upselling or cross-selling the customer, or adding sweeteners to prevent a customer from churning.
The same could be said for preventive maintenance in IoT applications, or treatment recommendations driven by medical device readings in healthcare delivery. Clearly, Snowflake isn’t targeting OLTP use cases that don’t involve analytics; but with the dividing line between transactions and analytics blurring, that still leaves plenty of use cases in play.
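The “smart transaction” pattern described above can be sketched in a few lines: the operational write and a lightweight analytic step happen in one request path, so the response can carry a recommendation. This is a minimal, generic illustration in plain Python, not any vendor’s actual API; the store, co-purchase table, and function names are all made up.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for an operational store and the
# analytic reference data -- a sketch of the pattern, not real Snowflake
# or MongoDB code.
ORDER_LOG = []  # operational side: committed orders

# Toy co-purchase history consulted by the in-line analytic step.
CO_PURCHASES = {
    "laptop": ["laptop bag", "mouse", "usb-c hub"],
    "phone": ["case", "charger"],
}

def place_order(customer_id, items):
    """Commit the order, then run a lightweight analytic step in-line,
    so the response carries upsell recommendations."""
    ORDER_LOG.append({"customer": customer_id, "items": list(items)})

    # Analytic step: tally co-purchase candidates the cart doesn't
    # already contain, and surface the top two.
    counts = Counter()
    for item in items:
        for suggestion in CO_PURCHASES.get(item, []):
            if suggestion not in items:
                counts[suggestion] += 1
    recommendations = [name for name, _ in counts.most_common(2)]
    return {"status": "confirmed", "recommend": recommendations}

result = place_order("c42", ["laptop", "mouse"])
print(result)
```

The point of the sketch is that the analytic step is cheap and bounded; anything heavier, as discussed later, belongs in a separate analytic tier.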
For Snowflake, convergence becomes clear with a pair of announcements for broadening its functional and addressable audience footprints. The first is the new Unistore transaction system that is based on a hybrid row/column store that is separate from the cloud object store and cache that supports the analytic mother ship. Unistore will bring lightweight transaction processing to Snowflake. And the second announcement, which is all about drawing app developers to the Snowflake data cloud, is the new native application framework intended to provide developers a way to monetize their data applications on the newly rechristened Snowflake Marketplace.
By comparison, MongoDB has always been about developers, and given the recent keynotes by spokespeople like CTO Mark Porter, it’s about clearing away the speed bumps so developers can keep turning out new applications using data. However, beyond the developer cheerleading, last week’s keynotes revealed subtle moves to make MongoDB more hospitable to analytics. Among the highlights, the company that still vocally downplays SQL has actually written its first serious SQL query engine.
Let’s start with Unistore
So, what is Unistore? It’s a hybrid row and column data store that extends Snowflake’s addressable footprint to lightweight transaction applications. Snowflake has so far been fairly vague about the details of its engine, but they will be the first to admit that they are not about to replace Oracle or SQL Server for mission-critical enterprise applications, or cloud transaction behemoths like Amazon Aurora or Google Cloud SQL, anytime soon.
Unistore might handle volumes up to thousands of transactions per second, which is hardly web scale. Instead, they are targeting use cases like maintaining a feature store for machine learning (ML) models, tracking the state of an application such as an ETL operation, or performing functions such as inventory checks. For now, Snowflake’s Unistore aspirations are modest. And when you ask Snowflake what they are targeting, they deliberately characterize it as “new applications” rather than legacy. Does that sound a little bit like MongoDB?
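The inventory-check use case mentioned above is a good example of what “lightweight transaction processing” means in practice: a single-row, check-and-decrement operation that must be atomic but involves no heavy compute. Here is a minimal sketch using SQLite as a stand-in for a transactional row store; the table and column names are invented for illustration, and this is not Snowflake syntax.

```python
import sqlite3

# SQLite stands in for a transactional row store here; this is a
# generic sketch of the use case, not Unistore code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5), ('gadget', 0)")
conn.commit()

def reserve(sku, qty):
    """Atomically decrement stock only if enough is on hand --
    the check and the decrement happen in one transaction."""
    with conn:  # the context manager commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE inventory SET on_hand = on_hand - ? "
            "WHERE sku = ? AND on_hand >= ?",
            (qty, sku, qty),
        )
        return cur.rowcount == 1  # True only if the stock check passed

print(reserve("widget", 2))  # succeeds: stock falls from 5 to 3
print(reserve("gadget", 1))  # fails: nothing on hand
```

Because the condition lives inside the UPDATE itself, there is no read-then-write race; that atomicity, rather than raw throughput, is what these lightweight transactional use cases demand.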
Not the first to go hybrid
Snowflake and MongoDB aren’t the first to try collapsing analytics and transaction databases. Just this year, Google and Oracle announced cloud-based PostgreSQL and MySQL services blending the two, followed last week by NoSQL real-time database Aerospike integrating the SQL-based Starburst Trino federated query engine. Before that, we’ve also seen IBM, Oracle and MariaDB deliver hybrid platforms with row-based transaction stores paired side-by-side with in-memory columnar analytic tables. The hybrid idea actually dates back almost a decade, to when IBM Db2 BLU appeared.
Ideally, the desired end state would be in-line, real-time “smart transactions” that automatically trigger a simple analytics step for making a quick decision in responding to an outlier transaction. However, the reality is that while replication might be real-time, analytics are not integrated in-line with transactions to form a sort of closed-loop process. Instead, the guiding notion is about simplifying the database stack by eliminating ETL and the need for a separate data warehouse.
And, as we noted with MongoDB the other week, the last thing you want to do with an operational or transaction database is slow it down with complex analytics. You can do analytics in these hybrid systems, but without significant workload isolation, those analytics will have to be relatively lightweight, not involving complex joins of hundreds or thousands of tables, or requiring intense compute. For that you’ll still need a dedicated data warehouse, data lake, or data lakehouse, or a highly engineered system like Oracle Exadata.
Evolving to the data applications cloud
OK, that’s not Snowflake’s official name for it, but it might as well be.
With its Native Application Framework, Snowflake is simplifying the path for developers to onramp their applications into its marketplace. While Snowflake isn’t changing its Data Cloud branding, it is renaming the marketplace from the Snowflake Data Marketplace to simply Snowflake Marketplace. The framework provides not only APIs, but also the capability to package the runtime so that developers’ code is protected from copying, while customers’ data remains protected because it doesn’t have to move out of Snowflake.
While document-oriented developers working with JSON might perceive SQL UDFs as foreign territory, Snowflake is making one message quite clear with the Native Application Framework: As long as developers are willing to run their code in UDFs, they will be just as welcome as the data folks to profit from their work.
Cut to the chase
While Snowflake’s market capitalization has recently come down to earth along with the rest of the tech sector, both Snowflake and MongoDB are perceived as leading cloud-native database alternatives to hyperscalers. Both have the distinction of becoming true frenemies of AWS.
At first glance, it appears that the stars are aligning and the two will start seeing each other on their respective radar screens. Both are tiptoeing into each other’s domains. Snowflake has added a transaction store and is actively courting developers, while MongoDB has quietly released its first credible SQL engine, not to turn MongoDB into a relational database but to support analytic queries. It accompanies other features, such as expanded federated query capability, that extend Mongo’s reach.
In the short term, the parallels in both companies’ approaches are superficial: Snowflake is appealing to a very different kind of developer than MongoDB. But, in the long run, we expect both will run into each other. We’ve noted that, while databases will continue to differentiate, the market is demanding overlap. Snowflake addressed that from the get-go as one of the first cloud data warehouses (this was in the days before they repositioned to data cloud) to support JSON as a first-class citizen. And, despite its public pronouncements, MongoDB is quietly embracing SQL. The common thread is that both are seeking to leapfrog from department to enterprise and, as they do so, they must of necessity broaden their appeal to more constituencies.
Today, Databricks is Snowflake’s more obvious rival. Yes, there is competition from the hyperscalers, but Snowflake and Databricks alike are positioning themselves as analytics ecosystem destinations in a multicloud world. Both are headed toward the data lakehouse; Snowflake is approaching it from the constituency of the data analyst comfortable working with SQL, while Databricks has more traditionally appealed to the Java, Scala and Python developers that Snowflake is courting with Snowpark. This is a developing saga, and one that we’ll dwell on in an upcoming post.
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.