What is a NoSQL database?

The NoSQL database gets its name from what it isn't: It's a database that does not use Structured Query Language (SQL) to access the data. Some well-known databases, such as Oracle and PostgreSQL, are SQL databases, but many of the databases introduced over the past few years are considered NoSQL databases. Some people insist that NoSQL is not the exact opposite of SQL and argue that the name really stands for "Not Only SQL." Either way, what's important is that NoSQL databases relax many of the requirements that defined earlier SQL databases.

While some NoSQL databases support SQL queries, most of them are built around engines that offer better performance and more flexibility for certain use cases.

Differences between NoSQL and SQL

The most prominent difference between NoSQL and SQL lies in how the data is structured in the database. SQL databases organize the information into rectangular tables with columns that are predefined and populated with set datatypes such as integers and dates. NoSQL databases, on the other hand, store pairs of data: a key holding the name of the field and the value connected to that field. This flexibility allows some entries to have a few keys and other entries to have completely different sets of keys. For example, one entry may have the keys "name," "rank," and "serial number," while another could store just the "name," and a third might hold "name," "age," "home town," and "height."
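As a rough illustration, the sketch below models the three entries from the example as Python dictionaries and then forces them into a fixed set of columns, the way a table would. Plain dictionaries stand in for the database, and the sample values and the names `entries`, `COLUMNS`, and `to_row` are hypothetical.

```python
# Three entries with different sets of keys, as a key-value or document
# store might hold them. Plain dicts stand in for the database here.
entries = [
    {"name": "Ada", "rank": "Captain", "serial number": "A-1001"},
    {"name": "Grace"},
    {"name": "Linus", "age": 41, "home town": "Helsinki", "height": 180},
]

# A tabular layout needs one fixed list of columns shared by every row.
COLUMNS = sorted({key for entry in entries for key in entry})


def to_row(entry):
    """Flatten one entry into a fixed-width row, using None for missing fields."""
    return tuple(entry.get(column) for column in COLUMNS)


rows = [to_row(entry) for entry in entries]
```

Every row ends up with six slots, most of them empty, while each dictionary holds only the fields it actually needs.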

This flexibility is a blessing and a curse for developers. Adding special fields for particular rows is simple. The type of data being stored can evolve over time, and the database can adapt to the changes because it can handle new entries carrying a set of data that differs from older entries. But that freedom can wreak havoc if developers lose track of what data may or may not be stored. The code can't rely on any predefined structure to simplify the processing, and data must often be checked and double-checked after being retrieved. The raw storage for the database can often be larger because each entry keeps a set of keys to unpack it, something that can be quite wasteful if multiple entries have the same keys.
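The checking and double-checking might look something like the following sketch, which assumes entries arrive as Python dictionaries. The function `read_customer` and its rules (a required "name" and an optional integer "age") are invented for illustration and do not come from any particular database driver.

```python
def read_customer(entry):
    """Return a normalized record, or raise ValueError if the entry is unusable."""
    name = entry.get("name")
    # Without a schema, nothing guarantees the field exists or is text.
    if not isinstance(name, str) or not name.strip():
        raise ValueError("entry has no usable 'name'")

    age = entry.get("age")
    # Optional fields still need a type check before the code relies on them.
    if age is not None and not isinstance(age, int):
        raise ValueError("'age' must be an integer when present")

    return {"name": name.strip(), "age": age}
```

A SQL database would enforce rules like these once, in the table definition. Here the same work falls on every piece of code that reads the data.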

Common use cases tend to be open-ended systems that evolve over time. A customer management system, for instance, may start out tracking bare essentials like name and phone number. Over time, the sales team may want to store more information about customers, like their favorite products or their particular business strategy. A NoSQL database makes it simpler to add new fields for the entries that need them.

Some NoSQL databases use a "document" model, where sets of keys and their values are grouped into documents. Sometimes the values can hold other documents, allowing elaborate nested hierarchies of documents. Some simpler NoSQL databases don't allow this, and sometimes they're just described as "key-value" stores.
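A nested document can be pictured as a dictionary whose values are sometimes dictionaries themselves. The sketch below is only an illustration: the sample document and the helper `get_path` are hypothetical, though several document databases offer a comparable dotted-path syntax in their own query languages.

```python
# A document whose "contact" value is itself a document, which in turn
# holds another document under "address".
document = {
    "name": "Acme Corp",
    "contact": {
        "phone": "555-0100",
        "address": {"city": "Springfield", "country": "US"},
    },
}


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'contact.address.city' through nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

Calling `get_path(document, "contact.address.city")` walks down three levels, while a path that does not exist simply yields the default instead of an error.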

How important is data consistency?

Transaction support is another major difference between the two types of databases. Many early NoSQL databases did not use the most sophisticated algorithms for ensuring consistency between entries and tables. They used simpler algorithms because they were focused on speed, making them attractive to developers who were more concerned about database performance and less about achieving perfect consistency. Traditional SQL databases make stronger guarantees about preventing mistakes, which is an important feature in case of power outages, transaction errors, or hardware failures.

A social media company, for instance, may not worry if an occasional post fails to appear correctly. But a bank would be very concerned if there were inconsistencies in account balances because a deposit transaction failed.
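The bank's concern can be sketched with SQLite, a small SQL database bundled with Python's standard library. The `accounts` table, the account names, and the `transfer` helper are assumptions made for the example. The point is only that a transaction applies both halves of a transfer or neither of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            # An overdraft violates the CHECK constraint and raises IntegrityError.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

If the second update fails, the credit already made by the first update is undone as well, so no money appears out of nowhere. A store without transactions would leave that cleanup to the application.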

Over the years, the distinctions between the two types of databases have narrowed, as some NoSQL databases have adopted better algorithms to match the consistency provided by earlier SQL databases.

In general, developers tend to prefer traditional SQL relational databases for applications with well-defined data structures that must be carefully enforced. Financial records and scientific results, for instance, benefit from rules on data types and formatting.

A less obvious, but still significant, difference between SQL and NoSQL databases is the format the databases use for their responses. SQL databases traditionally return answers to queries in a spare format of rows and columns, while some NoSQL databases format their responses in JSON. Developers like JSON because it makes it easier to write code for the browser. Several SQL databases have also adopted JSON to take advantage of this convenience.
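To see why the JSON format is convenient, consider this minimal sketch that turns the rows from a SQL query into JSON by hand, using the `sqlite3` and `json` modules from Python's standard library. The `customers` table and the `query_as_json` helper are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be read by column name
conn.execute("CREATE TABLE customers (name TEXT, phone TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ada', '555-0100')")


def query_as_json(conn, sql):
    """Run a query and return the result set as a JSON array of objects."""
    rows = conn.execute(sql).fetchall()
    return json.dumps([dict(row) for row in rows])


payload = query_as_json(conn, "SELECT name, phone FROM customers")
```

The resulting string pairs each value with its field name, so browser code can use it directly. A database that answers in JSON saves the developer this conversion step.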

Legacy databases adopt NoSQL features

The big database companies have adopted several NoSQL database features. As mentioned earlier, newer versions of SQL databases adopted JSON for the response format. PostgreSQL, Oracle, IBM's Db2, and most others have added options that preformat the data in JSON to make it simpler for developers to switch between NoSQL and SQL databases.

Microsoft's Azure Cosmos DB is said to be a "multimodel" database because it offers several APIs, including one that accepts SQL-style queries and others that speak NoSQL. The data underneath is stored in a NoSQL format that is a superset of the tabular model, and the API interprets SQL requests when necessary.

Oracle offers its own NoSQL database as both a product and a service, and it smoothly scales to distribute data over multiple nodes.

NoSQL upstarts in the market

While most of the NoSQL databases are relatively new, at least compared to SQL databases, many are well-established in the enterprise. MongoDB, for instance, is a publicly traded company offering a number of different versions of its core database, both as a service and on-premises. The open source edition is frequently installed as a core component of web applications.

Couchbase is another independent company that began more than a decade ago but hasn't gone public. Its core NoSQL engine has expanded over the years, and the company now offers other services, like full-text search, mobile support, and an SQL-like API for more complex queries.

Cassandra began as a project inside Facebook to support the social media giant's vast collection of data. Social media sites are good examples of applications that work well with the loosely structured freedom of NoSQL databases. The tool is now released as open source under the Apache Software Foundation, and companies like DataStax have sprouted up to support cloud and on-premises installations.

The cloud companies offer a variety of tools that vary from their own proprietary versions to curated versions of open source tools. Google, for instance, started building Bigtable for its internal use and later started reselling it as a service on the company's own cloud platform. Another product, Firebase, is designed to integrate a document-style API with communication software to make it simpler for data to be synchronized between mobile devices and the centralized cloud.

Amazon offers two options. DynamoDB is optimized to support large, enterprise-scale collections of data that need a fast response. Data is encrypted for security as a default and ACID-level transactions are supported. A second option called DocumentDB is built to be compatible with MongoDB.

Some of the popular NoSQL databases are tightly coupled with support for distributed analysis. HBase and Accumulo are two options that are integrated with the Hadoop world for big data processing.

Many of the other types of databases share some structural similarities with NoSQL. Graph databases like Neo4j and ArangoDB are mainly designed to store networks of interconnected nodes, but they often also use NoSQL's simple model for the data stored at these nodes.

A number of databases are following the NoSQL tradition of relaxing some of the structural rules that defined the SQL generation while retaining elements of SQL. EraDB's tool for searching time-series log data, for instance, is said to be "schema-free" because there are no predefined rules for the structure of the data. The company's query language is SQL, however, and so it straddles both camps.

Is there anything NoSQL can't do?

The document or key-value model is a pure superset of a tabular model, and so every set of rows and columns can be easily stored as pairs of keys and associated values. Still, this flexibility comes with a cost in time and sometimes efficiency. Each entry must track the keys and also be ready to search them for the matches. This can be very repetitive and consume more disk space in cases where most or all of the entries have the same fields with the same names. Relational databases can also split data into multiple tables, a process that can dramatically reduce the number of repeated values.
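The cost of repeating keys can be estimated with a back-of-the-envelope sketch. It uses the length of JSON text as a stand-in for storage size, which is an assumption: real database engines use their own binary formats and often compress data, so the actual numbers will differ. The sample entries are invented.

```python
import json

# 1,000 entries that all share the same three field names.
entries = [
    {"name": f"user{i}", "age": 30, "home town": "Springfield"}
    for i in range(1000)
]

# Document style: every entry carries its own copy of the keys.
document_bytes = sum(len(json.dumps(entry)) for entry in entries)

# Tabular style: the column names are stored once, then only the values.
columns = ["name", "age", "home town"]
header_bytes = len(json.dumps(columns))
row_bytes = sum(
    len(json.dumps([entry[column] for column in columns])) for entry in entries
)
table_bytes = header_bytes + row_bytes

# Fraction of the document-style size that the repeated keys account for.
savings = 1 - table_bytes / document_bytes
```

Because the field names are short relative to nothing else in these tiny entries, the repeated keys make up a noticeable share of the total. The gap shrinks when values are large and grows when entries hold many small fields.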

Some NoSQL databases still don't offer the best algorithms for ensuring consistency. These are poor choices for applications that require the highest levels of accuracy like, say, banks or reservation applications that can sell only one seat on a flight. The early versions traded this consistency for speed and attracted applications that didn't need absolute consistency. Many of the newest NoSQL databases use better algorithms now, making this difference less pronounced.

This article is part of a series on enterprise database technology trends.