What is GraphQL?

Front-end developers looking for a way to work with back-end databases on the server have been embracing GraphQL for its powerful but simple format for expressing complicated requests. While GraphQL is often considered to be closely related to graph databases used to store relationships like social networks, it's outgrown that connection. Its stripped-down syntax makes it a popular way to fetch exactly the data an application needs, whatever kind of store holds it.

Consider this example: One page needs just a list of users. Another must include only the users with a pet dog. A third wants the users matched with their addresses and sorted by ZIP code, but only for U.S. addresses.

When the queries must work with data structures that are multi-layered and sometimes interconnected, GraphQL simplifies how we ask a question to produce just the right answers.
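As a rough sketch, the three pages might send queries like the ones below. The field names (users, pets, addresses) and the filter arguments (where, orderBy) are hypothetical; GraphQL itself doesn't define filtering or sorting arguments, so a real schema decides what it accepts. The Python wrapper only packages each query into the JSON body that GraphQL servers conventionally accept over HTTP.

```python
import json

# Hypothetical schema: every field and argument name here is illustrative only.
USER_LIST = """
query {
  users { id name }
}
"""

DOG_OWNERS = """
query {
  users(where: { pets: { species: "dog" } }) { id name }
}
"""

US_ADDRESSES = """
query {
  addresses(where: { country: "US" }, orderBy: ZIP) {
    zip
    street
    user { name }
  }
}
"""


def to_request_body(query: str) -> str:
    """Wrap a query in the JSON envelope commonly used for GraphQL over HTTP."""
    return json.dumps({"query": query.strip()})


for q in (USER_LIST, DOG_OWNERS, US_ADDRESSES):
    print(to_request_body(q))
```

Each page asks for a different shape of data, but all three requests go to the same endpoint in the same format.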

SQL and the move to GraphQL

Traditional databases are based on a language called SQL, short for Structured Query Language. Simple requests are pretty easy to write in SQL, but problems arise when the data has more structure than flat tables express comfortably. Complex requests are hard to write when the data for your application is filled with nested fields buried in sections and subsections. Finding a query that retrieves exactly the right subset requires deep thought, experimentation, and iteration. Developers spend time thinking through complex chains of operations that match and merge tables (called JOINs) instead of working on the application.
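To make the JOIN burden concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. The two tables and their columns are invented for illustration. Even for this small case, the developer has to spell out how the tables connect.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        zip TEXT,
        country TEXT
    );
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO addresses VALUES
        (1, 1, '94103', 'US'),
        (2, 2, '10001', 'US'),
        (3, 3, '00100', 'FI');
""")

# Users matched with their U.S. addresses, sorted by ZIP code.
# The JOIN clause states explicitly how the two tables relate.
rows = conn.execute("""
    SELECT u.name, a.zip
    FROM users AS u
    JOIN addresses AS a ON a.user_id = u.id
    WHERE a.country = 'US'
    ORDER BY a.zip
""").fetchall()

print(rows)  # [('Grace', '10001'), ('Ada', '94103')]
```

With three or four tables in play, each extra relationship adds another JOIN clause to get right.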

GraphQL is a simplified mechanism for presenting queries to the database. It's built with a modern syntax, so it will be familiar to developers who work with web browsers and server stacks like Node.js that use JavaScript. The request lists all of the desired fields and adds limits to particular fields for matching or searching. The format for the list is sparse, echoing the popular JSON data structure format with fields framed by curly brackets.
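One way to picture the field list is as a template that the response mirrors. The sketch below is not a GraphQL implementation. It stands in for the query text with nested Python dictionaries and uses a made-up user record, just to show how a selection trims a larger record down to the requested shape.

```python
import json

# Hypothetical record as a server might hold it, with more fields than any one page needs.
USER = {
    "id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "address": {"street": "1 Main St", "zip": "94103", "country": "US"},
}

# A selection written as nested dicts, standing in for the GraphQL text:
#   { name address { zip } }
SELECTION = {"name": True, "address": {"zip": True}}


def select(data: dict, selection: dict) -> dict:
    """Return only the requested fields, recursing into nested selections."""
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = select(value, sub) if isinstance(sub, dict) else value
    return result


print(json.dumps(select(USER, SELECTION)))
# {"name": "Ada", "address": {"zip": "94103"}}
```

The JSON that comes back has the same nesting as the request, which is much of why the format feels familiar to JavaScript developers.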

Many database administrators are embracing GraphQL because it saves developers time and can encourage them to write more precise queries that return only the essential data, thereby saving bandwidth.

GraphQL was originally created by Facebook for an internal project, and the company started sharing it in the hope of building a common standard. It's succeeded, and many users now rely on it for requests that bear little resemblance to the initial tasks.

The earliest version dates to 2012, but Facebook didn't release language descriptions and specifications until 2015. Facebook spun off a separate foundation in 2018 when other companies started using the query language. External clients with API access to Facebook's datacenters must use GraphQL to search.

The query language has also become tightly aligned with what is sometimes called the Jamstack style of developing Node.js applications. Some of the major libraries, like Gatsby, use GraphQL as their lingua franca for extracting information from the database. Programmers who take up this development style naturally choose the language.

How does it help?

GraphQL users often speak highly of its simplicity and of how easily they can craft queries that traverse complex, densely connected data structures. If the data is simple enough to fit in one table, there's often not much to gain from switching to GraphQL, but if the data includes multiple tables, then it can shine.

In an example from the air travel world, a schedule may be filled with flights, and each flight is filled with passengers. Each passenger also has their own physical characteristics, like height or weight, as well as preferences and medical needs. Finding a list of all flights with passengers who need an extra-large wheelchair takes only a few words with GraphQL.
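A sketch of what that request could look like follows. The query text assumes a hypothetical schema with a where argument, which is a common convention but not part of GraphQL itself. The Python function underneath shows, over made-up in-memory data, roughly the filtering a server would perform to answer it.

```python
# Hypothetical query text; the `where` argument is an assumption about the schema.
QUERY = """
query {
  flights(where: { passengers: { wheelchair: "XL" } }) {
    number
    passengers(where: { wheelchair: "XL" }) { name }
  }
}
"""

# Made-up sample data: flights, each holding its passengers.
FLIGHTS = [
    {"number": "GQ101", "passengers": [
        {"name": "Ada", "wheelchair": "XL"},
        {"name": "Grace", "wheelchair": None},
    ]},
    {"number": "GQ202", "passengers": [
        {"name": "Linus", "wheelchair": None},
    ]},
]


def flights_needing(flights: list, size: str) -> list:
    """Keep only flights with a matching passenger, and only those passengers' names."""
    matches = []
    for flight in flights:
        names = [{"name": p["name"]} for p in flight["passengers"] if p["wheelchair"] == size]
        if names:
            matches.append({"number": flight["number"], "passengers": names})
    return matches


print(flights_needing(FLIGHTS, "XL"))
# [{'number': 'GQ101', 'passengers': [{'name': 'Ada'}]}]
```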

The query language welcomes programming structures like variables. The results feel more like JavaScript code, which makes adoption a bit easier.
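Variables let the same query text be reused with different values. In the sketch below, the operation name, field, and argument are hypothetical. The $size declaration and the separate "variables" object follow the way GraphQL clients typically send requests over HTTP.

```python
import json

# `$size` is a GraphQL variable; the field and argument names are hypothetical.
QUERY = """
query FlightsByWheelchair($size: String!) {
  flights(wheelchairSize: $size) { number }
}
"""


def build_request(query: str, variables: dict) -> dict:
    """Pair the query text with its variable values in one request payload."""
    return {"query": query.strip(), "variables": variables}


payload = build_request(QUERY, {"size": "XL"})
print(json.dumps(payload, indent=2))
```

Because the value travels outside the query string, the client never has to splice user input into the query text.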

The language is also sometimes confused with graph databases, a type of tool designed to simplify storing and searching networks of linked elements. GraphQL works quite well at specifying the queries for these complex graphs, especially when the results must be found by searching through several layers of nodes or data elements. But the language also works well with traditional tabular or document-centric databases. In other words, there doesn't need to be a graph to use GraphQL.

Some traditional developers find that GraphQL is a bit too good at hiding the complexity of the retrieval process. Some database analysts like crafting SQL queries with clear JOIN statements because it forces them to imagine how the different tables will be connected. The JOINs can be the most time-consuming part of answering queries, and writing them out explicitly forces the analyst to consider the trade-offs in time versus space. Understanding the structure of the queries also allows the database creators to plan ahead to speed some of the queries by adding indices.
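The effect of an index can be seen in a small experiment with Python's built-in sqlite3 module. The table and index names are invented for illustration, and the exact plan wording varies between SQLite versions, so the sketch prints the plans rather than assuming their text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, user_id INTEGER, zip TEXT)")


def plan(sql: str) -> str:
    """Return SQLite's human-readable plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT zip FROM addresses WHERE user_id = 1"

before = plan(lookup)  # without an index, SQLite has to scan the table
conn.execute("CREATE INDEX idx_addresses_user ON addresses (user_id)")
after = plan(lookup)   # with the index, it can search by user_id directly

print(before)
print(after)
```

An analyst who writes the JOIN by hand knows which columns are used for matching and can index them ahead of time.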

How the legacy players approach GraphQL

The traditional databases speak SQL, but they are embracing GraphQL by adding extra software layers that convert the GraphQL into traditional SQL statements. The results in JSON are usually formatted by the native JSON routines that have been part of the major databases for some time.
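The translation idea can be sketched with a toy function. This is not how any particular product works. It handles only one level of nesting, and the RELATIONS mapping, table names, and columns are all assumptions made for the example.

```python
# Toy mapping from a hypothetical nested field to the table and foreign key behind it.
RELATIONS = {"addresses": ("addresses", "user_id")}


def to_sql(root: str, fields: list, nested: dict) -> str:
    """Turn a one-level nested selection into a single SQL statement with JOINs.

    Names are interpolated directly, which is only acceptable in a toy:
    a real translator would validate every name against the database schema.
    """
    columns = [f"{root}.{f}" for f in fields]
    joins = []
    for relation, sub_fields in nested.items():
        table, foreign_key = RELATIONS[relation]
        columns += [f"{table}.{f}" for f in sub_fields]
        joins.append(f"JOIN {table} ON {table}.{foreign_key} = {root}.id")
    return f"SELECT {', '.join(columns)} FROM {root} " + " ".join(joins)


# Stands in for the GraphQL text: { users { name addresses { zip } } }
sql = to_sql("users", ["name"], {"addresses": ["zip"]})
print(sql)
# SELECT users.name, addresses.zip FROM users JOIN addresses ON addresses.user_id = users.id
```

The developer names the fields, and the layer in between works out the JOIN.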

One startup, Hasura, distributes an open source package that will translate GraphQL for PostgreSQL databases. It is tightly integrated and works with many PostgreSQL-specific features like support for GIS and geographic coding. The company also runs a fully managed cloud API for those who want to pay for a service instead of a software product.

Hasura is also working on integrating its interface with Microsoft's SQL Server. This joint project between Hasura and Microsoft is expected in early 2021. [Update: On 2/23/21, Hasura released GraphQL Engine 2.0, which allows connection to multiple databases simultaneously, database generalization, and support for REST for the user-facing API.]

Other tools are also providing the same kind of glue. JoinMonster, for instance, works with all of the major SQL databases from Oracle to SQLite, as well as several versions of MySQL. It integrates with Node.js and uses the database schemas to plan a set of traditional SQL queries that will fetch the right amount of data.

Oracle is also expanding access by adding exterior GraphQL parsers that produce PL/SQL that will find the data. The Oracle database already includes a standard feature for formatting the answers in JSON.

The upstarts

Dgraph calls itself the "only native GraphQL database with a graph backend." It's an open source tool designed to scale horizontally while delivering full transaction control to prevent inconsistencies. The code is released under a mixture of the Apache license and the Dgraph community license.

FaunaDB, a NoSQL database, originally used its own relational-style query language called FQL. Its developers also added a GraphQL API for those who are following that standard.

Apollo offers a GraphQL server that can store data locally and also work with other services to gather all of the data for the answer. It can act as a gateway to a constellation of federated servers, thus simplifying the interface for front-end developers. The tool's architecture encourages what its developers call a "separation of concerns" through which the work is split into separate services and data stores. This encourages a more resilient data source, because a failure in one may not affect the others.
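The gateway pattern can be sketched in a few lines of plain Python. This does not use Apollo's API. The two service functions and their return values are hypothetical stand-ins for separate back ends, and the gateway shows how a failure in one need not sink the whole response.

```python
# Two stand-in "services", each owning one slice of the data.
def accounts_service(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada"}


def orders_service(user_id: int) -> list:
    return [{"orderId": 501, "total": 42.0}]


def gateway(user_id: int) -> dict:
    """Combine both services into one response, tolerating an orders failure."""
    user = accounts_service(user_id)
    try:
        user["orders"] = orders_service(user_id)
    except Exception:
        user["orders"] = None  # partial result instead of a total failure
    return {"data": {"user": user}}


print(gateway(7))
```

The front end sees a single response shape and does not need to know which service supplied which field.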

Amazon Web Services (AWS) is also embracing GraphQL for a number of APIs. The Amplify framework, for instance, relies heavily on the language to handle the job of fetching data. Other applications can also add a GraphQL API to an AWS data source like DynamoDB by using the AppSync service.

Is there anything GraphQL can't do?

If the database schema is simple, and the query will pull all of the information from one table, there's not much difference in the complexity of an SQL or GraphQL query. The traditional response format from SQL databases is also sometimes more efficient than JSON. In these cases, there's no obvious benefit to choosing one over the other. Modern developers may prefer the syntax of GraphQL, but it is largely a matter of taste.

GraphQL shines when the schema includes many normalized tables and the queries span several of them. Choosing the fields and including filtering values is simpler to understand because the developer doesn't need to consider the JOINs.

The full GraphQL specification also includes a number of options for more programmatic approaches to queries, such as adding variables and defining reusable fragments. These can build even more elaborate strategies for retrieving just the right data and reducing the size of the response. The complex features, though, can be overwhelming for some developers and require elaborate debugging and experimentation. Some queries that search complex networks can be daunting.

This article is part of a series on enterprise database technology trends.