SQL vs. NoSQL - cutting through the Tech Twitter noise

Published On
SQL vs NoSQL
SQL vs NoSQL
We've done a lot of thinking around database support lately, and even hosted a community discussion on Discord. Right now we're MongoDB-only, but our most upvoted roadmap item is adding additional database support and in particular, Postgres or SQLite. And transitioning from document based, NoSQL thinking to a relational DB mindset using SQL really spotlights the concrete differences.

In this post, I'm going to cover the big challenges that we'll face when adapting our features and functionalities over from MongoDB to a relational DB, and along the way, I'll highlight the strengths and weaknesses of each paradigm with some objective ramifications.

Data Definition Language

Since we're currently solely on MongoDB, we don't need to do things manually like adding a new table column when you add a new field to your Payload config. But with relational DBs, we need to think about that (and make it painless).

DDL, or Data Definition Language, is a subset of SQL that defines and maintains relational database structure. Even when you use something like Prisma, which takes care of a lot of the headache for you, you need to become intimately familiar with how Prisma migrations work. There, Prisma translates its own schema syntax into DDL migrations for you.

Don't know what any of this means? Go ahead and try to simply follow the "Create a column in a table" example. If you know SQL / DDL and if this is not new to you, then you're probably okay with this. But there's work involved here.

When you decide you want to store a new field in a NoSQL database like MongoDB, the change is entirely in your application layer and the database does largely as it is told. No migrations, no generated DDL, no schema / database "sync".

That's obviously a big win for NoSQL, but that also means that the NoSQL structure is significantly more loosey-goosey and that can lead to errors. Lots of devs (not to mention project requirements) straight up require the structural constraints of relational databases, and allowing any type of data to be stored in a NoSQL document can present issues in terms of data integrity. So there are definitely pros and cons to both approaches.

Migrations

Fascinatingly, another very hotly requested Payload feature is a first-party way of handling migrations. And that got us thinking. When you look at the Prisma migration resources above, clearly, they are using the word "migration" to describe the "migration" of your database shape from A to B. This is the expected and most widely used meaning of "migration".

But we're on MongoDB, and that doesn't apply. But developers are still asking for migration support. There is a specific distinction here that I'd like to make. Migrations can mean many things in many different contexts.

Changing database shape

Most often, they are referred to as simply modifying your database structure (DDL). This is a requirement for relational databases in general. For example, adding a column to a table is a manual `ALTER` that needs to be run at some point.

Transforming existing data

But there's a whole different side to the story. What about if you need to change or transform your existing data? What if you had a full_name field, and you needed to split it up into two separate fields, first_name and last_name? Stuff like this is super common, and requires manual, programmatic management. Whenever anyone has asked Payload for migration support, this is what they meant.

Due to this second aspect of "migrations", that means that the concept is relevant for all types of databases. Application frameworks like Laravel handle this very well, with a full, true "migrations" workflow that allows you to handle both DDL and data-based transformations within the same paradigm, all based on database transactions. I personally think this is by far the best way to solve this and this is an often-overlooked aspect of ORMs (Object Relationship Mapping) in general. Some ORMs come with great programmatic migration support, while others solely handle the DDL side (the pain in the ass side) in raw, generated SQL statements. Excuse my French, but I've been spoiled here by MongoDB, and I spoiled myself intentionally.

If you end up listening to our community planning call, you'll see that we cover our goals and future vision in-depth on that call. We want to model ourselves after Laravel's consistent (solve for both) approach by not outputting raw SQL, but rather, outputting TS files that give you granular, functional control.

Relational data and performance

One of the biggest wins for relational databases in the minds of engineers is its heavily structured relational architecture. Without a doubt, relationships are a first-class citizen here. Much of the querying language itself is dedicated specifically to combining and relating normalized data from separate tables.

But relationships can certainly be done in NoSQL, so what's the deal? What are the real-world differences?

First up, there is an entire array of functionality within SQL that actually enforces referential integrity. Within most NoSQL databases including MongoDB, those types of constraints would all need to be written manually, and you'll need to ultimately "own" the integrity of your data. In NoSQL, you might have a relationship to another document. You could go delete that document, but unless you've handled this yourself, the original relationship would remain, yet be completely invalid. That's not ideal, and that places more work on you.

Let's take Payload, for example. Surprise, surprise. We have a relationship field, and it can store IDs to other related documents which are seamlessly merged in when you retrieve documents from the DB. We leverage the dataloader pattern to batch together all "populations" required for a given query, returning them all super fast and with as few separate queries to the DB as possible.

We actually even outperform relational DB-based frameworks quite a bit. In a purely relational test, we were 3x faster than Directus and 7x faster than Strapi while both were running Postgres, and we were on MongoDB.

Without knowing what's under the hood of either of those platforms, my takeaway here is that relationships are obviously possible in both database paradigms. If you were to write out queries deliberately for very specific joins between tables, and have the ability to define queries so that a single query is made vs. dynamic populations, then a relational DB will likely outshine MongoDB in terms of relational performance. But - your application logic will determine which is faster, and it's easy to get it wrong with either side.

One thing to note here is that if you are using a GraphQL API, or any type of dynamic / expressive querying language, it's significantly harder to make a single query that returns all populated "relations". This is because it's up to the shape of the query to determine what to "populate". This is known as the N+1 problem. You first need to load the main document, and only then can you determine what to load after. If this applies to you, then most relational DB performance gains are significantly less-applicable, which is demonstrated in our performance test linked above.

Handling complex data structures

One absolutely massive win that NoSQL (MongoDB in specific) has over SQL is that you can store complex data directly in a single document. It can be stored in the same shape as you want it to be in when you read it back from the database. Some relational databases like Postgres have pretty great JSON support via JSON columns, but if you over-use JSON columns in a relational database, you're not taking advantage of the strengths of relational DBs in the first place and might as well go with NoSQL.

So what's it look like to store a complex data structure in a typical relational DB, without losing the benefits of the paradigm in the first place? For a simple example, take Payload's Group field:

1
{
2
"id": "some-id-here",
3
"myGroup": {
4
"mySubField": "hello",
5
"anotherSubField": "goodbye"
6
}
7
}

The above field can be stored directly in NoSQL as-is, which is fantastic. But what about relational databases? Do we transform the nested architecture to a flattened column structure, using a double underscore to separate field "paths"?

idmyGroup__mySubFieldmyGroup__anotherSubField
some-id-herehellogoodbye

This could functionally perform the same, but we'd need to "transform" the data on its way out from the database somehow back to its intended shape. Doable, but not nearly as elegant.

Alternatively, we could "join in" another table that includes columns that represent properties within the group. That'll become a bit less performant, because we're now doing a join that we didn't need to do in NoSQL land.

Finally, do we just say "screw it" and embed the group and all of its properties, from that point on, into a single JSON column?

Field-based localization

One specific feature that we support, which is simple in MongoDB, but more difficult in relational DBs is field-based localization. You might have 30 locales, and any given text field needs to be translated into each of the 30 locales.

1
{
2
"id": "some-id-here",
3
"myLocalizedField": {
4
"en": "hello",
5
"es": "hola",
6
"de": "hallo"
7
}
8
}

Again, in MongoDB, this is a cake-walk. Storing the locales all on a parent document is great, because they are fundamentally a part of that parent document.

But in a typical relational DB we've gotta get crafty again. You might have 30 locales or more. Does that mean 30 columns for every single text field, all in one table?

idmyLocalizedField__enmyLocalizedField__esmyLocalizedField__de
some-id-herehelloholahallo

If you have too many columns in most relational DBs, you'll start to see performance implications. So this direction is out right away. Alternatively, you could create a separate table that stores all of your locale data, and then write a join to merge in all locales into the parent document, and then build a "transformer" plugin to then recreate the localized structure that Payload APIs expect but as you can imagine, this is a lot more complicated and having NoSQL flexibility really starts to shine here.

Another way that we could solve the locale dilemma is to simply jump to using JSON columns for any localized field. We will probably do this for databases that have strong JSON column support like Postgres, but this will be more difficult for databases with lesser JSON column support like SQLite. In either case, we're losing a lot and departing from what makes relational DBs strong in the first place.

Array-based structures

In NoSQL, storing arrays is dead simple. Just save it on the doc and move on.

1
{
2
"id": "some-id-here",
3
"myArray": [
4
{
5
"subField": "hello"
6
},
7
{
8
"subField": "goodbye"
9
}
10
]
11
}

The well-accepted and correct way to handle array-based data in relational DBs is to simply create a new table for each array-based field (like Payload's Arrays and Blocks fields). Then you'd just join in each row of array data when you retrieve the record back. It's more complicated, sure, and we're losing some performance here because we have to make joins to retrieve the data and update many rows when a user changes the order. Even though joins are ubiquitous in relational land, there's still more work involved vs. simply retrieving a single document from the database. But it's logical and at least the answer is clearer than the last few examples.

Takeaways

Ultimately I think having a healthy dose of skepticism when reading anything that Tech Twitter or Reddit tells you will do you a great deal of good as an engineer. The truth is that there are certain cases where relational DBs will perform better for you, and NoSQL will be a better fit elsewhere. But it's up to your own unique requirements.

Overall, I think it's safe to say that a relational approach will win for your project if you:

  • Know the shape of your data well in advance
  • Are confident that your schema won't change often or are comfortable dealing with lots of DDL migrations
  • Have a relatively "flat" and not complex schema
  • Require heavily relational structures

On the flip-side, NoSQL will win out if you:

  • Frequently manage complex, nested data structures like Groups, Arrays, Blocks, localized field data, and similar
  • Use relationships solely for simple structures like Categories, Tags, Authors, etc.
  • Want to avoid dealing with DDL stuff entirely and make your life easier
  • Are okay with manually enforcing data integrity and handling edge cases yourself, or are using a framework that handles that stuff for you on top of NoSQL

Ah, life. Everything's a compromise. As much as I'd like to go hard and declare a victor, I totally understand why Payload relational DB support has been asked for, although I personally think MongoDB is still a better fit for most of our users. In the end, you'll be able to pick whatever's right for you.

Planning session recording

We recorded the whole Discord planning session that took place last week. To hear more about how we're approaching future database support in Payload, including ORM evaluation and more, give it a listen.

Where can you help?

First up, we want to hear from you! If you haven't yet, make your voice and vote known on our Roadmap GitHub Discussion.

From there, once we have our future database adapter pattern established, anyone can build an adapter. We'd love to see some community-generated database and ORM support and that will only speed up / pressure test what we're building.

We're really appreciative of the engagement we saw on that call and are super pumped about doing more of those in the future. The next one will likely be regarding replacing Webpack, but we'll see. Follow us on Twitter to keep in the loop or join our Discord to keep an eye on the conversation there.