If you've been following Payload over the past few months, you'll know that we're actively working toward adding additional database support. The team has been making excellent progress toward this goal, and most of the hard work is already complete.
We've recently streamlined the abstractions within our codebase to rely on an "adapter-style" database architecture, wherein additional database adapters can be built and supported over time both by Payload and our community.
The database adapter pattern itself is now done and functional using our existing MongoDB implementation.
In addition, we've completed work on establishing a fully-featured migrations workflow, and have also implemented database transactions natively within Payload itself.
Even if you're on MongoDB and intend to stay there, you're about to get a lot of new database goodies. MongoDB is certainly not going anywhere. Our support there is only getting better as a result of the work we're putting in here.
Now that the framework is laid and our intended abstractions are identified, we're working on actually mapping Payload fields and functionality over to a relational database model.
Over the past few months, we've looked at a ton of different ORMs to use as we build out our first relational database adapter. But we've been most impressed with Drizzle, and today we're sharing a proof-of-concept that showcases how we plan to map our fields to tables within a given relational database. We've written out a "reference" Payload config, and then translated the Payload config itself over to its Drizzle schema equivalent.
We've started with SQLite so as to plan for the "lowest common denominator" and to get up and running quickly, as well as gather feedback as succinctly as possible. Once we have the low-level pieces in place, we plan to move to Postgres.
Here's a quick list of the concerns we tried to keep top-of-mind as we identified which ORM we want to start with.
While working through these database initiatives, we know that we want to remain as performant as possible. Right now we're significantly faster than other headless CMS and we want it to stay that way. There are lots of factors involved in retaining Payload's speed in the relational world, and so far what we've seen from Drizzle here has made us feel confident. We love Drizzle's ability to make a single query to the database and retrieve all of the required tables that we need in one shot. And we think it's going to work very well with Payload. Due to the highly dynamic nature of the data Payload stores, performance is far and away the most important factor for us as we choose an ORM.
We want the Payload config to remain as the single source-of-truth for your schema design. We don't want you to have to maintain a Payload config, and then also deal with an auto-generated Prisma schema file or similar. Other headless CMS require you to design a schema within the CMS itself, but they turn around and dynamically generate a schema configuration for their used ORM which you also have to store in your repo, but NEVER touch. This is duplicative, messy, and error-prone.
We think that how we've established our approach with Mongoose so far has worked well here. We dynamically map an incoming Payload config to Mongoose models, and then leverage Mongoose models within our code. There is no reliance on the file system. Drizzle supports this quite well.
Our longer-term goal is to be able to run fully on the edge, and traditional database connections are less than ideal for that goal. If we can build an ORM that supports this goal, I'm happy. And I think you will be too.
We don't want you to have to learn the underlying ORM as well as learn Payload. We now give you a first-party migration architecture, and the migration files themselves are all in TypeScript. We want to make sure that whatever ORM we choose works with Payload logic, not Payload working with ORM logic. Drizzle appears to shine here as well, and the team is actually working on exporting a few functions for us to use which will make our integration seamless. Note - the PoC does not showcase Payload migrations (it only leverages straight-up Drizzle migrations). Payload + Drizzle migrations working in tandem will come soon.
Moreso than the identification of which ORM we plan to adopt first, the biggest goal of our proof-of-concept was to determine how to map our dynamic and nested data structures to a relational database structure.
To accomplish this goal, our PoC is barebones and is a completely separate repo from Payload itself. It simply showcases the table structure that we're working toward. We have attempted to identify how to map fields, and then how to query the data efficiently.
Right now, in MongoDB, group / named tab fields are super trivial to store:
We don't want to jump straight to a JSON column if we can avoid it. Not only that, but we'd also like to support SQLite, which is lacking in this area. So what we plan to do instead is just create "path-like" column names out of nested fields, using an underscore to join field names. From there, we will "transform" the data on its way out of the database back into its intended shape. The above data structure will end up looking like this:
my_group_my_nested_field | my_group_another |
---|---|
hello | 123 |
Arrays are fairly straightforward to store in a relational DB world. They're just a separate table which will maintain its own set of columns and locales.
In the PoC, we have demonstrated a posts
collection, which has a myArray
field. Payload will store that field's array data into a separate table called posts_my_array
. The table name is created from the collection slug
and the field path itself, separated by underscores. Each unique array field on a given collection will get its own table, based on the name / path of the array field.
Blocks are a bit more complicated than arrays, though, because every block has different fields (columns). So what we've done here is created a table for each distinct block
on a given collection type.
Take for example the following Payload collection config:
Two tables will be scaffolded for the above Payload config - posts_block1
and posts_block2
.
Notice how we are creating table names based on the block slug
- NOT the field name, like in the case of arrays. No matter how many times these unique blocks are used across a given collection config, the block data will go into the main block table.
We are currently thinking to key the block tables based on block slug, instead of field path as seen in the array field above because this will dramatically cut down on the amount of tables that you'd need, as well as the joins that need to be run.
This has some important ramifications, though. Whereas in MongoDB, you could have many blocks, with different schemas, with the same slug
, in the relational world, you'll need to make sure your blocks have different slugs if they have different field schemas. This will probably affect 0.5% of developers building with Payload, though, and we have decided that the table simplicity outweighs the value of having different block schemas with the same block slug.
Here's what the posts_block1
table will look like:
id | non_localized_text | _path | _locale | _order | _parent_id |
---|---|---|---|---|---|
1 | hello | myBlocks | en | 1 | 1 |
Notice a few extra columns? We will use the _path
column to store the Payload field path that the blocks correspond to, and the _order
column to store the order of the blocks so that we can maintain the integrity of the order that the blocks are returned in. Finally, the _locale
field will store the locale of the row in cases where the myBlocks
field has localized: true
.
Being that Payload stores locales on a field-level, we need a way to store many different locales for a single field. We've opted for a structure where every table of a given collection (the main table, each array table, each block table) will have a corresponding _locales
table to store their localized field data.
Taking the most simple example, the posts
collection will have a posts_locales
table that will be joined in and merged.
So let's take the following Payload config:
But in relational DB land this is a bit more complex. This config would produce the following tables:
posts
- this is the main table itself, with columns being id
, created_at
, updated_at
, and non_localized_field
posts_locales
- this table would contain id
, _locale
, localized_field
, and _parent
posts_my_array
- this is to store the myArray
rows themselves, with columns id
, _order
, and _parent_id
posts_my_array_locales
- finally, we store the locales for the array rows themselves as columns id
, _locale
, _parent_id
, and finally the sub_localized_field
itselfThe nice part here is that there is parity between tables. If localization is enabled, every collection will have a corresponding locales table, and then every array / block table will have its own corresponding locales table as well.
We could opt for JSON columns here when we get to Postgres, but I think as of now we are in the camp of "model the data right". This is especially important when you get to more complex field types that fall outside of basic text / number fields - like localized arrays, localized relations, etc. You can nest relationships at any level within any number of array / blocks fields, and we should be using foreign keys / etc. even at those deep levels.
Also, it's great to have Drizzle literally run one query, and merge all of this in for us seamlessly.
Phew, we're almost there. Saved the best for last. And....the most complex.
Payload supports four different relationship "types":
relationTo: 'pages'
, hasMany: false
)relationTo: ['pages', 'posts']
, hasMany: false
)relationTo: 'posts'
, hasMany: true
)relationTo: ['pages', 'posts']
, hasMany: true
)Performance is key here. Relationships are extremely central to everything that Payload does, and have the largest impact against API response time.
To add, supporting polymorphic relationships in specific is more involved than than simple "one to many" or "many to many" typical relationships. The typical way to do this would be a table set up for unions, using two columns - one, the related id
, and the other, a relation_to
column.
But unions are not widely supported in modern ORMs (even Prisma does not yet support them), and if we used unions, writing SQL manually / repurposing your own data would become more complex. We also wouldn't be able to make a single query to fetch all the data we need with either Drizzle or Prisma.
One of our goals with Payload in specific is to keep our mental model in general as concise as possible. Complexity is the enemy of good.
All of this is to say that unions might be difficult / less than ideal. And that leaves two ways to model all four types of Payload relationships:
*_locale
table pattern above, where there would be a relationship join table for every table we have (main collection table, each locale table, each array table, each block table).*_relationships
table for each main collection, and store all relationships centrally in one tableIn our PoC, we landed on option 2. Less joins, more mental clarity, easier bi-directional relationships, etc. Here's what a potential relationship join table, called posts_relationships
, would look like for a posts
collection:
id | parent | path | order | posts_id | pages_id |
---|---|---|---|---|---|
1 | 2 | someField.nestedPath | 1 | 5 | NULL |
The above table has a foreign key for the parent
row as well as nullable columns (w/ foreign keys) for every other collection type that it could possibly have a relation to. This example shows a posts_id
and pages_id
column but there would be an id
column for every necessary collection in the Payload config.
It also has an order
column, and a path
column, which will be used similarly to the way they are in the blocks table discussion above. This table would be the source of truth for all posts
relationships. No relationship data would be stored on the posts
collection itself, for clarity.
One nice part of this is that, thanks to Drizzle, when we go to query posts, we can join in all relationships in a single query, succinctly and efficiently. And then our database adapter will simply "transform" the data into the shape Payload's APIs (and you) expect.
There are many alternative ways to approach modeling our relationship data. Here are a few examples.
posts_pages
, posts_media
, etc. - but then we'd need the inverse of those tables as well, so pages_posts
, media_posts
, etc. This would require many more tables, joins, and a bit more complexity, but no nullable columns in the posts_relationships
table.We're making a lot of progress, and we're pumped about it. We really admire our community and we've been consistently blown away by how smart you all are. We'd really appreciate your thoughts / feedback here, and together, we can knock this out of the park.
Go pull down the PoC and take a look:
https://github.com/payloadcms/drizzle-test/tree/master/sqlite
We'd love to know what you think! Join the GitHub discussion here: