Modern Database Desiderata

Aug. 20, 2022

I've been working much closer to database systems in the past year or so. This has led to learning a lot about the things modern applications desire from their database systems. This insight was much less obvious to me when my circle of concern was limited to the application rather than the platform.

I am still talking about relational databases here, as I think they remain the best choice for an overwhelming majority of use-cases, and these features have to augment, rather than supersede the ACID properties that these databases already provide.

Of course, none of these ideas are novel in any way, and there are databases that have more than one of these features already. However, I have yet to see them all fully integrated in a single system, and I think if a single database can provide all of these things built-in (rather than through, say, a postgres extension), it will be well placed to become the default choice for non-trivial applications, as it would allow engineering time to be spent elsewhere.

Okay, enough talk. Let's look at the features. In no particular order:

Online Schema Changes

Online schema changes are not news in the database world. Tools like pt-osc and gh-ost have existed for decades, and are battle-hardened. However, running them in production is still an extra step to take and another thing to maintain. InnoDB has had online DDL for a while, but it has some shortcomings that prevent its full adoption. This is not specific to MySQL, and I think more database systems should invest into supporting online schema changes as a first-class citizen.

Bitemporality

Martin Fowler has a more complete treatment of the topic, but in short, it is sometimes useful to track system-level history rather than just business-level history. For example, consider that a shop imported their order history onto a new system. The history has to preserve the original modification dates for the orders for compliance purposes, but various other systems in the new platform might care about rows that were changed during the time the orders were imported (rather than originally modified). This is where bitemporality can be a big boon.

I am convinced that any $billion company has to build systems to work around this limitation, especially where you need to sync data outside of your main database. Some databases already have this functionality (e.g. MariaDB and SQL Server).

Schema History

Similar to the system-versioning tables, versioning schemas also unlock use-cases where data needs to be synced to external systems, by letting them keep their schemas in sync too. This is almost necessary for real-time use-cases, and different products out there have to maintain their own implementation for this feature (e.g. Debezium).

Auto-updating Materialized Views

If the number of startups solving a problem in any indication, then this is probably the biggest missing piece of the database systems puzzle. While some databases support materialized views, others don't have any concept of it entirely.

Once you get to a certain size, you simply need the ability to have different views into your data without compromising on the freshness and correctness. Providing a way to build materialized views is the first step towards it. Keeping them up to date is the next. This is of course a hard problem to solve, but even a schedule-based mechanism is probably going to get users very far.


That is my list (as of now). Maybe I'll think up more things, but I doubt I will need to, given the pace of change in database technologies. So see you in, uh, 20 years?


Share this post on :
Posted under code db