The Evolution of Database Schema Management
In today's world, where software is developed and deployed at a rapid pace, selecting the right tools for your team
is crucial. The right tools can help you move faster, be more productive, and reduce the risk of errors and bugs.
Most backend applications are backed by a database, and maintaining the database schema is a critical part of the
development process.
Our industry has seen many trends in the way database schemas evolve over the years. In the early days, DBAs would
manually run SQL DDL commands directly against the database through a terminal. Then came GUI-based, point-and-click
database management tools that included schema editing capabilities.
In the early 2000s, we saw the rise of "database migration tools" that allowed developers to write versioned
database upgrade scripts (usually in SQL), called "migrations", that would be executed by the tool itself.
This trend intensified with the rise of DevOps and CI/CD, as migrations became the de facto standard for
automating database schema management.
The Rise of Database Schema-as-Code
In recent years, we have witnessed a seismic shift in the way cloud infrastructure is managed. The rise of
tools such as Terraform and Kubernetes has led to the concept of "Infrastructure-as-Code". As the complexity of
infrastructure grew, it became clear that the only way to manage it effectively was to use a declarative approach
and treat it "as code".
Infrastructure-as-code, in this context, means that the desired state of the infrastructure is defined in a formal language ("as code")
and stored in a version control system (such as Git). The infrastructure is then managed by a tool that compares
the desired state to the actual state and makes the necessary changes to bring the actual state to the desired state.
This approach has many advantages over the traditional imperative approach of manually running commands against the
infrastructure. It allows for better collaboration between team members, is more robust and less susceptible to errors,
is self-documenting, and provides better visibility, governance, and control over the state of the infrastructure.
Following this trend, some projects such as Atlas and Skeema have started
to apply the same principles to database schema management. With the imperative approach, developers
do the planning, and tools are only used to execute the plan and keep track of what has been done. These projects
take a simpler route. Their approach, increasingly known as "Database Schema-as-Code", manages the database
schema declaratively: developers provide the desired state of the database and the tool automatically
plans the required migrations.
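To make the idea of automatic planning concrete, here is a minimal sketch in Python. It is not how Atlas or Skeema are implemented. It only illustrates the principle, assuming a table can be described as a mapping from column names to column definitions. The function name `plan_migration` and the data layout are hypothetical.

```python
# Toy illustration only: real tools inspect the live database and handle
# types, indexes, constraints, renames, and much more.

def plan_migration(table, current, desired):
    """Return SQL statements that turn the `current` columns into `desired`.

    Both arguments map column names to column definitions,
    e.g. {"id": "int NOT NULL"}. This layout is an assumption of the sketch.
    """
    statements = []
    # Columns that are new or changed in the desired state.
    for name, definition in desired.items():
        if name not in current:
            statements.append(f"ALTER TABLE `{table}` ADD COLUMN `{name}` {definition};")
        elif current[name] != definition:
            statements.append(f"ALTER TABLE `{table}` MODIFY COLUMN `{name}` {definition};")
    # Columns that exist in the database but were removed from the desired state.
    for name in current:
        if name not in desired:
            statements.append(f"ALTER TABLE `{table}` DROP COLUMN `{name}`;")
    return statements


current = {"id": "int NOT NULL", "name": "varchar(100) NULL"}
desired = {**current, "email": "varchar(255) NULL"}

for statement in plan_migration("users", current, desired):
    print(statement)
# prints: ALTER TABLE `users` ADD COLUMN `email` varchar(255) NULL;
```

The developer only edits `desired`. The statements needed to get there are derived from the difference between the two states and are not written by hand.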
A practical example
Let's use Atlas to show a quick example of how database schema-as-code tools work.
Atlas allows users to represent their database schema
in plain SQL or in HCL. The schema is saved in a schema file, which holds the desired state of our
database.
We will start with a simple schema that represents a users
table, in which each user has an ID and a name:
schema.hcl
table "users" {
  schema = schema.example
  column "id" {
    null = false
    type = int
  }
  column "name" {
    null = true
    type = varchar(100)
  }
  primary_key {
    columns = [column.id]
  }
}
Now, let's apply this schema to an empty MySQL database:
atlas schema apply \
  -u "mysql://root:pass@localhost:3306/example" \
  --to file://schema.hcl \
  --dev-url "docker://mysql/8/example"
Alternatively, the same schema can be written in plain SQL:
schema.sql
CREATE TABLE `users` (
  `id` int NOT NULL,
  `name` varchar(100) NULL,
  PRIMARY KEY (`id`)
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
To apply the SQL version of the schema to an empty MySQL database instead, pass schema.sql to --to:
atlas schema apply \
  -u "mysql://root:pass@localhost:3306/example" \
  --to file://schema.sql \
  --dev-url "docker://mysql/8/example"
Atlas will compare the desired state from our schema file to the database URL provided,
and plan the migration:
-- Planned Changes:
-- Create "users" table
CREATE TABLE `users` (
  `id` int NOT NULL,
  `name` varchar(100) NULL,
  PRIMARY KEY (`id`)
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
? Are you sure?:
▸ Apply
Lint and edit
Abort
After applying this to our database, we will run the command once more and see that Atlas recognizes there is no drift
between the two states:
Schema is synced, no changes to be made
Let's make a simple change to our schema by adding an email
column:
schema.hcl
table "users" {
  schema = schema.example
  column "id" {
    null = false
    type = int
  }
  column "name" {
    null = true
    type = varchar(100)
  }
  column "email" {
    null = true
    type = varchar(255)
  }
  primary_key {
    columns = [column.id]
  }
}
schema.sql
CREATE TABLE `users` (
  `id` int NOT NULL,
  `name` varchar(100) NULL,
  `email` varchar(255) NULL,
  PRIMARY KEY (`id`)
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
Now let's run the schema apply
command again, and view the migration plan that Atlas will create for us:
-- Planned Changes:
-- Modify "users" table
ALTER TABLE `users` ADD COLUMN `email` varchar(255) NULL;
? Are you sure?:
▸ Apply
Lint and edit
Abort
To conclude, in this example, we got a glimpse of how a tool like Atlas works with a database schema using the
declarative approach.
If you are using an older, more traditional database migration tool, or not using one at all, you might be wondering
whether you should switch to a Database Schema-as-Code tool. Here are five signs that you might need
to consider it:
1. Your team is making changes to the database schema directly in production
The worst kind of tool for managing database schemas is no tool at all. Running migrations manually against production
databases should be avoided at all costs. First, humans are notoriously prone to errors, and the risk of making a mistake
that will cause downtime or data loss is very high. Second, any changes made directly in production are not tracked
anywhere, so it is very hard to know what has been done and by whom. Third, it means that your deployments cannot be
fully automated and you are missing out on the benefits of CI/CD.
Using a Database Schema-as-Code tool as the exclusive way of making changes to the database schema is a great way to
avoid these problems. It allows you to automate your deployments, track all changes to the database schema in source
control, and collaborate more effectively with your team.
2. You don't have a single source of truth for your database schema
Having a single source of truth for how your application is deployed is a key principle of DevOps. It allows you to
have a clear picture of the state of your application at any given time, and it allows you to automate your deployments
and avoid human errors.
By requiring all migrations to be checked in to source control, traditional migration tools can go a long way to
ensure that the database schema is in a consistent state, but they suffer from two issues in this regard:
- They only keep track of the changes to the database ("migrations"). To figure out the actual schema of the database
  at any given version, you need to apply all the migrations up to that version to an empty database and inspect
  the result with an additional tool. Database Schema-as-Code tools, on the other hand, naturally document the
  schema of the database in any given version.
- They have no way to ensure that the actual database's schema is consistent with the migrations. If someone makes
  a change directly in the database, there is no way to know about it. When you apply your schema to a target database
  using a Database Schema-as-Code tool, it will compare the desired state to the actual state and make the necessary
  changes to bring the actual state to the desired state.
This ensures that over time your actual databases will not drift from the desired state.
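Both issues can be illustrated with a small Python sketch. It is a toy model and not the behavior of any specific tool: a schema is a dictionary of column definitions, each migration is a function, and the names `MIGRATIONS`, `schema_at`, and `drift` are hypothetical.

```python
# Toy model: a schema is a dict of column name -> definition for one table,
# and each migration is a function that returns an updated copy of it.

def add_column(name, definition):
    def migrate(schema):
        return {**schema, name: definition}
    return migrate


# Hypothetical versioned migrations, in the order they were written.
MIGRATIONS = [
    add_column("id", "int NOT NULL"),
    add_column("name", "varchar(100) NULL"),
    add_column("email", "varchar(255) NULL"),
]


def schema_at(version):
    """Replay the first `version` migrations against an empty schema."""
    schema = {}
    for migrate in MIGRATIONS[:version]:
        schema = migrate(schema)
    return schema


def drift(desired, actual):
    """Return the column names whose definitions differ between the two states."""
    return sorted(
        name
        for name in desired.keys() | actual.keys()
        if desired.get(name) != actual.get(name)
    )


# With migrations alone, the schema at version 2 is only known after replaying.
print(schema_at(2))

# A manual change made directly in the database is invisible to the migration
# history, but shows up once desired and actual state are compared.
actual = {**schema_at(3), "nickname": "varchar(50) NULL"}
print(drift(schema_at(3), actual))  # prints: ['nickname']
```

With a desired-state file, the equivalent of `schema_at` is simply reading the file at that commit, and the drift check is part of every apply.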
3. You don't have any database experts on your team
In the past, managing database schemas was the responsibility of DBAs. They were the ones who made sure
that the database schema stayed consistent and that all changes were made safely.
In recent years, as cloud-native, microservices-based architectures have become more popular, the number of databases
that each team manages has grown significantly. As a result, many teams don't have a DBA,
and the responsibility for managing the database schema falls on the shoulders of the developers.
If you maintain a small schema with a few tables, you might not feel the need for a tool to help you manage it.
But as schemas grow in size and complexity, to contain thousands of database objects (tables, views, functions,
stored procedures, etc.), it becomes very hard to manage them manually. A Database Schema-as-Code tool can help
you manage your schema more effectively, even if you don't have any DBAs on your team.
4. You manually verify the safety of your database migrations
If your application is small and the stakes for making a mistake are low, you might be able to get
away with messing up a schema change now and then. But if your workload is mission-critical, if
application downtime or data loss can have a significant impact on your business, you need to make sure
that your migrations are safe.
Many teams have been through the unpleasant experience of a failed migration that caused a significant
outage. This is usually followed by a change in review processes where migrations are reviewed by
multiple team members, and a manual verification process is added to make sure that the migration
is safe to run.
This process is time-consuming and error-prone. Even worse, it turns team members with more database
experience into bottlenecks, slowing down the development process and team velocity.
Database Schema-as-Code tools can help you automate this process and make it more robust. Atlas,
for example, comes with built-in support for Migration Linting, Diff Policies,
and other safety mechanisms that help you catch errors and issues before they lead to a full-blown production outage.
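To give a feel for what automating such a check means, here is a deliberately naive Python sketch of a migration linter that flags destructive statements. It is not Atlas's linting engine. The rule list, the `lint` function, and the regex-based matching are all assumptions made for illustration.

```python
import re

# Hypothetical rules: statement patterns that can destroy data.
DESTRUCTIVE_PATTERNS = [
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "drops a table"),
    (re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE), "drops a column"),
    (re.compile(r"\bTRUNCATE\b", re.IGNORECASE), "truncates a table"),
]


def lint(migration_sql):
    """Return a warning for every statement that matches a destructive pattern."""
    warnings = []
    # A naive split on ';' is enough for this sketch, but it would break on
    # semicolons inside string literals or stored procedure bodies.
    for statement in filter(None, (s.strip() for s in migration_sql.split(";"))):
        for pattern, reason in DESTRUCTIVE_PATTERNS:
            if pattern.search(statement):
                warnings.append(f"{reason}: {statement}")
    return warnings


migration = """
ALTER TABLE `users` ADD COLUMN `email` varchar(255) NULL;
ALTER TABLE `users` DROP COLUMN `name`;
"""

for warning in lint(migration):
    print(warning)
# prints: drops a column: ALTER TABLE `users` DROP COLUMN `name`
```

A check like this can run in CI on every pull request, so the reviewer with the most database experience no longer has to read every migration by hand.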
5. Your app and infra delivery are way better than your database deployments
When interviewing platform engineers about how their deployment infrastructure works, we keep hearing the same story:
"Our deployments are 100% pure Infrastructure-as-Code GitOps goodness, we can do them 50 times a day!" And when
we ask about databases… "Oh yeah, there’s that part... It’s not as great."
A lot has changed in the way applications are deployed since the early days of database migrations. The versioned migrations
paradigm was a great improvement over the previous manual approach, but it is still very much an imperative approach.
Modern application delivery principles such as GitOps and Infrastructure-as-Code
work in a declarative way, and so they need a database schema management tool that works in a similar way.
If your application and infra delivery feel like a well-oiled machine, but your database deployments are still
semi-manual, or feel clunky and error-prone, you should consider switching to a Database Schema-as-Code tool that
lets you manage your database schema with a Kubernetes Operator
or a Terraform Provider.
Enter: Atlas
Atlas is a Database Schema-as-Code tool that allows you to manage your database schema
in a declarative way. With Atlas, developers provide the desired state of the database schema and Atlas
automatically plans the required migrations. To get started with Atlas, you can follow our quickstart guide.
Wrapping up
In this article, we have discussed the evolution of database schema management tools and the rise of
Database Schema-as-Code tools. We have also discussed five signs that you might need to consider switching
to a Database Schema-as-Code tool. I hope you found this article useful.
As always, we would love to hear your feedback and suggestions on the Atlas Discord server.