Testing Data Migrations
Most commonly, migrations deal with schema changes, such as adding or removing columns, creating tables, or altering constraints. However, as your application evolves, you may need to add or refactor data within the database, which is where data migrations come in. For instance, you may need to seed data in a table, backfill data for existing records in new columns, or somehow transform existing data to accommodate changes in your application.
Data migrations can be especially tricky to get right, and mistakes can be problematic and irreversible. For this reason, testing data migrations is crucial. Testing a data migration typically involves the following steps:
- Setting up an empty database.
- Applying migrations up to the one before the test.
- Seeding test data.
- Running the migration under test.
- Making assertions to verify the results.
This process can be cumbersome to set up and error-prone, as it often involves writing an ad-hoc program to automate the steps above or testing the migration by hand.
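For illustration, such an ad-hoc check might look roughly like the shell script below. This is only a sketch of the manual approach; the database name, migration file names, and seed data are hypothetical placeholders.

#!/bin/bash
# Rough sketch of a manual migration test. The database name, file names,
# and seed rows below are hypothetical placeholders.
set -euo pipefail

createdb migration_test

# 1. Apply migrations up to the one before the migration under test.
psql migration_test -f migrations/20240101000000_init.sql

# 2. Seed test data.
psql migration_test -c "INSERT INTO users (id, email) VALUES (1, 'user1@example.com');"

# 3. Run the migration under test.
psql migration_test -f migrations/20240102000000_backfill.sql

# 4. Inspect the results by hand.
psql migration_test -c "SELECT * FROM users;"

dropdb migration_test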
Atlas's migrate test command simplifies this by allowing you to define test cases in a concise syntax, and it acts as a harness to run these tests during local development and in CI.
In this guide we will learn how to use the migrate test command to test migration files.
Testing is currently available only to Atlas Pro users. To use this feature, run:
atlas login
Project Setup
Before we begin writing tests, we will start by setting up a project with a config file, schema file, and a migration directory containing a few migration files.
Config File
Create a config file named atlas.hcl. In this file we will define an environment, specify the source of our schema, and a URL for our dev database.
We will also create a file named migrate.test.hcl to write our tests, and add it to the atlas.hcl file in the test block.
env "dev" {
src = "file://schema.hcl"
dev = "docker://postgres/15/dev"
# Test configuration for local development.
test {
migrate {
src = ["migrate.test.hcl"]
}
}
}
Schema File
Next, we will create a schema containing two tables:
- users: stores user-specific data, such as id and email.
- posts: holds the post content, when it was created, and links each post to its author with a user_id.
schema.hcl:
schema "public" {}
table "users" {
schema = schema.public
column "id" {
type = int
null = false
}
column "email" {
type = varchar(255)
null = false
}
primary_key {
columns = [column.id]
}
}
table "posts" {
schema = schema.public
column "id" {
type = int
null = false
}
column "title" {
type = varchar(100)
null = false
}
column "created_at" {
null = false
type = timestamp
}
column "user_id" {
type = int
null = false
}
primary_key {
columns = [column.id]
}
foreign_key "authors_fk" {
columns = [column.user_id]
ref_columns = [table.users.column.id]
}
}
schema.sql:
-- Add new schema named "public"
CREATE SCHEMA IF NOT EXISTS "public";
-- Create "users" table
CREATE TABLE "public"."users" ("id" integer NOT NULL, "email" character varying(255) NOT NULL, PRIMARY KEY ("id"));
-- Create "posts" table
CREATE TABLE "public"."posts" ("id" integer NOT NULL, "title" character varying(100) NOT NULL, "created_at" TIMESTAMP NOT NULL, "user_id" integer NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "authors_fk" FOREIGN KEY ("user_id") REFERENCES "public"."users" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION);
Generating a Migration
To generate our initial migration, we will run the following command:
atlas migrate diff --env dev
We should see a new migrations directory created with the following files:
20240807192632.sql:
-- Create "users" table
CREATE TABLE "users" ("id" integer NOT NULL, "email" character varying(255) NOT NULL, PRIMARY KEY ("id"));
-- Create "posts" table
CREATE TABLE "posts" ("id" integer NOT NULL, "title" character varying(100) NOT NULL, "created_at" timestamp NOT NULL, "user_id" integer NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "authors_fk" FOREIGN KEY ("user_id") REFERENCES "users" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION);
atlas.sum:
h1:fzAvfmbDGG5M4KAjGXqAylAE3PawD5+6GqkVqvizMys=
20240807192632.sql h1:oOoSEXM3VyBkPGeWdyoYfAVF1dFn79c3yGhsy3Ys8PI=
Expanding Business Logic
Great! Now that we have a basic schema and migration directory, let's add some business logic before we begin testing.
We will add a column latest_post_ts to the users table, which will hold the timestamp of the user's most recent post. To automatically populate this column, we will create a trigger named update_latest_post_trigger.
schema.hcl:
schema "public" {}
table "users" {
schema = schema.public
column "id" {
null = false
type = integer
}
column "email" {
null = false
type = character_varying(255)
}
column "latest_post_ts" {
null = true
type = timestamp
}
primary_key {
columns = [column.id]
}
}
table "posts" {
schema = schema.public
column "id" {
null = false
type = integer
}
column "title" {
null = false
type = character_varying(100)
}
column "created_at" {
null = false
type = timestamp
}
column "user_id" {
null = false
type = integer
}
primary_key {
columns = [column.id]
}
foreign_key "authors_fk" {
columns = [column.user_id]
ref_columns = [table.users.column.id]
}
}
function "update_latest_post" {
schema = schema.public
lang = PLpgSQL
return = trigger
as = <<-SQL
BEGIN
UPDATE "public"."users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "public"."posts"
WHERE "user_id" = NEW."user_id"
)
WHERE "id" = NEW."user_id";
RETURN NEW;
END;
SQL
}
trigger "update_latest_post_trigger" {
on = table.posts
after {
insert = true
}
for = ROW
execute {
function = function.update_latest_post
}
}
schema.sql:
-- Add new schema named "public"
CREATE SCHEMA IF NOT EXISTS "public";
-- Create "users" table
CREATE TABLE "public"."users" ("id" integer NOT NULL, "email" character varying(255) NOT NULL, "latest_post_ts" TIMESTAMP, PRIMARY KEY ("id"));
-- Create "posts" table
CREATE TABLE "public"."posts" ("id" integer NOT NULL, "title" character varying(100) NOT NULL, "created_at" TIMESTAMP NOT NULL, "user_id" integer NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "authors_fk" FOREIGN KEY ("user_id") REFERENCES "public"."users" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION);
-- Create the trigger function
CREATE OR REPLACE FUNCTION update_latest_post()
RETURNS TRIGGER AS $$
BEGIN
UPDATE "public"."users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "public"."posts"
WHERE "user_id" = NEW."user_id"
)
WHERE "id" = NEW."user_id";
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Create the trigger
CREATE TRIGGER update_latest_post_trigger
AFTER INSERT ON "public"."posts"
FOR EACH ROW
EXECUTE FUNCTION update_latest_post();
Run the migrate diff command once more to generate another migration:
atlas migrate diff --env dev
The following migration should be created:
-- Create "update_latest_post" function
CREATE FUNCTION "update_latest_post" () RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
UPDATE "public"."users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "public"."posts"
WHERE "user_id" = NEW."user_id"
)
WHERE "id" = NEW."user_id";
RETURN NEW;
END;
$$;
-- Create trigger "update_latest_post_trigger"
CREATE TRIGGER "update_latest_post_trigger" AFTER INSERT ON "posts" FOR EACH ROW EXECUTE FUNCTION "update_latest_post"();
-- Modify "users" table
ALTER TABLE "users" ADD COLUMN "latest_post_ts" timestamp NULL;
Back-filling Data
The trigger we created will automatically update the latest_post_ts column in the users table whenever a new post is added. However, posts that already exist in the posts table were inserted before the trigger existed, so we need to back-fill the latest_post_ts column for existing users to ensure it is accurate.
To do this, let's edit the newly generated 20240807192934.sql migration file to include a query that will update the latest_post_ts column for each user. Add the following SQL statement to the migration file:
-- .. Redacted for brevity
-- Back-fill the latest post timestamp for each user
UPDATE "users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "posts"
WHERE "user_id" = "users"."id"
);
After changing the contents of a migration file, we need to recalculate the atlas.sum file to ensure directory integrity. Run the following command:
atlas migrate hash --env dev
Testing Migrations
Rewriting data on a production database can be risky (and scary) if not done correctly. Let's now write a test to ensure that our migration will run smoothly and leave our application in a consistent state.
To do so, our test will have the following logic:
- Migrate to the first version, before creating the trigger, modifying the users table, and back-filling the data.
- Seed the users and posts tables with data.
- Run our new migration file, which will create the trigger and back-fill the latest_post_ts column.
- Verify that the latest_post_ts column is correctly populated.
test "migrate" "check_latest_post" {
migrate {
to = "20240807192632"
}
exec {
sql = <<-SQL
INSERT INTO users (id, email) VALUES (1, 'user1@example.com'), (2, 'user2@example.com');
INSERT INTO posts (id, title, created_at, user_id) VALUES (1, 'My First Post', '2024-01-23 00:51:54', 1), (2, 'Another Interesting Post', '2024-02-24 02:14:09', 2);
SQL
}
migrate {
to = "20240807192934"
}
exec {
sql = "select * from users"
format = table
output = <<TAB
id | email | latest_post_ts
----+-------------------+---------------------
1 | user1@example.com | 2024-01-23 00:51:54
2 | user2@example.com | 2024-02-24 02:14:09
TAB
  }
  log {
    message = "Data migrated successfully"
  }
}
Let's break down the test:
- We define a migration test named check_latest_post.
- We migrate to the first version of the schema, which does not include the trigger and the latest_post_ts column.
- We insert data into the users and posts tables. At this point, the latest_post_ts column does not exist.
- We migrate to the second version of the schema, which includes the trigger and the latest_post_ts column, as well as our back-fill.
- We assert that the latest_post_ts column is correctly populated with the latest post timestamp for each user.
To run this test, we will run the migrate test command:
atlas migrate test --env dev
The output should be similar to:
-- PASS: check_latest_post (50ms)
migrate.test.hcl:24: Data migrated successfully
PASS
Great! Our test passed, and we can now confidently deploy our migration to production.
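If we wanted even more coverage, the same test syntax could also check that the trigger keeps latest_post_ts current for posts inserted after the migration. Below is a hypothetical sketch of extra exec blocks appended inside the test above, reusing only constructs already shown; the expected output is illustrative and must match the printed table exactly, including whitespace.

  # Hypothetical extra checks, appended to the "check_latest_post" test body.
  exec {
    sql = "INSERT INTO posts (id, title, created_at, user_id) VALUES (3, 'A Newer Post', '2024-03-01 10:00:00', 1);"
  }
  exec {
    sql    = "SELECT latest_post_ts FROM users WHERE id = 1"
    format = table
    # Illustrative expected output; the exact spacing must match what the
    # database prints in table format.
    output = <<TAB
   latest_post_ts
---------------------
 2024-03-01 10:00:00
TAB
  }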
Integrating with CI
Writing and running tests locally is a great start, but it's equally important to run these tests in a CI/CD pipeline.
Atlas provides out-of-the-box integrations for running migrate test in popular CI/CD platforms such as GitHub Actions and CircleCI. However, you can easily integrate migration testing into any CI/CD platform by running the atlas migrate test command.
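As a rough illustration (and not the official integration), a GitHub Actions workflow for this project might look like the sketch below. The action versions and the secret name are assumptions; the ubuntu-latest runner provides the Docker engine needed for the docker:// dev database.

# .github/workflows/migrate-test.yaml -- hypothetical sketch only.
name: Test Migrations
on: pull_request
jobs:
  migrate-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install the Atlas CLI and log in with a cloud token (assumed secret name).
      - uses: ariga/setup-atlas@v0
        with:
          cloud-token: ${{ secrets.ATLAS_CLOUD_TOKEN }}
      # Run the migration tests defined in migrate.test.hcl.
      - name: Run migration tests
        run: atlas migrate test --env dev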
Wrapping Up
In this guide we learned how to use the atlas migrate test command to test our migration files before deployment.