Testing Data Migrations
Most commonly, migrations deal with schema changes, such as adding or removing columns, creating tables, or altering constraints. However, as your application evolves, you may need to add or refactor data within the database, which is where data migrations come in. For instance, you may need to seed data in a table, backfill data for existing records in new columns, or somehow transform existing data to accommodate changes in your application.
Data migrations can be especially tricky to get right, and mistakes can be problematic and irreversible. For this reason, testing data migrations is crucial. Testing a data migration typically involves the following steps:
- Setting up an empty database.
- Applying migrations up to the one before the test.
- Seeding test data.
- Running the migration under test.
- Making assertions to verify the results.
This process can be cumbersome to set up and error-prone, as it often involves writing an ad-hoc program to automate the steps above or testing the migration by hand.
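For illustration, such an ad-hoc check might look roughly like the shell script below. This is only a sketch of the manual approach; the database name, migration file names, and seed data are hypothetical placeholders.

#!/bin/bash
# Rough sketch of a manual migration test. The database name, file names,
# and seed rows below are hypothetical placeholders.
set -euo pipefail

createdb migration_test

# 1. Apply migrations up to the one before the migration under test.
psql migration_test -f migrations/20240101000000_init.sql

# 2. Seed test data.
psql migration_test -c "INSERT INTO users (id, email) VALUES (1, 'user1@example.com');"

# 3. Run the migration under test.
psql migration_test -f migrations/20240102000000_backfill.sql

# 4. Inspect the results by hand.
psql migration_test -c "SELECT * FROM users;"

dropdb migration_test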
Atlas's migrate test command simplifies this by allowing you to define test cases in a concise syntax, and it acts as a harness to run these tests during local development and in CI.
In this guide we will learn how to use the migrate test command to test migration files.
Testing is currently available only to Atlas Pro users. To use this feature, run:
atlas login
Project Setup
Before we begin writing tests, we will start by setting up a project with a config file, schema file, and a migration directory containing a few migration files.
Config File
Create a config file named atlas.hcl. In this file we will define an environment, specify the source of our schema, and a URL for our dev database.
We will also create a file named migrate.test.hcl to write our tests, and add it to the atlas.hcl file in the test block.
env "dev" {
src = "file://schema.hcl"
dev = "docker://postgres/15/dev"
# Test configuration for local development.
test {
migrate {
src = ["migrate.test.hcl"]
}
}
}
Schema File
Next, we will create a schema containing two tables:
- users: stores user-specific data, such as id and email.
- posts: holds the post content, when it was created, and links each post to its author with a user_id.
schema.hcl:
schema "public" {}
table "users" {
schema = schema.public
column "id" {
type = int
null = false
}
column "email" {
type = varchar(255)
null = false
}
primary_key {
columns = [column.id]
}
}
table "posts" {
schema = schema.public
column "id" {
type = int
null = false
}
column "title" {
type = varchar(100)
null = false
}
column "created_at" {
null = false
type = timestamp
}
column "user_id" {
type = int
null = false
}
primary_key {
columns = [column.id]
}
foreign_key "authors_fk" {
columns = [column.user_id]
ref_columns = [table.users.column.id]
}
}
schema.sql:
-- Add new schema named "public"
CREATE SCHEMA IF NOT EXISTS "public";
-- Create "users" table
CREATE TABLE "public"."users" ("id" integer NOT NULL, "email" character varying(255) NOT NULL, PRIMARY KEY ("id"));
-- Create "posts" table
CREATE TABLE "public"."posts" ("id" integer NOT NULL, "title" character varying(100) NOT NULL, "created_at" TIMESTAMP NOT NULL, "user_id" integer NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "authors_fk" FOREIGN KEY ("user_id") REFERENCES "public"."users" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION);
Generating a Migration
To generate our initial migration, we will run the following command:
atlas migrate diff --env dev
We should see a new migrations directory created with the following files:
20240807192632.sql:
-- Create "users" table
CREATE TABLE "users" ("id" integer NOT NULL, "email" character varying(255) NOT NULL, PRIMARY KEY ("id"));
-- Create "posts" table
CREATE TABLE "posts" ("id" integer NOT NULL, "title" character varying(100) NOT NULL, "created_at" timestamp NOT NULL, "user_id" integer NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "authors_fk" FOREIGN KEY ("user_id") REFERENCES "users" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION);
atlas.sum:
h1:fzAvfmbDGG5M4KAjGXqAylAE3PawD5+6GqkVqvizMys=
20240807192632.sql h1:oOoSEXM3VyBkPGeWdyoYfAVF1dFn79c3yGhsy3Ys8PI=
Expanding Business Logic
Great! Now that we have a basic schema and migration directory, let's add some business logic before we begin testing.
We will add a column latest_post_ts to the users table, which will hold the timestamp of the user's most recent post. To automatically populate this column, we will create a trigger named update_latest_post_trigger.
schema.hcl:
schema "public" {}
table "users" {
schema = schema.public
column "id" {
null = false
type = integer
}
column "email" {
null = false
type = character_varying(255)
}
column "latest_post_ts" {
null = true
type = timestamp
}
primary_key {
columns = [column.id]
}
}
table "posts" {
schema = schema.public
column "id" {
null = false
type = integer
}
column "title" {
null = false
type = character_varying(100)
}
column "created_at" {
null = false
type = timestamp
}
column "user_id" {
null = false
type = integer
}
primary_key {
columns = [column.id]
}
foreign_key "authors_fk" {
columns = [column.user_id]
ref_columns = [table.users.column.id]
}
}
function "update_latest_post" {
schema = schema.public
lang = PLpgSQL
return = trigger
as = <<-SQL
BEGIN
UPDATE "public"."users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "public"."posts"
WHERE "user_id" = NEW."user_id"
)
WHERE "id" = NEW."user_id";
RETURN NEW;
END;
SQL
}
trigger "update_latest_post_trigger" {
on = table.posts
after {
insert = true
}
for = ROW
execute {
function = function.update_latest_post
}
}
schema.sql:
-- Add new schema named "public"
CREATE SCHEMA IF NOT EXISTS "public";
-- Create "users" table
CREATE TABLE "public"."users" ("id" integer NOT NULL, "email" character varying(255) NOT NULL, "latest_post_ts" TIMESTAMP, PRIMARY KEY ("id"));
-- Create "posts" table
CREATE TABLE "public"."posts" ("id" integer NOT NULL, "title" character varying(100) NOT NULL, "created_at" TIMESTAMP NOT NULL, "user_id" integer NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "authors_fk" FOREIGN KEY ("user_id") REFERENCES "public"."users" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION);
-- Create the trigger function
CREATE OR REPLACE FUNCTION update_latest_post()
RETURNS TRIGGER AS $$
BEGIN
UPDATE "public"."users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "public"."posts"
WHERE "user_id" = NEW."user_id"
)
WHERE "id" = NEW."user_id";
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Create the trigger
CREATE TRIGGER update_latest_post_trigger
AFTER INSERT ON "public"."posts"
FOR EACH ROW
EXECUTE FUNCTION update_latest_post();
Run the migrate diff command once more to generate another migration:
atlas migrate diff --env dev
The following migration should be created:
-- Create "update_latest_post" function
CREATE FUNCTION "update_latest_post" () RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
UPDATE "public"."users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "public"."posts"
WHERE "user_id" = NEW."user_id"
)
WHERE "id" = NEW."user_id";
RETURN NEW;
END;
$$;
-- Create trigger "update_latest_post_trigger"
CREATE TRIGGER "update_latest_post_trigger" AFTER INSERT ON "posts" FOR EACH ROW EXECUTE FUNCTION "update_latest_post"();
-- Modify "users" table
ALTER TABLE "users" ADD COLUMN "latest_post_ts" timestamp NULL;
Back-filling Data
The trigger we created will automatically update the latest_post_ts column in the users table whenever a new post is added. However, posts that already exist in the posts table were inserted before the trigger existed, so we need to back-fill the latest_post_ts column for existing users to ensure it is accurate.
To do this, let's edit the newly generated 20240807192934.sql migration file to include a query that will update the latest_post_ts column for each user. Add the following SQL statement to the migration file:
-- .. Redacted for brevity
-- Back-fill the latest post timestamp for each user
UPDATE "users"
SET "latest_post_ts" = (
SELECT MAX("created_at")
FROM "posts"
WHERE "user_id" = "users"."id"
);
After changing the contents of a migration file, we need to recalculate the atlas.sum file to ensure directory integrity. Run the following command:
atlas migrate hash --env dev
Testing Migrations
Rewriting data on a production database can be risky (and scary) if not done correctly. Let's now write a test to ensure that our migration will run smoothly and leave our application in a consistent state.
To do so, our test will have the following logic:
- Migrate to the first version, before creating the trigger, modifying the users table, and back-filling the data.
- Seed the users and posts tables with data.
- Run our new migration file, which will create the trigger and back-fill the latest_post_ts column.
- Verify that the latest_post_ts column is correctly populated.
test "migrate" "check_latest_post" {
migrate {
to = "20240807192632"
}
exec {
sql = <<-SQL
INSERT INTO users (id, email) VALUES (1, 'user1@example.com'), (2, 'user2@example.com');
INSERT INTO posts (id, title, created_at, user_id) VALUES (1, 'My First Post', '2024-01-23 00:51:54', 1), (2, 'Another Interesting Post', '2024-02-24 02:14:09', 2);
SQL
}
migrate {
to = "20240807192934"
}
exec {
sql = "select * from users"
format = table
output = <<TAB
id | email | latest_post_ts
----+-------------------+---------------------
1 | user1@example.com | 2024-01-23 00:51:54
2 | user2@example.com | 2024-02-24 02:14:09
TAB
  }
  log {
    message = "Data migrated successfully"
  }
}
Let's break down the test:
- We define a migration test named check_latest_post.
- We migrate to the first version of the schema, which does not include the trigger and the latest_post_ts column.
- We insert data into the users and posts tables. At this point, the latest_post_ts column does not exist.
- We migrate to the second version of the schema, which includes the trigger and the latest_post_ts column, as well as our back-fill.
- We assert that the latest_post_ts column is correctly populated with the latest post timestamp for each user.
To run this test, we will run the migrate test command:
atlas migrate test --env dev
The output should be similar to:
-- PASS: check_latest_post (50ms)
migrate.test.hcl:24: Data migrated successfully
PASS
Great! Our test passed, and we can now confidently deploy our migration to production.
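If we wanted even more coverage, the same test syntax could also check that the trigger keeps latest_post_ts current for posts inserted after the migration. Below is a hypothetical sketch of extra exec blocks appended inside the test above, reusing only constructs already shown; the expected output is illustrative and must match the printed table exactly, including whitespace.

  # Hypothetical extra checks, appended to the "check_latest_post" test body.
  exec {
    sql = "INSERT INTO posts (id, title, created_at, user_id) VALUES (3, 'A Newer Post', '2024-03-01 10:00:00', 1);"
  }
  exec {
    sql    = "SELECT latest_post_ts FROM users WHERE id = 1"
    format = table
    # Illustrative expected output; the exact spacing must match what the
    # database prints in table format.
    output = <<TAB
   latest_post_ts
---------------------
 2024-03-01 10:00:00
TAB
  }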
Integrating with CI
Writing and running tests locally is a great start, but it's equally important to run these tests in a CI/CD pipeline.
Atlas provides out-of-the-box integrations for running migrate test in popular CI/CD platforms such as GitHub Actions and CircleCI. However, you can easily integrate migration testing into any CI/CD platform by running the atlas migrate test command.
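As a rough illustration (and not the official integration), a GitHub Actions workflow for this project might look like the sketch below. The action versions and the secret name are assumptions; the ubuntu-latest runner provides the Docker engine needed for the docker:// dev database.

# .github/workflows/migrate-test.yaml -- hypothetical sketch only.
name: Test Migrations
on: pull_request
jobs:
  migrate-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install the Atlas CLI and log in with a cloud token (assumed secret name).
      - uses: ariga/setup-atlas@v0
        with:
          cloud-token: ${{ secrets.ATLAS_CLOUD_TOKEN }}
      # Run the migration tests defined in migrate.test.hcl.
      - name: Run migration tests
        run: atlas migrate test --env dev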
Wrapping Up
In this guide we learned how to use the atlas migrate test command to test our migration files before deployment.