
Atlas Terraform Provider v0.4.0: HashiCorp partnerships, versioned migrations and more

· 5 min read
Tran Minh Giau
Software Engineer

Introduction

Today we are very excited to announce the release of Atlas Terraform Provider v0.4.0. This release brings some exciting new features and improvements to the provider which we will describe in this post.

In addition, this release is the first to be published under our new partnership with HashiCorp as a Technology Partner. Atlas is sometimes described as a "Terraform for Databases", so we have high hopes that this partnership will open up many opportunities to better integrate database schema management into IaC workflows.

What's new

When people first hear about integrating schema management into declarative workflows, many raise the concern that because making changes to the database is a high-risk operation, they would not trust a tool to do it automatically.

This is a valid concern, and this release contains three new features that we believe will help to address it:

  • SQL plan printing
  • Versioned migrations support
  • Migration safety verification

SQL Plan Printing

In previous versions of the provider, we displayed the plan as a textual diff showing which resources are added, removed or modified. With this version, the provider will also print the SQL statements that will be executed as part of the plan.

For example, suppose we have the following schema:

schema "market" {
charset = "utf8mb4"
collate = "utf8mb4_0900_ai_ci"
comment = "A schema comment"
}

table "users" {
schema = schema.market
column "id" {
type = int
}
column "name" {
type = varchar(255)
}
primary_key {
columns = [
column.id
]
}
}

And our Terraform module looks like this:

terraform {
  required_providers {
    atlas = {
      source  = "ariga/atlas"
      version = "0.4.0"
    }
  }
}

provider "atlas" {}

data "atlas_schema" "market" {
  src        = file("${path.module}/schema.hcl")
  dev_db_url = "mysql://root:pass@localhost:3307"
}

resource "atlas_schema" "market" {
  hcl        = data.atlas_schema.market.hcl
  url        = "mysql://root:pass@localhost:3306"
  dev_db_url = "mysql://root:pass@localhost:3307"
}

When we run terraform plan we will see the following output:

Plan: 1 to add, 0 to change, 0 to destroy.

│ Warning: Atlas Plan

│ with atlas_schema.market,
│ on main.tf line 17, in resource "atlas_schema" "market":
│ 17: resource "atlas_schema" "market" {

│ The following SQL statements will be executed:


│ -- add new schema named "market"
│ CREATE DATABASE `market`
│ -- create "users" table
│ CREATE TABLE `market`.`users` (`id` int NOT NULL, `name` varchar(255) NOT NULL, PRIMARY KEY (`id`)) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci

Versioned migrations

Atlas supports two types of workflows: Declarative and Versioned. With declarative workflows, the plan to migrate the database is generated automatically at runtime. Versioned migrations provide teams with a more controlled workflow where changes are planned, checked-in to source control and reviewed ahead of time. Until today, the Terraform provider only supported the declarative workflow. This release adds support for versioned migrations as well.

Suppose we have the following migration directory of two files:

20221101163823_create_users.sql
CREATE TABLE `users` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `age` bigint(20) NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `age` (`age`)
);

atlas.sum
h1:OlaV3+7xXEWc1uG/Ed2zICttHaS6ydHZmzI7Hpf2Fss=
20221101163823_create_users.sql h1:mZirkpXBoLLm+M73EbHo07muxclifb70fhWQFfqxjD4=

We can use the Terraform Atlas provider to apply this migration directory to a database:

terraform {
  required_providers {
    atlas = {
      source  = "ariga/atlas"
      version = "0.4.0"
    }
  }
}

provider "atlas" {}

// The `atlas_migration` data source loads the current state of the given database
// with regard to the migration directory.
data "atlas_migration" "hello" {
  dir = "migrations?format=atlas"
  url = "mysql://root:pass@localhost:3306/hello"
}

// The `atlas_migration` resource applies the migration directory to the database.
resource "atlas_migration" "hello" {
  dir     = "migrations?format=atlas"
  version = data.atlas_migration.hello.latest # Use latest to run all migrations
  url     = data.atlas_migration.hello.url
  dev_url = "mysql://root:pass@localhost:3307/test"
}

Running terraform plan will show the following output:

data.atlas_migration.hello: Reading...
data.atlas_migration.hello: Read complete after 0s [id=migrations?format=atlas]

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

  # atlas_migration.hello will be created
  + resource "atlas_migration" "hello" {
      + dev_url = (sensitive value)
      + dir     = "migrations?format=atlas"
      + id      = (known after apply)
      + status  = (known after apply)
      + url     = (sensitive value)
      + version = "20221101163823"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Linting

Atlas provides extensive support for linting database schemas. This release adds support for running these checks as part of the Terraform plan, which means you can now run terraform plan and see if there are any linting problems in your migrations before you apply the changes. This is especially useful in the versioned migrations workflow, where changes are planned and reviewed ahead of time.

Suppose we add the following migration:

20221101165036_change_unique.sql
ALTER TABLE users
  DROP KEY age,
  ADD CONSTRAINT NAME UNIQUE (`name`);

If we run terraform plan with this new migration, Terraform prints the following warning:


│ Warning: data dependent changes detected

│ with atlas_migration.hello,
│ on main.tf line 20, in resource "atlas_migration" "hello":
│ 20: resource "atlas_migration" "hello" {

│ File: 20221101165036_change_unique.sql

│ - MF101: Adding a unique index "NAME" on table "users" might fail in case column
│ "name" contains duplicate entries

Atlas detected that the migration may fail if the name column contains duplicate entries! This is a very useful warning that can help you avoid unexpected deployment failures. Atlas supports many more safety checks, which you can read about here.
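If you want to check the data yourself before applying, a query along these lines (a sketch, not part of the original example) reveals whether duplicates already exist:

-- List values of `name` that appear more than once in the users table.
SELECT `name`, COUNT(*) AS `cnt`
FROM `users`
GROUP BY `name`
HAVING COUNT(*) > 1;

If the query returns no rows, the unique index can be added safely.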

Wrapping up

In this blog post we discussed three new features added to the Terraform Atlas provider, all designed to make managing your database schemas with Terraform safer and more predictable. We hope you will enjoy this release!

Have questions? Feedback? Feel free to reach out on our Discord server.

Migrate Multi-Tenant Environments With Atlas

· 8 min read
Ariel Mashraki
Building Atlas

Wikipedia defines Multi-tenancy as:

a software architecture in which a single instance of software runs on a server and serves multiple tenants.

In recent years, multi-tenancy has become a common topic in our industry, as many organizations provide services to multiple customers using the same infrastructure. Multi-tenancy usually becomes an issue in software architecture because tenants often expect a decent level of isolation from one another.

In this post, I will go over different known approaches for achieving multi-tenancy and discuss the approach we took to build Ariga's cloud platform. In addition, I will demonstrate how we added built-in support for multi-tenant environments in Atlas to overcome some of the challenges we faced.

Introduction

Throughout the last few years, I have had the opportunity to implement multi-tenancy in various ways. Some of them might be familiar to you:

  1. A separate environment (deployment) per tenant, where isolation is achieved at both compute and data layers.
  2. A schema (named database) per tenant, where there is one environment for compute (e.g., a K8S cluster), but tenants are stored in different databases or schemas. Isolation is achieved at the data layer while compute resources are shared.
  3. One environment for all tenants, including the data layer. Typically, in this case, each table holds a tenant_id column that is used to filter statements by the tenant. Both data and compute layers are shared, with isolation achieved at the logical, database query level (options 2 and 3 are sketched below).
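To make the difference concrete, here is a rough SQL sketch of options 2 and 3 (illustrative only; the table and column names are assumptions, not taken from the original post):

-- Option 2: a schema (named database) per tenant.
CREATE DATABASE tenant_a8m;
CREATE DATABASE tenant_rotemtam;
-- Each tenant's queries run against its own schema:
SELECT * FROM tenant_a8m.users;

-- Option 3: shared tables, every row carries a tenant identifier.
CREATE TABLE users (
  id        int PRIMARY KEY,
  tenant_id int NOT NULL,
  name      varchar(255)
);
SELECT * FROM users WHERE tenant_id = 42;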

Each approach has pros and cons, but I want to briefly list the main reasons we chose to build our cloud platform based on the second option: schema per tenant.

  1. Management: Easily delete, backup tenants, and allow them to export their data without affecting others.
  2. Isolation: Limit credentials, connection pooling, and quotas per tenant. This way, one tenant cannot cause the database to choke and interrupt other tenants in case they share the same physical database.
  3. Security and data privacy: In case it is required, some tenants can be physically separated from others. For example, data can be stored in the tenant's AWS account, and the application can connect to it using a secure connection, like VPC peering in AWS.
  4. Code maintenance: Most of the application code is written so that it is unaware of multi-tenancy. In our case, there is one layer "at the top" that attaches the tenant connection to the context, and the API layer (e.g., GraphQL resolver) extracts the connection from the context to read/write data. As a result, we are not concerned that API changes will cross tenant boundaries.
  5. Migration: Schema changes can be executed first on "test tenants" and fail-fast in case of error.

The primary drawback of this approach was that there was no elegant way to execute migrations on multiple databases (N times) in Atlas. In the rest of the post, I'll cover how we solved this problem in Ariga and added built-in support for multi-tenancy in Atlas.

Atlas config file

Atlas provides a convenient way to describe and interact with multiple environments using project files. A project file is a file named atlas.hcl and contains one or more env blocks. For example:

atlas.hcl
env "local" {
url = "mysql://root:pass@:3306/"
migrations {
dir = "file://migrations"
}
}

env "prod" {
// ... a different env
}

Once defined, a project's environment can be worked against using the --env flag. For example:

atlas migrate apply --env local

The command above runs migrate apply against the database defined in the local environment.

Multi-Tenant environments

The Atlas configuration language provides a few capabilities adopted from Terraform to facilitate the definition of multi-tenant environments. The first is the for_each meta-argument that allows defining a single env block that is expanded to N instances, one for each tenant. For example:

atlas.hcl
variable "url" {
type = string
default = "mysql://root:pass@:3306/"
}

variable "tenants" {
type = list(string)
}

env "local" {
for_each = toset(var.tenants)
url = urlsetpath(var.url, each.value)
migration {
dir = "file://migrations"
}
}

The above configuration expects a list of tenants to be provided as a variable. This can be useful when the list of tenants is dynamic and can be injected into the Atlas command. The urlsetpath function is a helper function that sets the path of the database URL to the tenant name. For example, if url is set to mysql://root:pass@:3306/?param=value and the tenant name is tenant1, the resulting URL will be mysql://root:pass@:3306/tenant1?param=value.

The second capability is Data Sources. This option enables users to retrieve information stored in an external service or database. For the sake of this example, let's extend the configuration above to use the SQL data source to retrieve the list of tenants from the INFORMATION_SCHEMA in MySQL:

atlas.hcl
// The URL of the database we operate on.
variable "url" {
  type    = string
  default = "mysql://root:pass@:3306/"
}

// Schemas that match this pattern will be considered tenants.
variable "pattern" {
  type    = string
  default = "tenant_%"
}

data "sql" "tenants" {
  url   = var.url
  query = <<EOS
SELECT `schema_name`
FROM `information_schema`.`schemata`
WHERE `schema_name` LIKE ?
EOS
  args = [var.pattern]
}

env "local" {
  for_each = toset(data.sql.tenants.values)
  url      = urlsetpath(var.url, each.value)
}

Example

Let's demonstrate how managing migrations in a multi-tenant architecture is made simple with Atlas.

1. Install Atlas

To download and install the latest release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | sh

2. Create a migration directory with the following example content:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
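The apply output in step 4 below runs two migration files in total; the second file is not included in this extract, but judging from that output it adds a name column, roughly along these lines (reconstructed, so treat the exact filename and statement as assumptions):

-- add "name" column to "users" table
ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;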

3. Create two example tenants on a local database:

create database tenant_a8m;
create database tenant_rotemtam;

4. Run Atlas to execute the migration scripts on the tenants' databases:

atlas migrate apply --env local
tenant_a8m
Migrating to version 20220811074314 (2 migrations in total):

-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (36.803179ms)

-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (26.184177ms)

-------------------------
-- 72.899146ms
-- 2 migrations
-- 2 sql statements
tenant_rotemtam
Migrating to version 20220811074314 (2 migrations in total):

-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (61.987153ms)

-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (24.656515ms)

-------------------------
-- 95.233384ms
-- 2 migrations
-- 2 sql statements

Running the command again will not execute any migrations:

No migration files to execute
No migration files to execute

Migration logging

At Ariga, our services print structured logs (JSON) to feed our observability tools, which is why we felt obligated to add support for custom log formatting in Atlas. Continuing the example from above, here is how we configure Atlas to emit JSON lines with the tenant name attached to each of them.

1. Add the log configuration to the local environment block:

atlas.hcl
env "local" {
for_each = toset(data.sql.tenants.values)
url = urlsetpath(var.url, each.value)
// Emit JSON logs to stdout and add the
// tenant name to each log line.
format {
migrate {
apply = format(
"{{ json . | json_merge %q }}",
jsonencode({
Tenant : each.value
})
)
}
}
}

2. Create a new script file in the migration directory:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

3. Run migrate apply in our "local" environment:

atlas migrate apply --env local
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.685899+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.655879+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.685899+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.647091+03:00","Target":"20221027125605","Tenant":"tenant_a8m","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_a8m","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_a8m","Scheme":"mysql","User":{}}}
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.787476+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.757463+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.787476+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.748399+03:00","Target":"20221027125605","Tenant":"tenant_rotemtam","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_rotemtam","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_rotemtam","Scheme":"mysql","User":{}}}

Next steps

Currently, Atlas uses a fail-fast policy, which means the process exits on the first tenant that returns an error. We built it this way because we find it helpful to execute migrations first on "test tenants" and stop in case the operation fails on any of them. However, this means the execution is serial and may be slow when there is a large number of tenants. Therefore, we aim to add more advanced approaches that will allow executing the first M tenants serially and the rest of the N-M tenants in parallel.

Have questions? Feedback? Feel free to reach out on our Discord server.

The Atlas Migration Execution Engine

· 8 min read
Jannik Clausen
Building Atlas

With the release of v0.6.0, we introduced a workflow for managing changes to database schemas that we have called: Versioned Migration Authoring.

Today, we released the first version of the Atlas migration execution engine, which can apply migration files to your database. In this post, we will give a brief overview of its features and what to expect in the future.

Migration File Format

The Atlas migration filename format follows a very simple structure: version_[name].sql, with the name being optional. version can be an arbitrary string. Migration files are lexicographically sorted by filename.

↪ tree .
.
├── 1_initial.sql
├── 2_second.sql
├── 3_third.sql
└── atlas.sum

0 directories, 4 files
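The contents of the three .sql files are not shown in this extract; reconstructing them from the apply output further down in this post, they would look roughly like this (the exact statements are an inference, not a verbatim copy):

-- 1_initial.sql
CREATE DATABASE `my_schema`;
CREATE TABLE `my_schema`.`tbl` (`col` int NOT NULL);

-- 2_second.sql
ALTER TABLE `my_schema`.`tbl` ADD `col_2` TEXT;

-- 3_third.sql
CREATE TABLE `tbl_2` (`col` int NOT NULL);

The atlas.sum file holds the directory's integrity hashes and is generated by Atlas (for example with atlas migrate hash), so there is no need to write it by hand.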

If you want to follow along, you can simply copy and paste the above files in a folder on your system. Make sure you have a database ready to work on. You can start an ephemeral docker container with the following command:

# Run a local mysql container listening on port 3306.
docker run --rm --name atlas-apply --detach --env MYSQL_ROOT_PASSWORD=pass -p 3306:3306 mysql:8

Apply Migrations

In order to apply migrations you need to have the Atlas CLI in version v0.7.0 or above. Follow the installation instructions if you don't have Atlas installed yet.

Now, to apply the first migration of our migration directory, we call atlas migrate apply and pass in some configuration parameters.

atlas migrate apply 1 \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migrating to version 1 (1 migrations in total):

-- migrating version 1
-> CREATE DATABASE `my_schema`;
-> CREATE TABLE `my_schema`.`tbl` (`col` int NOT NULL);
-- ok (17.247319ms)

-------------------------
-- 18.784204ms
-- 1 migrations
-- 2 sql statements

Migration Status

Atlas saves information about the database schema revisions (applied migration versions) in a special table called atlas_schema_revisions. In the example above we connected to the database without specifying which schema to operate against. For this reason, Atlas created the revision table in a new schema called atlas_schema_revisions. For a schema-bound connection Atlas will put the table into the connected schema. We will see that in a bit.
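If you are curious what Atlas recorded, you can peek at that table directly. A minimal sketch, assuming the default naming (the table inside the atlas_schema_revisions schema is itself called atlas_schema_revisions; the exact column set differs between Atlas versions):

-- Inspect the revisions Atlas has recorded so far (naming assumed).
SELECT * FROM `atlas_schema_revisions`.`atlas_schema_revisions`;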

Go ahead and call atlas migrate status to gather information about the database migration state:

atlas migrate status \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migration Status: PENDING
-- Current Version: 1
-- Next Version: 2
-- Executed Files: 1
-- Pending Files: 2

This output tells us that the last applied version is 1, the next one is called 2 and that we still have two migrations pending. Let's apply the pending migrations:

Note that we do not pass an argument to apply this time; in that case, Atlas will attempt to apply all pending migrations.

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migrating to version 3 from 1 (2 migrations in total):

-- migrating version 2
-> ALTER TABLE `my_schema`.`tbl` ADD `col_2` TEXT;
-- ok (13.98847ms)

-- migrating version 3
-> CREATE TABLE `tbl_2` (`col` int NOT NULL);
Error 1046: No database selected

-------------------------
-- 15.604338ms
-- 1 migrations ok (1 with errors)
-- 1 sql statements ok (1 with errors)

Error: Execution had errors: Error 1046: No database selected

Error: sql/migrate: executing statement "CREATE TABLE `tbl_2` (`col` int NOT NULL);" from version "3": Error 1046: No database selected

What happened here? After further investigation, you will find that our connection URL is bound to the entire database, not to a specific schema. The third migration file, however, does not contain a schema qualifier for the CREATE TABLE statement.

By default, Atlas wraps the execution of each migration file in one transaction. This transaction gets rolled back if any error occurs during execution. Be aware, though, that some databases, such as MySQL and MariaDB, don't support transactional DDL. If you want to learn how to configure the way Atlas uses transactions, have a look at the docs.

Migration Retry

To resolve this, edit the migration file and add a qualifier to the statement:

CREATE TABLE `my_schema`.`tbl_2` (`col` int NOT NULL);

Since we changed the contents of a migration file, we have to re-calculate the directory integrity hash-sum by calling:

atlas migrate hash --force \
--dir "file://migrations"

Then we can proceed and simply attempt to execute the migration file again.

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migrating to version 3 from 2 (1 migrations in total):

-- migrating version 3
-> CREATE TABLE `my_schema`.`tbl_2` (`col` int NOT NULL);
-- ok (15.168892ms)

-------------------------
-- 16.741173ms
-- 1 migrations
-- 1 sql statements

Attempting to migrate again, or calling atlas migrate status, will tell us that all migrations have been applied to the database and there is nothing to do at the moment.

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
No migration files to execute

Moving an existing project to Atlas with Baseline Migrations

Another common scenario is when you need to move an existing project to Atlas. To do so, create an initial migration file reflecting the current state of a database schema by using atlas migrate diff. A very simple way to do so would be by heading over to the database from before, deleting the atlas_schema_revisions schema, emptying your migration directory and running the atlas migrate diff command.

rm -rf migrations
docker exec atlas-apply mysql -ppass -e "CREATE SCHEMA `my_schema_dev`;" # create a dev-db
docker exec atlas-apply mysql -ppass -e "DROP SCHEMA `atlas_schema_revisions`;"
atlas migrate diff \
--dir "file://migrations" \
--to "mysql://root:pass@localhost:3306/my_schema" \
--dev-url "mysql://root:pass@localhost:3306/my_schema_dev"

To demonstrate that Atlas can also work on a schema level instead of a realm connection, we are running on a connection bound to the my_schema schema this time.

You should end up with the following migration directory:

-- create "tbl" table
CREATE TABLE `tbl` (`col` int NOT NULL, `col_2` text NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- create "tbl_2" table
CREATE TABLE `tbl_2` (`col` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Now, let's create a new migration file to create a table tbl_3 and update the directory integrity file.

atlas migrate new add_table --dir "file://migrations"
echo "CREATE TABLE `tbl_3` (`col` text NULL);" >> migrations/$(ls -t migrations | head -n1)
atlas migrate hash --force --dir "file://migrations"

Since we now have both a migration file representing our current database state and the new migration file to apply, we can make use of the --baseline flag:

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/my_schema" \
--baseline "20220908110527" # replace the version with the one generated by you
Migrating to version 20220908110847 from 20220908110527 (1 migrations in total):

-- migrating version 20220908110847
-> CREATE TABLE `tbl_3` (`col` text NULL);
-- ok (14.325493ms)

-------------------------
-- 15.786455ms
-- 1 migrations
-- 1 sql statements

Outlook

The Atlas migration engine powers Ent, and the execution engine has already been used within Ariga for several months. We will continue working on improving it and releasing cool features, such as assisted troubleshooting for failed migrations, more intelligent, dialect-aware execution planning for things like MySQL's implicit commits, and more.

Wrapping up

In this post we learned about Atlas's new migration execution engine and got a glimpse of its internals.

Further reading

To learn more about Versioned Migration Authoring:

Have questions? Feedback? Find our team on our Discord server.

Prevent destructive changes to your database with the Atlas GitHub Action

· 5 min read
Rotem Tamir
Building Atlas

Losing data is painful for almost all organizations. This is one of the reasons teams are very cautious when it comes to making changes to their databases. In fact, many teams set explicit policies on what kinds of changes to the database are allowed, often completely prohibiting any change that is destructive.

Destructive changes are changes to a database schema that result in loss of data. For instance, consider a statement such as:

ALTER TABLE `users` DROP COLUMN `email_address`;

This statement is considered destructive because whatever data is stored in the email_address column will be deleted from disk, with no way to recover it.

Suppose you were in charge of a team that decided to prohibit destructive changes. How would you go about enforcing such a policy? From our experience, most teams enforce policies relating to schema migrations in code review: a human engineer, preferably with some expertise in operating databases, manually reviews any proposed database migration scripts and rejects them if they contain destructive changes.

Relying on a human reviewer to enforce such a policy is both expensive (it takes time and mental energy) and error-prone. Just like manual QA is slowly being replaced with automated testing, and manual code style reviews are being replaced with linters, isn't it time that we automate the process of ensuring that changes to database schemas are safe?

Announcing the Atlas GitHub Action

Today, we're happy to announce the release of the official Atlas GitHub Action, which can be used to lint migration directories produced by several popular database migration tools. golang-migrate, goose, dbmate and Atlas itself are already supported, and Flyway and Liquibase are coming soon.

If you're using GitHub to manage your source code, you're in luck. By adding a short configuration file to your repository, you can start linting your schema migration scripts today! Let's see a short example.

Setting up

Suppose we are running a website for an e-commerce business. To store the data for our website we use a MySQL database. Because the data in this database is everything to us, we use a careful versioned migrations approach where each change to the database schema is described in an SQL script and stored in our Git repository. To execute these scripts we use a popular tool called golang-migrate.

The source code for this example can be found in rotemtam/atlas-action-demo.

Initially, our schema contains two tables: users and orders, documented in the first few migration files:

Create the users table:

-- create "users" table
CREATE TABLE `users` (
  `id` int NOT NULL,
  `name` varchar(100) NULL,
  PRIMARY KEY (`id`)
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Add a unique email column:

ALTER TABLE `users` ADD COLUMN `email` varchar(255) NOT NULL, ADD UNIQUE INDEX `email_unique` (`email`);

Create the orders table, with a foreign-key referencing the users table:

-- create "orders" table
CREATE TABLE `orders` (
  `id` int NOT NULL,
  `user_id` int NOT NULL,
  `total` decimal(10) NOT NULL,
  PRIMARY KEY (`id`),
  INDEX `user_orders` (`user_id`),
  CONSTRAINT `user_orders` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON UPDATE NO ACTION ON DELETE NO ACTION
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Installing the Atlas Action

To make sure we never accidentally delete data during schema changes, we enact a policy that prohibits destructive changes to the database. To enforce this policy, we invoke the atlas-action GitHub Action from within our continuous integration flow by adding a workflow file named .github/workflows/atlas-ci.yaml:

name: Atlas CI
on:
  # Run whenever code is changed in the master branch,
  # change this to your root branch.
  push:
    branches:
      - master
  # Run on PRs where something changed under the `path/to/migration/dir/` directory.
  pull_request:
    paths:
      - 'migrations/*'
jobs:
  lint:
    services:
      # Spin up a mysql:8.0.29 container to be used as the dev-database for analysis.
      mysql:
        image: mysql:8.0.29
        env:
          MYSQL_ROOT_PASSWORD: pass
          MYSQL_DATABASE: test
        ports:
          - "3306:3306"
        options: >-
          --health-cmd "mysqladmin ping -ppass"
          --health-interval 10s
          --health-start-period 10s
          --health-timeout 5s
          --health-retries 10
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3.0.1
        with:
          fetch-depth: 0 # Mandatory unless "latest" is set below.
      - uses: ariga/atlas-action@v0
        with:
          dir: migrations/
          dir-format: golang-migrate # Or: atlas, goose, dbmate
          dev-url: mysql://root:pass@localhost:3306/test

Detecting a destructive change

Next, let's see what happens when a developer accidentally proposes a destructive change, to drop a column in the orders table:

-- modify "orders" table
ALTER TABLE `orders` DROP COLUMN `total`;

This change is proposed in PR #1 in our example repo. Because we have previously set up the Atlas GitHub Action to lint our migration directory, whenever a file changes under the migrations/ directory, a workflow is triggered.

After letting our workflow complete, observe that GitHub informs us that the Atlas CI / lint check has failed.

Clicking on the "details" link, we find a detailed explanation of the causes for the failure.

Examining the Action run summary, we find an annotation describing the problem that was detected.

As you can see, Atlas has detected the destructive change we proposed to apply to our database and failed our build!

Wrapping up

In this post we discussed why many teams set policies to prevent destructive changes to database schemas. We further showed how such policies can be enforced in an automated way using the official Atlas GitHub Action.

Further reading

To learn more about CI for database schema changes:

Have questions? Feedback? Find our team on our Discord server.

Announcing v0.6.0 with Versioned Migration Authoring

· 8 min read
Ariel Mashraki
Building Atlas

With the release of v0.6.0, we are happy to announce official support for a style of workflow for managing changes to database schemas that we have been experimenting with in the past months: Versioned Migration Authoring.

TL;DR

  • Atlas supports a declarative workflow (similar to Terraform) where users provide the desired database schema in a simple data definition language and Atlas calculates a plan to get a target database to that state. This workflow is supported by the schema apply command.
  • Many teams prefer a more imperative approach where each change to the database schema is checked-in to source control and reviewed during code-review. This type of workflow is commonly called versioned migrations (or change based migrations) and is supported by many established tools such as Flyway and Liquibase.
  • The downside of the versioned migration approach is, of course, that it puts the burden of planning the migration on developers. As part of the Atlas project we advocate for a third combined approach that we call "Versioned Migration Authoring".
  • Versioned Migration Authoring is an attempt to combine the simplicity and expressiveness of the declarative approach with the control and explicitness of versioned migrations.
  • To use Versioned Migration Authoring today, use the atlas migrate diff command. See the Getting Started section below for instructions.

Declarative Migrations

The declarative approach has become increasingly popular with engineers because it embodies a convenient separation of concerns between application and infrastructure engineers. Application engineers describe what they need to happen (the desired state), and infrastructure engineers build tools that plan and execute ways to get to that state (the how). This division of labor allows for great efficiencies: it abstracts away the complicated inner workings of infrastructure behind a simple, easy-to-understand API for application developers, and it lets infrastructure engineers specialize and develop expertise that pays off across the organization.

With declarative migrations, the desired state of the database schema is given as input to the migration engine, which plans and executes a set of actions to change the database to its desired state.

For example, suppose your application uses a small SQLite database to store its data. In this database, you have a users table with this structure:

schema "main" {}

table "users" {
schema = schema.main
column "id" {
type = int
}
column "greeting" {
type = text
}
}

Now, suppose that you want to add a default value of "shalom" to the greeting column. Many developers are not aware that it isn't possible to modify a column's default value in an existing table in SQLite. Instead, the common practice is to create a new table, copy the existing rows into the new table and drop the old one after. Using the declarative approach, developers can change the default value for the greeting column:

schema "main" {}

table "users" {
schema = schema.main
column "id" {
type = int
}
column "greeting" {
type = text
default = "shalom"
}
}

And have Atlas's engine devise a plan similar to this:

-- Planned Changes:
-- Create "new_users" table
CREATE TABLE `new_users` (`id` int NOT NULL, `greeting` text NOT NULL DEFAULT 'shalom')
-- Copy rows from old table "users" to new temporary table "new_users"
INSERT INTO `new_users` (`id`, `greeting`) SELECT `id`, IFNULL(`greeting`, 'shalom') AS `greeting` FROM `users`
-- Drop "users" table after copying rows
DROP TABLE `users`
-- Rename temporary table "new_users" to "users"
ALTER TABLE `new_users` RENAME TO `users`

Versioned Migrations

As the database is one of the most critical components in any system, applying changes to its schema is rightfully considered a dangerous operation. For this reason, many teams prefer a more imperative approach where each change to the database schema is checked-in to source control and reviewed during code-review. Each such change is called a "migration", as it migrates the database schema from the previous version to the next. To support this kind of requirement, many popular database schema management tools such as Flyway, Liquibase or golang-migrate support a workflow that is commonly called "versioned migrations".

In addition to the higher level of control which is provided by versioned migrations, applications are often deployed to multiple remote environments at once. These environments are not controlled (or even accessible) by the development team. In such cases, declarative migrations, which rely on a network connection to the target database and on human approval of migrations plans in real-time, are not a feasible strategy.

With versioned migrations (sometimes called "change-based migrations"), instead of describing the desired state ("what the database should look like"), developers describe the changes themselves ("how to reach the state"). Most of the time, this is done by creating a set of SQL files containing the statements needed. Each of the files is assigned a unique version and a description of the changes. Tools like the ones mentioned earlier are then able to interpret the migration files and to apply (some of) them in the correct order to transition to the desired database structure.

The benefit of the versioned migrations approach is that it is explicit: engineers know exactly what queries are going to be run against the database when the time comes to execute them. Because changes are planned ahead of time, migration authors can control precisely how to reach the desired schema. If we consider a migration as a plan to get from state A to state B, oftentimes multiple paths exist, each with a very different impact on the database. To demonstrate, consider an initial state which contains a table with two columns:

CREATE TABLE users (
  id int,
  name varchar(255)
);

Suppose our desired state is:

CREATE TABLE users (
  id int,
  user_name varchar(255)
);

There are at least two ways to get from the initial state to the desired one (both are sketched below):

  • Drop the name column and create a new user_name column.
  • Alter the name of the name column to user_name.
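In MySQL 8 syntax, for example, the two paths might look roughly like this (an illustrative sketch, not taken from the original post):

-- Path 1: drop and re-create. Any data stored in `name` is lost.
ALTER TABLE users DROP COLUMN name;
ALTER TABLE users ADD COLUMN user_name varchar(255);

-- Path 2: rename in place. The data in `name` is preserved.
ALTER TABLE users RENAME COLUMN name TO user_name;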

Depending on the context, either may be the desired outcome for the developer planning the change. With versioned migrations, engineers know exactly which change is going to happen, something that may not be known ahead of time in a declarative approach.

Migration Authoring

The downside of the versioned migration approach is, of course, that it puts the burden of planning the migration on developers. This requires a certain level of expertise that is not always available to every engineer, as we demonstrated in our example of setting a default value in a SQLite database above.

As part of the Atlas project we advocate for a third combined approach that we call "Versioned Migration Authoring". Versioned Migration Authoring is an attempt to combine the simplicity and expressiveness of the declarative approach with the control and explicitness of versioned migrations.

With versioned migration authoring, users still declare their desired state and use the Atlas engine to plan a safe migration from the existing to the new state. However, instead of coupling planning and execution, plans are instead written into normal migration files which can be checked-in to source control, fine-tuned manually and reviewed in regular code review processes.

Getting started

Start by downloading the Atlas CLI:

To download and install the latest release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | sh

Next, define a simple Atlas schema with one table and an empty migration directory:

schema.hcl
schema "test" {}

table "users" {
schema = schema.test
column "id" {
type = int
}
}

Let's run atlas migrate diff with the necessary parameters to generate a migration script for creating our users table:

  • --dir the URL to the migration directory, by default it is file://migrations.
  • --to the URL of the desired state, an HCL file or a database connection.
  • --dev-url a URL to a Dev Database that will be used to compute the diff.

atlas migrate diff create_users \
--dir="file://migrations" \
--to="file://schema.hcl" \
--dev-url="mysql://root:pass@:3306/test"

Observe that two files were created in the migrations directory:

By default, migration files are named with the following format {{ now }}_{{ name }}.sql. If you wish to use a different file format, use the --dir-format option.

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Further reading

To learn more about Versioned Migration Authoring:

Have questions? Feedback? Find our team on our Discord server.

Announcing v0.5.0 with Migration Directory Linting

· 4 min read
Rotem Tamir
Building Atlas

With the release of v0.5.0, we are happy to announce a very significant milestone for the project. While this version includes some cool features (such as multi-file schemas) and a swath of incremental improvements and bugfixes, there is one feature that we're particularly excited about and want to share with you in this post.

As most outages happen directly as a result of a change to a system, Atlas provides users with the means to verify the safety of planned changes before they happen. The sqlcheck package provides interfaces for analyzing the contents of SQL files to generate insights on the safety of many kinds of changes to database schemas. With this package, developers may define an Analyzer that can be used to diagnose the impact of SQL statements on the target database.

This functionality is exposed to CLI users via the migrate lint subcommand. By utilizing the sqlcheck package, Atlas can now check your migration directory for common problems and issues.

atlas migrate lint in action

Recall that Atlas uses a dev database to plan and simulate schema changes. Let's start by spinning up a container that will serve as our dev database:

docker run --name atlas-db-dev -d -p 3307:3306 -e MYSQL_ROOT_PASSWORD=pass  mysql

Next let's create schema.hcl, the HCL file which will contain the desired state of our database:

schema "example" {
}
table "users" {
schema = schema.example
column "id" {
type = int
}
column "name" {
type = varchar(255)
}
primary_key {
columns = [
column.id
]
}
}

To simplify the commands we need to type in this demo, let's create an Atlas project file to define a local environment.

env "local" {
src = "./schema.hcl"
url = "mysql://root:pass@localhost:3306"
dev = "mysql://root:pass@localhost:3307"
}

Next, let's plan the initial migration that creates the users table:

atlas migrate diff --env local

Observe that the migrations/ directory was created with an .sql file and a file named atlas.sum:

├── atlas.hcl
├── migrations
│   ├── 20220714090139.sql
│   └── atlas.sum
└── schema.hcl

This is the contents of our new migration script:

-- add new schema named "example"
CREATE DATABASE `example`;
-- create "users" table
CREATE TABLE `example`.`users` (`id` int NOT NULL, `name` varchar(255) NOT NULL, PRIMARY KEY (`id`)) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Next, let's make a destructive change to the schema. Destructive changes are changes to a database schema that result in loss of data, such as dropping a column or table. Let's remove the name column from our desired schema:

schema "example" {
}
table "users" {
schema = schema.example
column "id" {
type = int
}
// Notice the "name" column is missing.
primary_key {
columns = [
column.id
]
}
}

Now, let's plan a migration to this new schema:

atlas migrate diff --env local

Observe the new migration which Atlas planned for us:

-- modify "users" table
ALTER TABLE `example`.`users` DROP COLUMN `name`;

Finally, let's use atlas migrate lint to analyze this change and verify its safety:

atlas migrate lint --env local --latest 1

Destructive changes detected in file 20220714090811.sql:

L2: Dropping non-virtual column "name"

When we run the lint command, we need to instruct Atlas on how to decide what set of migration files to analyze. Currently, two modes are supported.

  • --git-base <branchName>: which selects the diff between the provided branch and the current one as the changeset.
  • --latest <n> which selects the latest n migration files as the changeset.

As expected, Atlas analyzed this change and detected a destructive change to our database schema. In addition, Atlas users can analyze the migration directory to automatically detect:

  • Data-dependent changes
  • Migration Directory integrity
  • Backward-incompatible changes (coming soon)
  • Drift between the desired state and the migration directory (coming soon)
  • ... and more

Wrapping up

We started Atlas more than a year ago because we felt that the industry deserves a better way to manage databases. A huge amount of progress has been made as part of the DevOps movement on the fronts of managing compute, networking and configuration. So much, in fact, that it always baffled us to see that the database, the single most critical component of any software system, did not receive this level of treatment.

Until today, the task of verifying the safety of migration scripts was reserved for humans (preferably SQL-savvy, and highly experienced). We believe that with this milestone we are beginning to pave a road to a reality where teams can move as quickly and safely with their databases as they can with their code.

Have questions? Feedback? Find our team on our Discord server.

Announcing v0.4.2 with preview support for CockroachDB

· 4 min read
Amit Shani
Software Engineer at Ariga

Today, I'm happy to announce the release of v0.4.2 of the Atlas CLI. This version includes many improvements and fixes, but I wanted to share with you exciting news about something I personally worked on. As of v0.4.2, Atlas includes preview support for CockroachDB 🎉

Atlas is an open-source project that helps developers to better manage their database schemas. It has a CLI tool and a Terraform integration. By using Atlas's Data Definition Language (with a syntax similar to Terraform), users can plan, verify and apply changes to their databases in a simple, declarative workflow. Earlier this year, Atlas became the migration engine for Ent, a widely popular, Linux Foundation backed entity framework for Go.

CockroachDB is an open-source NewSQL database. From their README:

CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.

CockroachDB has been gaining popularity and many of you have been asking for Atlas to support it.

While CockroachDB aims to be PostgreSQL-compatible, it still has some incompatibilities (e.g. 1, 2, 3) which prevented Atlas users from working with it using the existing Postgres dialect.

With the latest release of Atlas, the Postgres driver automatically detects if it is connected to a CockroachDB database and uses a custom driver which provides compatibility with CockroachDB.

Getting started with Atlas and CockroachDB

Let's see how we can use Atlas CLI to manage the schema of a CockroachDB database. Start by downloading the latest version of Atlas, on macOS:

brew install ariga/tap/atlas

For installation instructions on other platforms, see the docs.

For the purpose of this example, let's spin up a local, single-node CockroachDB cluster in a container by running:

docker run --rm -d -p 26257:26257 --name crdb cockroachdb/cockroach start-single-node --insecure

Next, let's seed the database with a simple table:

docker exec crdb cockroach sql --insecure -e 'CREATE TABLE users (id int primary key);'

After creating the users table, use Atlas's schema inspect command to read the schema of our local database and save the result to a file:

atlas schema inspect -u 'postgres://root:pass@localhost:26257/?sslmode=disable' --schema public > schema.hcl

Observe the current HCL representation of the public schema, which contains our newly created table, users:

table "users" {
schema = schema.public
column "id" {
null = false
type = bigint
}
primary_key {
columns = [column.id]
}
}
schema "public" {
}

Next, edit schema.hcl to add a column to the users table:

schema.hcl
table "users" {
schema = schema.public
column "id" {
null = false
type = bigint
}
column "name" {
type = varchar(100)
}
primary_key {
columns = [column.id]
}
}
schema "public" {
}

Now apply the schema using the schema apply command:

atlas schema apply -u 'postgres://root:pass@localhost:26257/?sslmode=disable' --schema public -f schema.hcl

Atlas prints out the planned changes and asks for your confirmation:

-- Planned Changes:
-- Create "test" table
ALTER TABLE "public"."users" ADD COLUMN "name" character varying(100) NOT NULL
? Are you sure?:
▸ Apply
Abort

After hitting "Apply", Atlas applies the desired schema to the database:

✔ Apply

We have successfully applied our schema to our database.

To stop the container running CockroachDB run:

docker stop crdb

Learn more about Atlas

In this short example, we demonstrated two of Atlas's basic features: database inspection and declarative schema migration (applying a desired schema on a database). Here are some topics you may want to explore when getting started with Atlas:

  • Learn the DDL - learn how to define any SQL resource in Atlas's data definition language.
  • Try the Terraform Provider - see how you can use the Atlas Terraform Provider to integrate schema management in your general Infrastructure-as-Code workflows.
  • Use the migrate command to author migrations - In addition to the Terraform-like declarative workflow, Atlas can manage a migration script directory for you based on your desired schema.

Preview support

The integration of Atlas with CockroachDB is well tested with version v21.2.11 (at the time of writing, latest) and will be extended in the future. If you're using other versions of CockroachDB or looking for help, don't hesitate to file an issue or join our Discord channel.

Have questions? Feedback? Find our team on our Discord server.

Announcing Atlas Project Files

· 4 min read
Rotem Tamir
Building Atlas

A few days ago we released v0.4.1 of Atlas. Along with a multitude of improvements and fixes, I'm happy to announce the release of a feature that we've been planning for a while: Project Files.

Project files provide a way to describe and interact with multiple environments while working with Atlas. A project file is a file named atlas.hcl that contains one or more env blocks, each describing an environment. Each environment has a reference to where the schema definition file resides, a database URL and an array of the schemas in the database that are managed by Atlas:

// Define an environment named "local".
env "local" {
// Declare where the schema definition file resides.
src = "./schema/project.hcl"

// Define the URL of the database which is managed in
// this environment.
url = "mysql://localhost:3306"

// Define the URL of the Dev Database for this environment.
// See: https://atlasgo.io/dev-database
dev = "mysql://localhost:3307"

// The schemas in the database that are managed by Atlas.
schemas = ["users", "admin"]
}

env "dev" {
// ... a different env
}

Project files arose from the need to provide a better experience for developers using the CLI. For example, suppose you are using Atlas to plan migrations for your database schema. In this case, you will be running a command similar to this to plan a migration:

atlas migrate diff --dev-url mysql://root:password@localhost:3306 --to file://schema.hcl --dir file://migrations --format atlas

With project files, you can define an environment named local:

env "local" {
url = "mysql://root:password@localhost:3306"
dev = "mysql://root:password@localhost:3307"
src = "./schema.hcl"
migration {
dir = "file://migrations"
}
}

Then run the migrate diff command against this environment using the --env flag:

atlas migrate diff --env local

Alternatively, suppose you want to use Atlas to apply the schema on your staging environment. Without project files, you would use:

atlas schema apply -u mysql://root:password@db.ariga.dev:3306 --dev-url mysql://root:password@localhost:3307 -f schema.hcl

To do the same using a project file, define another env named staging:

env "staging" {
url = "mysql://root:password@db.ariga.dev:3306"
dev = "mysql://root:password@localhost:3307"
src = "./schema.hcl"
}

Then run:

atlas schema apply --env staging

Passing credentials as input values

Similar to schema definition files, project files also support Input Variables. This means that we can define variable blocks on the project file to declare which values should be provided when the file is evaluated. This mechanism can (and should) be used to avoid committing to source control database credentials. To do this, first define a variable named db_password:

variable "db_password" {
type = string
}

Next, replace the database password in all connection strings with a reference to this variable, for example:

env "staging" {
url = "mysql://root:${var.db_password}@db.ariga.dev:3306"
dev = "mysql://root:${var.db_password}@localhost:3307"
src = "./schema.hcl"
}

If we run schema apply without providing the password input variable, we will receive an error message:

Error: missing value for required variable "db_password"

To provide the input variable run:

atlas schema apply --env staging --var db_password=pass

Input variables can be used for many other use cases by passing them as input values to schema files.

What's next

In this post, I presented Project Files, a new feature recently added to Atlas to help developers create more fluent workflows for managing changes to their database schemas. In the coming weeks we will be adding a few more improvements to the dev flow, such as support for marking a specific environment as the default one (alleviating the need to specify --env in many cases) and multi-file schema definitions.

Have questions? Feedback? Find our team on our Discord server.

Terraform for databases: Announcing the official Terraform provider for Atlas

· 5 min read
Amit Shani
Software Engineer at Ariga

Today we are glad to announce the release of the official Atlas Terraform Provider.

What is Terraform

Terraform is a popular open-source tool created by HashiCorp, used to greatly simplify the task of provisioning and managing resources in the cloud. With Terraform, organizations can describe the desired state of their infrastructure in a simple configuration language and let Terraform plan and apply these changes in an automated way. This way, Terraform allows teams to truly deliver infrastructure-as-code (IaC), which completely changes how teams and organizations manage their cloud infrastructure.

Infrastructure-as-Code and database management

Most cloud-native applications are backed by a database. The database is often the most critical part of many software systems, so making changes to its schema (structure and layout of the data inside) is a very risky business. However, schemas must evolve: as functionality changes over time, the backing tables are added, columns are dropped, indexes are created for performance reasons, and more.

Therefore it is surprising that there is no established way of integrating the management of schema changes (commonly called schema "migrations") into popular Infrastructure-as-Code workflows. For this reason, many organizations are running migrations from within the application code or using solutions outside the ecosystem of Terraform, meaning that management of the production environment is fragmented and hard to synchronize. Atlas aims to change that.

The Atlas Terraform provider allows you to synchronize your database with your desired schema, in a safe and stateful manner. By using Atlas’s core migration engine and embedding it in a Terraform provider, we are enabling teams to manage their database schemas as part of their full IaC workflow. This way, teams can use existing providers (such as AWS or GCP) to provision the database instance and use the Atlas provider to keep the schema in sync. Integrating Atlas with Terraform is especially useful because it couples the state of the infrastructure with the state of the database. It is also extremely neat when using a dev database, which is a feature that combines infrastructure and DB management to provide safety and correctness.

Demo

Prerequisites

Make sure you have installed:

Let’s see an example of the provider in action. First, spin a database using docker:

docker run -p 3306:3306 --name iloveatlas -e MYSQL_ROOT_PASSWORD=pass -e MYSQL_DATABASE=market -d mysql:8

Great! Now we have an instance of MySQL running.

As an extra measure of safety, we will run another identical database which will serve as a Dev Database. In short, the dev-db helps to catch errors that can only be detected when applying the schema. It is also useful to format the schema in a correct and predictable way. Read more about it here. Run a second instance of MySQL on another port, to serve as a dev-db:

docker run -p 3307:3306 --name devdb-greatness -e MYSQL_ROOT_PASSWORD=pass -e MYSQL_DATABASE=market -d mysql:8

Next, we need an HCL file describing the desired state of our database. You can use the Atlas CLI to inspect the state of another database, or you can use the following basic schema:

schema.hcl
table "orders" {
schema = schema.market
column "id" {
null = false
type = int
auto_increment = true
}
column "name" {
null = false
type = varchar(20)
}
primary_key {
columns = [column.id]
}
}

schema "market" {
charset = "utf8mb4"
collate = "utf8mb4_0900_ai_ci"
}

Save the schema file locally in a file named schema.hcl.

Now that we have our database schema we can use terraform to apply that schema to our database. Create a file named main.tf and copy the following snippet:

main.tf
terraform {
  required_providers {
    atlas = {
      version = "~> 0.4.0"
      source  = "ariga/atlas"
    }
  }
}

provider "atlas" {}

data "atlas_schema" "market" {
  dev_db_url = "mysql://root:pass@localhost:3307/market"
  src        = file("${path.module}/schema.hcl")
}

resource "atlas_schema" "market" {
  hcl        = data.atlas_schema.market.hcl
  url        = "mysql://root:pass@localhost:3306/market"
  dev_db_url = "mysql://root:pass@localhost:3307/market"
}

Finally, initialize Terraform:

terraform init

And apply the schema to the database by executing:

terraform apply --auto-approve

Awesome! Now your database should have a table named orders. To verify this, let's connect to the database:

$ docker exec -it iloveatlas mysql -ppass --database=market

mysql> show tables;
+------------------+
| Tables_in_market |
+------------------+
| orders |
+------------------+
1 row in set (0.00 sec)

mysql> show create table orders;
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| orders | CREATE TABLE `orders` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

For more examples and documentation, visit the official GitHub repository or the provider page on the Terraform Registry.

What's next

In this post, we presented the Atlas Terraform Provider. The provider currently supports the basic, declarative migration workflow that is available in the Atlas engine. In upcoming versions, we will add support for an additional kind of workflow that is supported by the engine and is called versioned migration authoring. In addition, more advanced safety checks (such as simulation on database snapshots) and migration strategies are also being worked on.

While the Terraform provider has just been released, the core engine that it drives is well tested and widely used (especially as the migration engine backing the popular Ent framework). If you, like me, have always wanted to manage your database schema as part of your team's infrastructure-as-code workflow, give the Atlas Terraform provider a try!

Have questions? Feedback? Find our team on our Discord server.

Programmatic inspection of databases in Go using Atlas

· 10 min read

Database inspection is the process of connecting to a database to extract metadata about the way data is structured inside it. In this post, we will present some use cases for inspecting a database, demonstrate why it is a non-trivial problem to solve, and finally show how it can be solved using Atlas, an open-source package (and command-line tool) written in Go that we are maintaining at Ariga.

As an infrastructure engineer, I have wished many times to have a simple way to programmatically inspect a database. Database schema inspection can be useful for many purposes. For instance, you might use it to create visualizations of data topologies, or use it to find table columns that are no longer in use and can be deprecated. Perhaps you would like to automatically generate resources from this schema (such as documentation or GraphQL schemas), or use it to locate fields that might carry personally identifiable information for compliance purposes. Whatever your use case may be, having a robust way to get the schema of your database is the foundation for many kinds of infrastructure applications.

When we started working on the core engine for Atlas, we quickly discovered that there wasn't any established tool or package that could parse the information schema of popular databases and return a data structure representing it. Why is this the case? After all, most databases provide some command-line tool to perform inspection. For example, psql, the standard CLI for Postgres, supports the \d command to describe a table:

postgres=# \d users;
                      Table "public.users"
 Column |          Type          | Collation | Nullable | Default
--------+------------------------+-----------+----------+---------
 id     | integer                |           | not null |
 name   | character varying(255) |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)

So what makes inspection a non-trivial problem to solve? In this post, I will discuss two aspects that I think are interesting. The first is the variance in how databases expose schema metadata and the second is the complexity of the data model that is required to represent a database schema.

How databases expose schema metadata

Most of the SQL that we use in day-to-day applications is pretty standard. However, when it comes to exposing schema metadata, database engines vary greatly in the way they work. The way to retrieve information about things like available schemas and tables, column types and their default values and many other aspects of the database schema looks completely different in each database engine. For instance, consider this query (source) which can be used to get the metadata about table columns from a Postgres database:

SELECT t1.table_name,
t1.column_name,
t1.data_type,
t1.is_nullable,
t1.column_default,
t1.character_maximum_length,
t1.numeric_precision,
t1.datetime_precision,
t1.numeric_scale,
t1.character_set_name,
t1.collation_name,
t1.udt_name,
t1.is_identity,
t1.identity_start,
t1.identity_increment,
t1.identity_generation,
col_description(to_regclass("table_schema" || '.' || "table_name")::oid, "ordinal_position") AS comment,
t2.typtype,
t2.oid
FROM "information_schema"."columns" AS t1
LEFT JOIN pg_catalog.pg_type AS t2
ON t1.udt_name = t2.typname
WHERE table_schema = $1
AND table_name IN (%s)
ORDER BY t1.table_name, t1.ordinal_position

As you can see, while it's definitely possible to get the needed metadata, information about the schema is stored in multiple tables in a way that isn't particularly well documented, and often requires delving into the actual source code to understand fully. Here's a query to get similar information from MySQL (source):

SELECT `TABLE_NAME`,
`COLUMN_NAME`,
`COLUMN_TYPE`,
`COLUMN_COMMENT`,
`IS_NULLABLE`,
`COLUMN_KEY`,
`COLUMN_DEFAULT`,
`EXTRA`,
`CHARACTER_SET_NAME`,
`COLLATION_NAME`
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA` = ?
AND `TABLE_NAME` IN (%s)
ORDER BY `ORDINAL_POSITION`

While this query is much shorter, you can see that it's completely different from the one we ran to inspect the Postgres column metadata. This demonstrates just one of the ways in which inspecting Postgres differs from inspecting MySQL.
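
To get a feel for what doing this by hand involves, here is a minimal sketch of running the MySQL variant of this query with Go's standard database/sql package. This is only an illustration, not how Atlas does it: the connection string, schema and table names are assumptions, the go-sql-driver/mysql driver is assumed to be available, and only a few of the metadata columns are scanned.

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    // Assumed connection string for a local MySQL server.
    db, err := sql.Open("mysql", "root:pass@tcp(localhost:3306)/")
    if err != nil {
        log.Fatalf("failed opening connection: %s", err)
    }
    defer db.Close()
    // Fetch a small subset of the column metadata for a single table.
    rows, err := db.Query(
        "SELECT COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, COLUMN_DEFAULT "+
            "FROM INFORMATION_SCHEMA.COLUMNS "+
            "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ? "+
            "ORDER BY ORDINAL_POSITION",
        "market", "users",
    )
    if err != nil {
        log.Fatalf("failed querying columns: %s", err)
    }
    defer rows.Close()
    for rows.Next() {
        var name, typ, nullable string
        var dflt sql.NullString // COLUMN_DEFAULT may be NULL.
        if err := rows.Scan(&name, &typ, &nullable, &dflt); err != nil {
            log.Fatalf("failed scanning row: %s", err)
        }
        fmt.Printf("column=%s type=%s nullable=%s default=%q\n", name, typ, nullable, dflt.String)
    }
    if err := rows.Err(); err != nil {
        log.Fatalf("failed iterating rows: %s", err)
    }
}

And this covers only columns, for one table, on one engine; indexes, foreign keys, constraints and every other engine each require their own query and their own quirks.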

Mapping database schemas into a useful data structure

To be a solid foundation for building infrastructure, inspection must produce a useful data structure that can be traversed and analyzed to provide insights, in other words, a graph representing the data topology. As mentioned above, such graphs can be used to create ERD (entity-relation diagram) charts, such as the schema visualizations on the Atlas Management UI:

Schema ERD open

Let's consider some aspects of database schemas that such a data structure should capture:

  • Databases are split into logical schemas.
  • Schemas contain tables, and may have attributes (such as default collation).
  • Tables contain columns, indexes and constraints.
  • Columns are complex entities that have types, which may be standard to the database engine (and version) or custom data types defined by the user. In addition, columns may have attributes, such as default values, which may be a literal or an expression (it is important to be able to discern between now() and "now()"); see the short sketch after this list.
  • Indexes contain references to columns of the table they are defined on.
  • Foreign keys contain references to columns in other tables, which may reside in other schemas.
  • ...and much, much more!
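
To make the default-value point above concrete, here is a short sketch of how that distinction can be expressed with the types in Atlas's sql/schema package. The helper function is ours and purely illustrative; it assumes the package's schema.Literal and schema.RawExpr implementations of the Expr interface.

package tutorial

import (
    "fmt"

    "ariga.io/atlas/sql/schema"
)

// describeDefault is a hypothetical helper that reports whether a column
// default is a quoted literal (e.g. the string "now()") or a raw SQL
// expression (e.g. a call to the now() function).
func describeDefault(c *schema.Column) {
    switch d := c.Default.(type) {
    case *schema.Literal:
        fmt.Printf("column %q defaults to the literal value %s\n", c.Name, d.V)
    case *schema.RawExpr:
        fmt.Printf("column %q defaults to the expression %s\n", c.Name, d.X)
    case nil:
        fmt.Printf("column %q has no default\n", c.Name)
    default:
        fmt.Printf("column %q has a default of type %T\n", c.Name, d)
    }
}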

Capturing any one of these aspects boils down to figuring out the correct query for the specific database engine you are working with. Providing developers with a data structure that captures all of them, and doing it well across different versions of multiple database engines, is, we've learned, not an easy task. This is a perfect opportunity for an infrastructure project: a problem that is annoyingly complex to solve and that, if solved well, becomes a foundation for many kinds of applications. This was one of our motivations for creating Atlas (GitHub) - an open-source project that we maintain here at Ariga.

Using Atlas, database schemas can be inspected to produce Go structs representing a graph of the database schema topology. Notice the many cyclic references that make it hard to print (but very ergonomic to traverse :-)):

&schema.Realm{
Schemas: {
&schema.Schema{
Name: "test",
Tables: {
&schema.Table{
Name: "users",
Schema: &schema.Schema{(CYCLIC REFERENCE)},
Columns: {
&schema.Column{
Name: "id",
Type: &schema.ColumnType{
Type: &schema.IntegerType{
T: "int",
Unsigned: false,
},
Null: false,
},
},
},
PrimaryKey: &schema.Index{
Unique: false,
Table: &schema.Table{(CYCLIC REFERENCE)},
Attrs: nil,
Parts: {
&schema.IndexPart{
SeqNo: 0,
Desc: false,
C: &schema.Column{(CYCLIC REFERENCE)},
},
},
},
},
&schema.Table{
Name: "posts",
Schema: &schema.Schema{(CYCLIC REFERENCE)},
Columns: {
&schema.Column{
Name: "id",
Type: &schema.ColumnType{
Type: &schema.IntegerType{
T: "int",
Unsigned: false,
},
Null: false,
},
},
&schema.Column{
Name: "author_id",
Type: &schema.ColumnType{
Type: &schema.IntegerType{
T: "int",
Unsigned: false,
},
Null: true,
},
},
},
PrimaryKey: &schema.Index{
Unique: false,
Table: &schema.Table{(CYCLIC REFERENCE)},
Parts: {
&schema.IndexPart{
SeqNo: 0,
Desc: false,
C: &schema.Column{(CYCLIC REFERENCE)},
},
},
},
ForeignKeys: {
&schema.ForeignKey{
Symbol: "owner_id",
Table: &schema.Table{(CYCLIC REFERENCE)},
Columns: {
&schema.Column{(CYCLIC REFERENCE)},
},
RefTable: &schema.Table{(CYCLIC REFERENCE)},
RefColumns: {
&schema.Column{(CYCLIC REFERENCE)},
},
OnDelete: "SET NULL",
},
},
},
},
},
},
}
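
To illustrate why this graph shape is convenient, here is a small sketch that walks a schema.Realm like the one above and prints each table's foreign keys together with the tables they reference. The function is ours, not part of the Atlas API; it only relies on the fields visible in the dump above.

package tutorial

import (
    "fmt"

    "ariga.io/atlas/sql/schema"
)

// printForeignKeys walks the inspected graph and prints, for every table,
// the foreign keys it defines and the tables they point to.
func printForeignKeys(r *schema.Realm) {
    for _, s := range r.Schemas {
        for _, t := range s.Tables {
            for _, fk := range t.ForeignKeys {
                fmt.Printf("%s.%s: foreign key %q references table %q\n",
                    s.Name, t.Name, fk.Symbol, fk.RefTable.Name)
            }
        }
    }
}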

Inspecting databases in Go using Atlas

While Atlas is commonly used as a CLI tool, all of Atlas's core-engine capabilities are available as a Go module that you can use programmatically. Let's get started with database inspection in Go:

To install Atlas, use:

go get ariga.io/atlas@master

Drivers

Atlas currently supports three core capabilities for working with SQL schemas.

  • "Inspection" - Connecting to a database and understanding its schema.
  • "Plan" - Compares two schemas and produces a set of changes needed to reconcile the target schema to the source schema.
  • "Apply" - creates concrete set of SQL queries to migrate the target database.

In this post we will dive into inspection with Atlas. The way inspection is done varies greatly between the different SQL databases. Atlas currently has four supported drivers:

  • MySQL
  • MariaDB
  • PostgreSQL
  • SQLite

Atlas drivers are built on top of the standard library database/sql package. To initialize the different drivers, we need to initialize a sql.DB and pass it to the Atlas driver constructor. For example:

package tutorial

import (
    "database/sql"
    "log"
    "testing"

    "ariga.io/atlas/sql/schema"
    "ariga.io/atlas/sql/sqlite"
    _ "github.com/mattn/go-sqlite3"
)

func Test(t *testing.T) {
    // Open a "connection" to sqlite.
    db, err := sql.Open("sqlite3", "file:example.db?cache=shared&_fk=1&mode=memory")
    if err != nil {
        log.Fatalf("failed opening db: %s", err)
    }
    // Open an atlas driver.
    driver, err := sqlite.Open(db)
    if err != nil {
        log.Fatalf("failed opening atlas driver: %s", err)
    }
    // ... do stuff with the driver
}
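
The same pattern applies to the other drivers. For example, a MySQL setup would look roughly like the sketch below, with the constructor coming from ariga.io/atlas/sql/mysql; the connection string and a locally running MySQL server are assumptions for the example.

package tutorial

import (
    "database/sql"
    "log"
    "testing"

    "ariga.io/atlas/sql/mysql"
    _ "github.com/go-sql-driver/mysql"
)

func TestMySQL(t *testing.T) {
    // Open a connection to a local MySQL server (assumed to be running).
    db, err := sql.Open("mysql", "root:pass@tcp(localhost:3306)/test")
    if err != nil {
        log.Fatalf("failed opening db: %s", err)
    }
    // Open an atlas driver on top of the existing connection.
    driver, err := mysql.Open(db)
    if err != nil {
        log.Fatalf("failed opening atlas driver: %s", err)
    }
    _ = driver // ... do stuff with the driver
}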

Inspection

As we mentioned above, inspection is one of Atlas's core capabilities. Consider the Inspector interface in the sql/schema package:

package schema

// Inspector is the interface implemented by the different database
// drivers for inspecting multiple tables.
type Inspector interface {
    // InspectSchema returns the schema description by its name. An empty name means the
    // "attached schema" (e.g. SCHEMA() in MySQL or CURRENT_SCHEMA() in PostgreSQL).
    // A NotExistError error is returned if the schema does not exist in the database.
    InspectSchema(ctx context.Context, name string, opts *InspectOptions) (*Schema, error)

    // InspectRealm returns the description of the connected database.
    InspectRealm(ctx context.Context, opts *InspectRealmOption) (*Realm, error)
}

As you can see, the Inspector interface provides methods for inspection at different levels:

  • InspectSchema - provides inspection capabilities for a single schema within a database server.
  • InspectRealm - inspects the entire connected database server.

Each database driver (for example MySQL, Postgres or SQLite) implements this interface. Let's see how we can use it by inspecting a "dummy" SQLite database, continuing with the example from above:

package tutorial

func TestInspect(t *testing.T) {
    // ... skipping driver creation
    ctx := context.Background()
    // Create an "example" table for Atlas to inspect.
    _, err = db.ExecContext(ctx, "create table example ( id int not null );")
    if err != nil {
        log.Fatalf("failed creating example table: %s", err)
    }
    // Open an atlas driver.
    driver, err := sqlite.Open(db)
    if err != nil {
        log.Fatalf("failed opening atlas driver: %s", err)
    }
    // Inspect the created table.
    sch, err := driver.InspectSchema(ctx, "main", &schema.InspectOptions{
        Tables: []string{"example"},
    })
    if err != nil {
        log.Fatalf("failed inspecting schema: %s", err)
    }
    tbl, ok := sch.Table("example")
    require.True(t, ok, "expected to find example table")
    require.EqualValues(t, "example", tbl.Name)
    id, ok := tbl.Column("id")
    require.True(t, ok, "expected to find id column")
    require.EqualValues(t, &schema.ColumnType{
        Type: &schema.IntegerType{T: "int"}, // An integer type, specifically "int".
        Null: false,                         // The column has NOT NULL set.
        Raw:  "INT",                         // The raw type inspected from the DB.
    }, id.Type)
}

The full source code for this example is available in the atlas-examples repo.

And voila! In this example, we first created a table named "example" by executing a query directly against the database. Next, we used the driver's InspectSchema method to inspect the schema of the table we created. Finally, we made some assertions on the returned schema.Table instance to verify that it was inspected correctly.
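
If you are interested in the whole database server rather than a single schema, InspectRealm works the same way. Here is a minimal sketch that reuses the driver from the examples above (setup elided, as before) and lists every schema and table it finds; passing an empty InspectRealmOption is assumed here to mean "inspect everything".

package tutorial

func TestInspectRealm(t *testing.T) {
    // ... skipping driver creation (identical to the previous examples).
    ctx := context.Background()
    // Inspect the entire connected database.
    realm, err := driver.InspectRealm(ctx, &schema.InspectRealmOption{})
    if err != nil {
        log.Fatalf("failed inspecting realm: %s", err)
    }
    // Log every schema and table that was found.
    for _, s := range realm.Schemas {
        for _, tbl := range s.Tables {
            t.Logf("found table %s.%s", s.Name, tbl.Name)
        }
    }
}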

Inspecting using the CLI

If you don't want to write any code and just want to get a document representing your database schema, you can always use the Atlas CLI to do it for you. To get started, head over to the docs.

Wrapping up

In this post we presented the Go API of Atlas, which we initially built around our use case of building a new database migration tool, as part of the Operational Data Graph Platform that we are creating here at Ariga. As we mentioned in the beginning of this post, there are a lot of cool things you can build if you have proper database inspection, which raises the question: what will you build with it?

Getting involved with Atlas