Skip to main content

15 posts tagged with "migrations"

View All Tags

Bridging the gap between IaC and Schema Management

· 11 min read
Rotem Tamir
Building Atlas

Introduction

When we started building Atlas a couple of years ago, we noticed that there was a substantial gap between what was then considered state-of-the-art in managing database schemas and the recent strides from Infrastructure-as-Code (IaC) to managing cloud infrastructure.

In this post, we review that gap and show how Atlas – along with its Terraform provider – can bridge the two domains.

As an aside, I usually try to keep blog posts practical and to the point, but occasionally think it’s worth it to zoom out and explain the grander ideas behind what we do.

If you’re looking for a quick and practical explanation of working with Atlas and Terraform, I recommend this YouTube video.

Why Infrastructure-as-Code

Infrastructure as Code (IaC) refers to the practice of managing and provisioning infrastructure through machine-readable configuration files, instead of utilizing traditional interactive configuration tools. This approach makes for automated, consistent, and repeatable deployment of environments that are faster and less error-prone than previous, more manual approaches.

Terraform, a popular open-source tool created by HashiCorp, is the most prominent implementation of the IaC concept. With Terraform, organizations can describe the desired state of their infrastructure in a simple configuration language (HCL) and let Terraform plan and apply these changes in an automated way.

Terraform (and IaC in general) has taken the software engineering world by storm in recent years. As someone who had the dubious pleasure of managing complex cloud infrastructure manually, using what is today jokingly called "ClickOps", I can mention a few properties of IaC that I believe contributed to this success:

  • Declarative – Terraform is built on a declarative workflow, which means that users only define the final (desired) state of their system. Terraform is responsible for inspecting the target environment, calculating the difference between the current and desired states, and building a plan for reconciling between those two states.

    Cloud infrastructures are becoming increasingly complex, comprising thousands of different, interconnected components. Declarative workflows greatly reduce the mental overhead of planning changes to such environments.

  • Automated – Many engineers can attest that manually provisioning a new environment used to take days, even weeks! Once Terraform generates a plan for changing environments, the process runs automatically and finishes in a matter of minutes.

  • Holistic – With Terraform, it is possible to capture all of the resources and configurations required to provision an application as one interconnected and formally defined dependency graph. Deployments become truly reproducible and automated, with no dangling or manually provisioned dependencies.

  • Self-healing – Finally, these three properties converge to support a self-healing tool that can detect and fix drift on its own. Whenever drift occurs, it is only a matter of re-running Terraform to shift from the current state back to the desired one.

Comparing IaC with Schema Management Tools

Next, let’s discuss the current state of database schema management tools (often called schema migration tools) by contrasting them with the properties of IaC.

  • Imperative – If Terraform embodies the declarative approach, then schema management tools often exemplify the opposite, imperative (or revision-based) approach. In this case, we don’t provide the tools with the what (the desired state of the database), but the how (what SQL commands need to run to migrate the database from the previous version to the next).

  • Semi-automated – Migration tools were revolutionary when they came out a decade ago. One idea stood as one of the harbingers of the GitOps philosophy: that database changes should not be applied manually but first checked into source control and then applied automatically by a tool.

    Today’s migration tools automate two aspects of schema management: 1) execution and 2) tracking which migrations were already executed on a target database.

    Compared to modern IaC tools, however, they are fairly manual. In other words, they leave the responsibility of planning and verifying the safety of changes to the user.

  • Fragmented – As we described above, one of the most pleasant aspects of adopting the IaC mindset is having a unified, holistic description of your infrastructure, to the point where you can entirely provision it from a single terraform apply command.

    For database schema management, common practices are anything but holistic. In some cases, provisioning the schema might happen 1) when application servers boot, before starting the application, or 2) while it runs as an init container on Kubernetes.

    In fact, some places (yes, even established companies) still have developers manually connect (with root credentials) to the production database to execute schema changes!

  • A pain to fix – When a migration deployment fails, many schema management tools will actually get in your way. Instead of worrying about fixing the issue at hand, you now need to worry about both your database and the way your migration tool sees it (which have now diverged).

Bridging the Gap

After describing the gap between IaC and database schema management in more detail, let’s delve into what it would take to bridge it. Our goal is to have schema management become an integral part of your day-to-day IaC pipeline so that you can enjoy all the positive properties we described above.

To integrate schema change management and IaC, we would need to solve two things:

  1. A diffing engine capable of supporting declarative migration workflows, such that an engine should be capable of:
    • Loading the desired schema of the database in some form
    • Inspecting the current schema of the database
    • Calculating a safe migration plan automatically
  2. A Terraform Provider that wraps the engine as a Terraform resource, which can then seamlessly integrate into your overall application infrastructure configuration.

How Atlas drives Declarative Migrations

Atlas is a language-agnostic tool for managing and migrating database schemas using modern DevOps principles. It is different from Terraform in many ways, but similar enough to have received the informal nickname "Terraform for Databases".

At its core lie three capabilities that make it ideal to apply a declarative workflow to schema management:

  1. Schema loaders
  2. Schema inspection
  3. Diffing and planning

Let’s discuss each of these capabilities in more detail.

Schema loaders

Every declarative workflow begins with the desired state - what we want the system to look like. Using a mechanism called "schema loaders" Atlas users can provide the desired schema in many ways. For example:

Plain SQL

Atlas users can describe the desired schema of the database using plain SQL DDL statements such as:

CREATE TABLE users (
Id int primary key,
Name varchar(255)
)

Atlas HCL

Alternatively, users can use Atlas HCL, a configuration language that shares Terraform’s configuration language foundations:

table "users" {
schema = schema.public
column "id" {
type = int
}
column "name" {
type = varchar(255)
}
column "manager_id" {
type = int
}
primary_key {
columns = [
column.id
]
}
}

A live database

In addition, users can provide Atlas with a connection to an existing database which in turn Atlas can inspect and use as the desired state of the database.

External Schemas (ORM)

Finally, Atlas has an easily extensible design which makes writing plugins to load schemas from external sources a breeze. For example, Atlas can read the desired schema of the database directly from your ORM, using a simple integration.

Schema inspection

Once Atlas understands the desired state of the database, it needs to inspect the existing database to understand its current schema. This is done by connecting to the target database and querying the database’s information schema to construct a schema graph (an in-memory representation of all the components in the database and their connections).

Diffing and planning

The next phase involves calculating the difference ("diffing") between the desired and current states and calculating an execution plan to reconcile this difference. Because resources are often interconnected, Atlas must create a sensible order of execution using algorithms such as Topological Sort to ensure, for example, that dependencies on a resource are removed before it is dropped.

In addition, each database engine has its own peculiarities and limitations to take into account when creating an execution plan. For example, adding a default value to a column in an SQLite database must be performed in a multiple-step plan that looks similar to this:

-- Planned Changes:
-- Create "new_users" table
CREATE TABLE `new_users` (`id` int NOT NULL, `greeting` text NOT NULL DEFAULT 'shalom')
-- Copy rows from old table "users" to new temporary table "new_users"
INSERT INTO `new_users` (`id`, `greeting`) SELECT `id`, IFNULL(`greeting`, 'shalom') AS `greeting` FROM `users`
-- Drop "users" table after copying rows
DROP TABLE `users`
-- Rename temporary table "new_users" to "users"
ALTER TABLE `new_users` RENAME TO `users`

Atlas in action

What does this workflow look like in practice? As you can see in Atlas's "Getting Started" guide, suppose we made a change to our desired schema that adds a new table named blog_posts (this change may be described in a plain SQL file, an HCL file or even in your ORM's data model).

To apply the desired schema on a target database you would use the schema apply command:

atlas schema apply \
-u "mysql://root:pass@localhost:3306/example" \
--to file://schema.sql \
--dev-url "docker://mysql/8/example"

After which Atlas will generate a plan:

-- Planned Changes:
-- Create "blog_posts" table
CREATE TABLE `example`.`blog_posts` (`id` int NOT NULL, `title` varchar(100) NULL, `body` text NULL, `author_id` int NULL, PRIMARY KEY (`id`), INDEX `author_id` (`author_id`), CONSTRAINT `author_fk` FOREIGN KEY (`author_id`) REFERENCES `example`.`users` (`id`))
Use the arrow keys to navigate: ↓ ↑ → ←
? Are you sure?:
▸ Apply
Abort

Observing this example, you may begin to understand how Atlas earned its nickname the "Terraform for Databases."

Integrating with Terraform

The second piece of bridging the gap is to create a Terraform Provider that wraps Atlas and allows users to define resources that represent the schema definition as part of your infrastructure.

Ariga (the company behind Atlas) is an official HashiCorp Tech Partner that publishes the Atlas Terraform Provider, which was created to solve this problem precisely.

Using the Atlas Terraform Provider, users can finally provision their database instance and its schema in one holistic definition. For example, suppose we provision a MySQL database using AWS RDS:

// Our RDS-based MySQL 8 instance.
resource "aws_db_instance" "atlas-demo" {
identifier = "atlas-demo"
instance_class = "db.t3.micro"
engine = "mysql"
engine_version = "8.0.28"
// Some fields skipped for brevity
}

Next, we load the desired schema from an HCL file, using the Atlas Provider:

data "atlas_schema" "app" {
src = "file://${path.module}/schema.hcl"
}

Finally, we use the atlas_schemaresource to apply our schema to the database:

// Apply the normalized schema to the RDS-managed database.
resource "atlas_schema" "hello" {
hcl = data.atlas_schema.app.hcl
url = "mysql://${aws_db_instance.atlas-demo.username}:${urlencode(random_password.password.result)}@${aws_db_instance.atlas-demo.endpoint}/"
}

You can find a full example here.

When we run terraform apply, this is what will happen:

  • Terraform will provision the RDS database using the AWS Provider
  • Terraform will use Atlas to inspect the existing schema of the database and load the desired state from a local HCL file.
  • Atlas will calculate for Terraform a SQL plan to reconcile between the two.

And this is how it may look like in the Terraform plan:

Terraform will perform the following actions:

# atlas_schema.hello will be created
+ resource "atlas_schema" "hello" {
+ hcl = <<-EOT
table "posts" {
schema = schema.app
column "id" {
null = false
type = int
}
column "user_id" {
null = false
type = int
}
column "title" {
null = false
type = varchar(255)
}
column "body" {
null = false
type = text
}
primary_key {
columns = [column.id]
}
foreign_key "posts_ibfk_1" {
columns = [column.user_id]
ref_columns = [table.users.column.id]
on_update = NO_ACTION
on_delete = CASCADE
}
index "user_id" {
columns = [column.user_id]
}
}
table "users" {
schema = schema.app
column "id" {
null = false
type = int
}
column "user_name" {
null = false
type = varchar(255)
}
column "email" {
null = false
type = varchar(255)
}
primary_key {
columns = [column.id]
}
}
schema "app" {
charset = "utf8mb4"
collate = "utf8mb4_0900_ai_ci"
}
EOT
+ id = (known after apply)
+ url = (sensitive value)
}

# aws_db_instance.atlas-demo will be created
+ resource "aws_db_instance" "atlas-demo" {
// .. redacted for brevity
+ }

And that's how you bridge the gap between IaC and schema management!

Conclusion

In this blog post, we reviewed some exceptional properties of Infrastructure-as-Code tools, such as Terraform, that have led to their widespread adoption and success in the industry. We then reviewed the current state of a similar problem, database schema management, in contrast to these properties. Finally, we showcased Atlas’s ability to adapt some IaC principles into the domain of schema management and how we can unify the two domains using the Atlas Terraform Provider.

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.

Versioned migrations on Kubernetes with the Atlas Operator

· 7 min read
Rotem Tamir
Building Atlas
TL;DR

You can now use the Atlas Kubernetes Operator to run versioned migrations natively from your Kubernetes cluster.

See an example

Introduction

The Atlas Kubernetes Operator is a Kubernetes operator that enables you to manage your database schemas natively from your Kubernetes cluster. By exposing custom resource definitions (CRD) the operator extends the Kubernetes API to support database schema management.

In a previous blog post we demonstrated how to use the Atlas Operator for the declarative (state-based) workflow in which you define the desired state of your database schema in a Kubernetes manifest and the operator takes care of the rest.

State vs. versioned based migrations is a common and unresolved debate in the database schema management world, and we built Atlas to support both from the get-go.

Today, we are happy to announce v0.2.0 of the Atlas Kubernetes Operator which adds support support for the versioned migration workflow.

In this blog post we will demonstrate how to use the Atlas Operator this new workflow.

How it works

The Atlas Kubernetes Operator supports versioned migrations. In versioned migrations, the database schema is defined by a series of SQL scripts ("migrations") that are applied in order. The user can specify the version and migration directory to run, which can be located on the Atlas Cloud or stored as a ConfigMap in your Kubernetes cluster.

In this workflow, after installing the Atlas Kubernetes Operator, the user defines the desired state of the database as an AtlasMigration resource which connects between a target database and a migration directory. The migration directory may be configured as a remote directory in Atlas Cloud or as a ConfigMap in your Kubernetes cluster.

The operator then reconciles the desired state with the actual state of the database by applying any pending migrations on the target database.

Demo time

In this demo we will use the Atlas Kubernetes Operator to run versioned migrations on a MySQL database.

Prerequisites

  1. A Kubernetes cluster - you can use Minikube to quickly spin up a local cluster.
  2. kubectl configured to connect to your cluster.
  3. Helm, the Kubernetes package manager, locally installed.
  4. The Atlas CLI tool, locally installed.

1. Install the Atlas Kubernetes Operator

The Atlas Kubernetes Operator is available as a Helm Chart. To install the chart with the release name atlas-operator:

helm install atlas-operator oci://ghcr.io/ariga/charts/atlas-operator

2. Install a database

Create a MySQL database and a secret with an Atlas URL to the database:

kubectl apply -f https://raw.githubusercontent.com/ariga/atlas-operator/65dce84761354d1766041c7f286b35cc24ffdddb/config/integration/databases/mysql.yaml

Result:

deployment.apps/mysql created
service/mysql created
secret/mysql-credentials created

In this example, we are using a plain MySQL pod as a database. In a real-world scenario, you would probably use a managed database service such as Amazon RDS or Google Cloud SQL.

3. Set up a migration directory

With the operator and the database running, let's set up the migration directory which we will use to manage our database schema.

You can use the directory from an existing project, but for the sake of this demo we will use the Atlas template repo which contains a simple migration directory.

git clone git@github.com:ariga/atlas-template.git versioned-demo

Observe this directory contains a migrations directory with a couple of migration scripts:

cd versioned-demo
tree migrations
tree migrations
migrations
├── 20230316085611.sql
├── 20230316090502.sql
└── atlas.sum

4. Create a ConfigMap with the migration directory

The operator supports two ways to manage your migration directory:

  • Atlas Cloud - a cloud-based directory that is managed by Atlas.
  • ConfigMap - a Kubernetes resource that contains the migration directory files as key-value pairs.

In this demo we will use a ConfigMap to manage our migration directory. To create a ConfigMap with the migration directory files:

kubectl create configmap migrations --from-file=migrations -o yaml --dry-run=client --save-config > migrations.yaml

The above command creates a YAML named migrations.yaml file with the migration directory files. It should look something like this:

apiVersion: v1
data:
20230316085611.sql: |
-- Create "users" table
CREATE TABLE `users` (
`id` int NOT NULL,
`user_name` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
20230316090502.sql: |
-- Create "posts" table
CREATE TABLE `posts` (
`id` int NOT NULL,
`user_id` int NOT NULL,
`title` varchar(255) NOT NULL,
`body` text NOT NULL,
PRIMARY KEY (`id`),
INDEX `user_id` (`user_id`),
CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
atlas.sum: |
h1:XBXbh+rzLis8gknjlIqnxXLBkOZ+sN2v2p7KjyVFYYM=
20230316085611.sql h1:br6W6LPEnnsejlz/7hRm9zthwStCzjN2vZkqVPxlmvo=
20230316090502.sql h1:GfeRjkSeoCt3JVRtLQNa/r50lRfpAPXS7AqTU2ZNFgY=
kind: ConfigMap
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"migrations","creationTimestamp":null},"data":{"20230316085611.sql":"-- Create \"users\" table\nCREATE TABLE `users` (\n `id` int NOT NULL,\n `user_name` varchar(255) NOT NULL,\n `email` varchar(255) NOT NULL,\n PRIMARY KEY (`id`)\n) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;\n","20230316090502.sql":"-- Create \"posts\" table\nCREATE TABLE `posts` (\n `id` int NOT NULL,\n `user_id` int NOT NULL,\n `title` varchar(255) NOT NULL,\n `body` text NOT NULL,\n PRIMARY KEY (`id`),\n INDEX `user_id` (`user_id`),\n CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE\n) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;\n","atlas.sum":"h1:XBXbh+rzLis8gknjlIqnxXLBkOZ+sN2v2p7KjyVFYYM=\n20230316085611.sql h1:br6W6LPEnnsejlz/7hRm9zthwStCzjN2vZkqVPxlmvo=\n20230316090502.sql h1:GfeRjkSeoCt3JVRtLQNa/r50lRfpAPXS7AqTU2ZNFgY=\n"}}
name: migrations

Apply the ConfigMap to your cluster:

kubectl apply -f migrations.yaml

Kubernetes will create a ConfigMap named migrations with the migration directory files:

configmap/migrations created

5. Create an AtlasMigration resource

Now that we have a database and a migration directory, we can create an AtlasMigration resource to manage our database schema. The AtlasMigration resource is a custom resource that you use to define the desired state of your database schema. The operator will then reconcile the actual state of your database schema with the desired state.

To create an AtlasMigration resource, create a YAML file named atlas-migration.yaml with the following content:

apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasMigration
metadata:
name: atlas-migration
spec:
urlFrom:
secretKeyRef:
key: url
name: mysql-credentials
dir:
configMapRef:
name: "migrations"

After you create the atlas-migration.yaml file, apply it to your cluster:

kubectl apply -f atlas-migration.yaml

Next, let's wait for the resource to enter a "Ready" state:

kubectl wait --for=condition=Ready atlasmigration/atlas-migration

When the operator finishes reconciling the AtlasMigration resource, the AtlasMigration resource will be ready:

atlasmigration.db.atlasgo.io/atlas-migration condition met

6. Verify the migrations were applied

Finally, to verify the migrations were applied, connect to the database and check to see if the users table was created:

kubectl exec -it $(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}') -- mysql -uroot -ppass -e "describe myapp.users"

You should see the following output:

+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id | int | NO | PRI | NULL | |
| user_name | varchar(255) | NO | | NULL | |
| email | varchar(255) | NO | | NULL | |
+-----------+--------------+------+-----+---------+-------+

Check that the posts table was created as well:

kubectl exec -it $(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}') -- mysql -uroot -ppass -e "describe myapp.posts"

You should see the following output:

+---------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| id | int | NO | PRI | NULL | |
| user_id | int | NO | MUL | NULL | |
| title | varchar(255) | NO | | NULL | |
| body | text | NO | | NULL | |
+---------+--------------+------+-----+---------+-------+

And that's it! You've successfully deployed the Atlas Operator and applied migrations to your database.

Conclusion

In this blog post, we showed you how to use the Atlas Operator to manage your database schema in Kubernetes using a versioned migrations workflow. To learn more about the Atlas Operator, check out the Atlas Operator GitHub repository as well as the documentation on the Atlas website.

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.

Migrate Multi-Tenant Environments With Atlas

· 8 min read
Ariel Mashraki
Building Atlas

Wikipedia defines Multi-tenancy as:

a software architecture in which a single instance of software runs on a server and serves multiple tenants.

In recent years, multitenancy has become a common topic in our industry as many organizations provide service to multiple customers using the same infrastructure. Multitenancy usually becomes an issue in software architecture because tenants often expect a decent level of isolation from one another.

In this post, I will go over different known approaches for achieving multi-tenancy and discuss the approach we took to build Ariga's cloud platform. In addition, I will demonstrate how we added built-in support for multi-tenant environments in Atlas to overcome some of the challenges we faced.

Introduction

Throughout the last few years, I have had the opportunity to implement multi-tenancy in various ways. Some of them might be familiar to you:

  1. A separate environment (deployment) per tenant, where isolation is achieved at both compute and data layers.
  2. A schema (named database) per tenant, where there is one environment for compute (e.g., a K8S cluster), but tenants are stored in different databases or schemas. Isolation is achieved at the data layer while compute resources are shared.
  3. One environment for all tenants, including the data layer. Typically, in this case, each table holds a tenant_id column that is used to filter statements by the tenant. Both data and compute layers are shared, with isolation achieved at the logical, database query level.

Each approach has pros and cons, but I want to briefly list the main reasons we chose to build our cloud platform based on the second option: schema per tenant.

  1. Management: Easily delete, backup tenants, and allow them to export their data without affecting others.
  2. Isolation: Limit credentials, connection pooling, and quotas per tenant. This way, one tenant cannot cause the database to choke and interrupt other tenants in case they share the same physical database.
  3. Security and data privacy: In case it is required, some tenants can be physically separated from others. For example, data can be stored in the tenant's AWS account, and the application can connect to it using a secure connection, like VPC peering in AWS.
  4. Code-maintenance: Most of the application code is written in a way that it is unaware of the multi-tenancy. In our case, there is one layer "at the top" that attaches the tenant connection to the context, and the API layer (e.g., GraphQL resolver) extracts the connection from the context to read/write data. As a result, we are not concerned that API changes will cross tenant boundaries.
  5. Migration: Schema changes can be executed first on "test tenants" and fail-fast in case of error.

The primary con to this approach was that there was no elegant way to execute migrations on multiple databases (N times) in Atlas. In the rest of the post, I'll cover how we solved this problem in Ariga and added built-in support for multi-tenancy in Atlas.

Atlas config file

Atlas provides a convenient way to describe and interact with multiple environments using project files. A project file is a file named atlas.hcl and contains one or more env blocks. For example:

atlas.hcl
env "local" {
url = "mysql://root:pass@:3306/"
migrations {
dir = "file://migrations"
}
}

env "prod" {
// ... a different env
}

Once defined, a project's environment can be worked against using the --env flag. For example:

atlas migrate apply --env local

The command above runs the schema apply against the database that is defined in the local environment.

Multi-Tenant environments

The Atlas configuration language provides a few capabilities adopted from Terraform to facilitate the definition of multi-tenant environments. The first is the for_each meta-argument that allows defining a single env block that is expanded to N instances, one for each tenant. For example:

atlas.hcl
variable "url" {
type = string
default = "mysql://root:pass@:3306/"
}

variable "tenants" {
type = list(string)
}

env "local" {
for_each = toset(var.tenants)
url = urlsetpath(var.url, each.value)
migration {
dir = "file://migrations"
}
}

The above configuration expects a list of tenants to be provided as a variable. This can be useful when the list of tenants is dynamic and can be injected into the Atlas command. The urlsetpath function is a helper function that sets the path of the database URL to the tenant name. For example, if url is set to mysql://root:pass@:3306/?param=value and the tenant name is tenant1, the resulting URL will be mysql://root:pass@:3306/tenant1?param=value.

The second capability is Data Sources. This option enables users to retrieve information stored in an external service or database. For the sake of this example, let's extend the configuration above to use the SQL data source to retrieve the list of tenants from the INFORMATION_SCHEMA in MySQL:

atlas.hcl
// The URL of the database we operate on.
variable "url" {
type = string
default = "mysql://root:pass@:3306/"
}

// Schemas that match this pattern will be considered tenants.
variable "pattern" {
type = string
default = "tenant_%"
}

data "sql" "tenants" {
url = var.url
query = <<EOS
SELECT `schema_name`
FROM `information_schema`.`schemata`
WHERE `schema_name` LIKE ?
EOS
args = [var.pattern]
}

env "local" {
for_each = toset(data.sql.tenants.values)
url = urlsetpath(var.url, each.value)
}

Example

Let's demonstrate how managing migrations in a multi-tenant architecture is made simple with Atlas.

1. Install Atlas

To download and install the latest release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | sh

2. Create a migration directory with the following example content:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

3. Create two example tenants on a local database:

create database tenant_a8m;
create database tenant_rotemtam;

4. Run Atlas to execute the migration scripts on the tenants' databases:

atlas migrate apply --env local
tenant_a8m
Migrating to version 20220811074314 (2 migrations in total):

-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (36.803179ms)

-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (26.184177ms)

-------------------------
-- 72.899146ms
-- 2 migrations
-- 2 sql statements
tenant_rotemtam
Migrating to version 20220811074314 (2 migrations in total):

-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (61.987153ms)

-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (24.656515ms)

-------------------------
-- 95.233384ms
-- 2 migrations
-- 2 sql statements

Running the command again will not execute any migrations:

No migration files to execute
No migration files to execute

Migration logging

At Ariga, our services print structured logs (JSON) to feed our observability tools. That is why we felt obligated to add support for custom log formatting in Atlas. To continue the example from above, we present how we configure Atlas to emit JSON lines with the tenant name attached to them.

1. Add the log configuration to the local environment block:

atlas.hcl
env "local" {
for_each = toset(data.sql.tenants.values)
url = urlsetpath(var.url, each.value)
// Emit JSON logs to stdout and add the
// tenant name to each log line.
format {
migrate {
apply = format(
"{{ json . | json_merge %q }}",
jsonencode({
Tenant : each.value
})
)
}
}
}

2. Create a new script file in the migration directory:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

3. Run migrate apply in our "local" environment:

atlas migrate apply --env local
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.685899+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.655879+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.685899+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.647091+03:00","Target":"20221027125605","Tenant":"tenant_a8m","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_a8m","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_a8m","Scheme":"mysql","User":{}}}
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.787476+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.757463+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.787476+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.748399+03:00","Target":"20221027125605","Tenant":"tenant_rotemtam","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_rotemtam","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_rotemtam","Scheme":"mysql","User":{}}}

Next steps

Currently, Atlas uses a fail-fast policy, which means the process exits on the first tenant that returns an error. We built it this way because we find it helpful to execute migrations first on "test tenants" and stop in case the operation fails on any of them. However, this means the execution is serial and may be slow in cases where there is a large amount of tenants. Therefore, we aim to add more advanced approaches that will allow executing the first M tenants serially and the rest of the N-M tenants in parallel.

Have questions? Feedback? Feel free to reach out on our Discord server.

The Atlas Migration Execution Engine

· 8 min read
Jannik Clausen
Building Atlas

With the release of v0.6.0, we introduced a workflow for managing changes to database schemas that we have called: Versioned Migration Authoring.

Today, we released the first version of the Atlas migration execution engine, that can apply migration files on your database. In this post, we will give a brief overview of the features and what to expect in the future.

Migration File Format

The Atlas migration filename format follows a very simple structure: version_[name].sql, with the name being optional. version can be an arbitrary string. Migration files are lexicographically sorted by filename.

↪ tree .
.
├── 1_initial.sql
├── 2_second.sql
├── 3_third.sql
└── atlas.sum

0 directories, 4 files

If you want to follow along, you can simply copy and paste the above files in a folder on your system. Make sure you have a database ready to work on. You can start an ephemeral docker container with the following command:

# Run a local mysql container listening on port 3306.
docker run --rm --name atlas-apply --detach --env MYSQL_ROOT_PASSWORD=pass -p 3306:3306 mysql:8

Apply Migrations

In order to apply migrations you need to have the Atlas CLI in version v0.7.0 or above. Follow the installation instructions if you don't have Atlas installed yet.

Now, to apply the first migration of our migration directory, we call atlas migrate apply and pass in some configuration parameters.

atlas migrate apply 1 \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migrating to version 1 (1 migrations in total):

-- migrating version 1
-> CREATE DATABASE `my_schema`;
-> CREATE TABLE `my_schema`.`tbl` (`col` int NOT NULL);
-- ok (17.247319ms)

-------------------------
-- 18.784204ms
-- 1 migrations
-- 2 sql statements

Migration Status

Atlas saves information about the database schema revisions (applied migration versions) in a special table called atlas_schema_revisions. In the example above we connected to the database without specifying which schema to operate against. For this reason, Atlas created the revision table in a new schema called atlas_schema_revisions. For a schema-bound connection Atlas will put the table into the connected schema. We will see that in a bit.

Go ahead and call atlas migrate status to gather information about the database migration state:

atlas migrate status \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migration Status: PENDING
-- Current Version: 1
-- Next Version: 2
-- Executed Files: 1
-- Pending Files: 2

This output tells us that the last applied version is 1, the next one is called 2 and that we still have two migrations pending. Let's apply the pending migrations:

Note, that we do not pass an argument to the apply, in which case Atlas will attempt to apply all pending migrations.

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migrating to version 3 from 1 (2 migrations in total):

-- migrating version 2
-> ALTER TABLE `my_schema`.`tbl` ADD `col_2` TEXT;
-- ok (13.98847ms)

-- migrating version 3
-> CREATE TABLE `tbl_2` (`col` int NOT NULL);
Error 1046: No database selected

-------------------------
-- 15.604338ms
-- 1 migrations ok (1 with errors)
-- 1 sql statements ok (1 with errors)

Error: Execution had errors: Error 1046: No database selected

Error: sql/migrate: executing statement "CREATE TABLE `tbl_2` (`col` int NOT NULL);" from version "3": Error 1046: No database selected

What happened here? After further investigation, you will find that our connection URL is bound to the entire database, not to a schema. The third migration file however does not contain a schema qualifier for the CREATE TABLE statement.

By default, Atlas wraps the execution of each migration file into one transaction. This transaction gets rolled back if any error occurs withing execution. Be aware though, that some databases, such as MySQL and MariaDB, don't support transactional DDL. If you want to learn how to configure the way Atlas uses transactions, have a look at the docs.

Migration Retry

To resolve this edit the migration file and add a qualifier to the statement:

CREATE TABLE `my_schema`.`tbl_2` (`col` int NOT NULL);

Since you changed the contents of a migration file, we have to re-calculate the directory integrity hash-sum by calling:

atlas migrate hash --force \
--dir "file://migrations"

Then we can proceed and simply attempt to execute the migration file again.

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
Migrating to version 3 from 2 (1 migrations in total):

-- migrating version 3
-> CREATE TABLE `my_schema`.`tbl_2` (`col` int NOT NULL);
-- ok (15.168892ms)

-------------------------
-- 16.741173ms
-- 1 migrations
-- 1 sql statements

Attempting to migrate again or calling atlas migrate status will tell us that all migrations have been applied onto the database and there is nothing to do at the moment.

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/"
No migration files to execute

Moving an existing project to Atlas with Baseline Migrations

Another common scenario is when you need to move an existing project to Atlas. To do so, create an initial migration file reflecting the current state of a database schema by using atlas migrate diff. A very simple way to do so would be by heading over to the database from before, deleting the atlas_schema_revisions schema, emptying your migration directory and running the atlas migrate diff command.

rm -rf migrations
docker exec atlas-apply mysql -ppass -e "CREATE SCHEMA `my_schema_dev`;" # create a dev-db
docker exec atlas-apply mysql -ppass -e "DROP SCHEMA `atlas_schema_revisions`;"
atlas migrate diff \
--dir "file://migrations" \
--to "mysql://root:pass@localhost:3306/my_schema" \
--dev-url "mysql://root:pass@localhost:3306/my_schema_dev"

To demonstrate that Atlas can also work on a schema level instead of a realm connection, we are running on a connection bound to the my_schema schema this time.

You should end up with the following migration directory:

-- create "tbl" table
CREATE TABLE `tbl` (`col` int NOT NULL, `col_2` text NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- create "tbl_2" table
CREATE TABLE `tbl_2` (`col` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Now, let's create a new migration file to create a table tbl_3 and update the directory integrity file.

atlas migrate new add_table --dir "file://migrations"
echo "CREATE TABLE `tbl_3` (`col` text NULL);" >> migrations/$(ls -t migrations | head -n1)
atlas migrate hash --force --dir "file://migrations"

Since we now have both a migration file representing our current database state and the new migration file to apply, we can make use of the --baseline flag:

atlas migrate apply \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/my_schema" \
--baseline "20220908110527" # replace the version with the one generated by you
Migrating to version 20220908110847 from 20220908110527 (1 migrations in total):

-- migrating version 20220908110847
-> CREATE TABLE `tbl_3` (`col` text NULL);
-- ok (14.325493ms)

-------------------------
-- 15.786455ms
-- 1 migrations
-- 1 sql statements

Outlook

The Atlas migration engine is powering Ent and the execution engine is already being used within Ariga for several months. We will continue working on improving it, releasing cool features, such as assisted troubleshooting for failed migrations, a more intelligent, dialect-aware execution planning for things like MySQLs implicits commits and more.

Wrapping up

In this post we learned about the new migration execution engine of Atlas and some information about its internals.

Further reading

To learn more about Versioned Migration Authoring:

Have questions? Feedback? Find our team on our Discord server.

Announcing v0.6.0 with Versioned Migration Authoring

· 8 min read
Ariel Mashraki
Building Atlas

With the release of v0.6.0, we are happy to announce official support for a style of workflow for managing changes to database schemas that we have been experimenting with in the past months: Versioned Migration Authoring.

TL;DR

  • Atlas supports a declarative workflow (similar to Terraform) where users provide the desired database schema in a simple data definition language and Atlas calculates a plan to get a target database to that state. This workflow is supported by the schema apply command.
  • Many teams prefer a more imperative approach where each change to the database schema is checked-in to source control and reviewed during code-review. This type of workflow is commonly called versioned migrations (or change based migrations) and is supported by many established tools such as Flyway and Liquibase.
  • The downside of the versioned migration approach is, of course, that it puts the burden of planning the migration on developers. As part of the Atlas project we advocate for a third combined approach that we call "Versioned Migration Authoring".
  • Versioned Migration Authoring is an attempt to combine the simplicity and expressiveness of the declarative approach with the control and explicitness of versioned migrations.
  • To use Versioned Migration Authoring today, use the atlas migrate diff command. See the Getting Started section below for instructions.

Declarative Migrations

The declarative approach has become increasingly popular with engineers nowadays because it embodies a convenient separation of concerns between application and infrastructure engineers. Application engineers describe what (the desired state) they need to happen, and infrastructure engineers build tools that plan and execute ways to get to that state (how). This division of labor allows for great efficiencies as it abstracts away the complicated inner workings of infrastructure behind a simple, easy to understand API for the application developers and allows for specialization and development of expertise to pay off for the infra people.

With declarative migrations, the desired state of the database schema is given as input to the migration engine, which plans and executes a set of actions to change the database to its desired state.

For example, suppose your application uses a small SQLite database to store its data. In this database, you have a users table with this structure:

schema "main" {}

table "users" {
schema = schema.main
column "id" {
type = int
}
column "greeting" {
type = text
}
}

Now, suppose that you want to add a default value of "shalom" to the greeting column. Many developers are not aware that it isn't possible to modify a column's default value in an existing table in SQLite. Instead, the common practice is to create a new table, copy the existing rows into the new table and drop the old one after. Using the declarative approach, developers can change the default value for the greeting column:

schema "main" {}

table "users" {
schema = schema.main
column "id" {
type = int
}
column "greeting" {
type = text
default = "shalom"
}
}

And have Atlas's engine devise a plan similar to this:

-- Planned Changes:
-- Create "new_users" table
CREATE TABLE `new_users` (`id` int NOT NULL, `greeting` text NOT NULL DEFAULT 'shalom')
-- Copy rows from old table "users" to new temporary table "new_users"
INSERT INTO `new_users` (`id`, `greeting`) SELECT `id`, IFNULL(`greeting`, 'shalom') AS `greeting` FROM `users`
-- Drop "users" table after copying rows
DROP TABLE `users`
-- Rename temporary table "new_users" to "users"
ALTER TABLE `new_users` RENAME TO `users`

Versioned Migrations

As the database is one of the most critical components in any system, applying changes to its schema is rightfully considered a dangerous operation. For this reason, many teams prefer a more imperative approach where each change to the database schema is checked-in to source control and reviewed during code-review. Each such change is called a "migration", as it migrates the database schema from the previous version to the next. To support this kind of requirement, many popular database schema management tools such as Flyway, Liquibase or golang-migrate support a workflow that is commonly called "versioned migrations".

In addition to the higher level of control which is provided by versioned migrations, applications are often deployed to multiple remote environments at once. These environments are not controlled (or even accessible) by the development team. In such cases, declarative migrations, which rely on a network connection to the target database and on human approval of migrations plans in real-time, are not a feasible strategy.

With versioned migrations (sometimes called "change-based migrations"), instead of describing the desired state ("what the database should look like"), developers describe the changes themselves ("how to reach the state"). Most of the time, this is done by creating a set of SQL files containing the statements needed. Each of the files is assigned a unique version and a description of the changes. Tools like the ones mentioned earlier are then able to interpret the migration files and to apply (some of) them in the correct order to transition to the desired database structure.

The benefit of the versioned migrations approach is that it is explicit: engineers know exactly what queries are going to be run against the database when the time comes to execute them. Because changes are planned ahead of time, migration authors can control precisely how to reach the desired schema. If we consider a migration as a plan to get from state A to state B, oftentimes multiple paths exist, each with a very different impact on the database. To demonstrate, consider an initial state which contains a table with two columns:

CREATE TABLE users (
id int,
name varchar(255)
);

Suppose our desired state is:

CREATE TABLE users (
id int,
user_name varchar(255)
);

There are at least two ways get from the initial to the desired state:

  • Drop the name column and create a new user_name column.
  • Alter the name of the name column to user_name.

Depending on the context, either may be the desired outcome for the developer planning the change. With versioned migrations, engineers have the ultimate confidence of what change is going to happen, which may not be known ahead of time in a declarative approach.

Migration Authoring

The downside of the versioned migration approach is, of course, that it puts the burden of planning the migration on developers. This requires a certain level of expertise that is not always available to every engineer, as we demonstrated in our example of setting a default value in a SQLite database above.

As part of the Atlas project we advocate for a third combined approach that we call "Versioned Migration Authoring". Versioned Migration Authoring is an attempt to combine the simplicity and expressiveness of the declarative approach with the control and explicitness of versioned migrations.

With versioned migration authoring, users still declare their desired state and use the Atlas engine to plan a safe migration from the existing to the new state. However, instead of coupling planning and execution, plans are instead written into normal migration files which can be checked-in to source control, fine-tuned manually and reviewed in regular code review processes.

Getting started

Start by downloading the Atlas CLI:

To download and install the latest release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | sh

Next, define a simple Atlas schema with one table and an empty migration directory:

schema.hcl
schema "test" {}

table "users" {
schema = schema.test
column "id" {
type = int
}
}

Let's run atlas migrate diff with the necessary parameters to generate a migration script for creating our users table:

  • --dir the URL to the migration directory, by default it is file://migrations.
  • --to the URL of the desired state, an HCL file or a database connection.
  • --dev-url a URL to a Dev Database that will be used to compute the diff.
atlas migrate diff create_users \
--dir="file://migrations" \
--to="file://schema.hcl" \
--dev-url="mysql://root:pass@:3306/test"

Observe that two files were created in the migrations directory:

By default, migration files are named with the following format {{ now }}_{{ name }}.sql. If you wish to use a different file format, use the --dir-format option.

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Further reading

To learn more about Versioned Migration Authoring:

Have questions? Feedback? Find our team on our Discord server.