Skip to main content

Sync Schema Between Application and Data Lake

Question

How can I keep a single canonical schema or model and apply it to multiple database drivers (for example, an application Postgres database and a data-lake store) so the two remain in sync?

Answer

Maintaining one canonical model for multiple drivers is a robust way to ensure schema parity between your application and downstream stores, such as a data lake.

Some of the benefits of having a single source of truth include:

  • Avoiding drift between application and analytical stores
  • Simplifying migrations, auditing, and testing
  • Enabling deterministic schema generation for environments, test fixtures, and data pipelines

The recommended (and practical) approach for this with Atlas is to:

  1. Define your model using your application's ORM (e.g., SQLAlchemy, GORM, etc.) to create a single source of truth
  2. Download the corresponding Atlas ORM provider so Atlas can read your model
  3. Configure your atlas.hcl file to use the ORM model (a data.external_schema data source) for src and to point migration.dir and url to each target driver.
  4. Run atlas migrate diff for each target to produce migration files tailored to each driver (see "Generating Migrations").

Example: Using SQLAlchemy

If your application defines the model with SQLAlchemy, use the Atlas SQLAlchemy provider to load the models and generate migrations for each target.

Below is an example atlas.hcl file for a multi-dialect setup using an external SQLAlchemy provider program and per-dialect migration directories.

atlas.hcl
variable "dialect" {
type = string
}

locals {
dev_url = {
mysql = "docker://mysql/8/dev"
postgresql = "docker://postgres/15"
sqlite = "sqlite://?mode=memory&_fk=1"
mssql = "docker://sqlserver/2022-latest"
clickhouse = "docker://clickhouse/23.11/dev"
}[var.dialect]
}

data "external_schema" "sqlalchemy" {
program = [
"atlas-provider-sqlalchemy",
"--path", "app",
"--dialect", var.dialect,
]
}

env "sqlalchemy" {
src = data.external_schema.sqlalchemy.url
dev = local.dev_url
migration {
dir = "file://migrations/${var.dialect}"
}
}
note

Make sure atlas-provider-sqlalchemy is installed and accessible in PATH (see the SQLAlchemy guide for provider setup and installation). The --path argument should point to the Python package/module where your SQLAlchemy models live (e.g., app or app.models).

Run atlas migrate diff --env sqlalchemy --var dialect=postgresql (or set the --var dialect=... for other engines) to generate a migration tailored to the selected dialect and stored under migrations/<dialect>/. See Generating Migrations for more.

Review and commit the generated migration files (including atlas.sum), then apply them to the appropriate targets (for example with atlas migrate apply --env sqlalchemy --var "dialect=postgresql" --url "$APP_DATABASE_URL").

Example project: https://github.com/ariga/atlas-showcase/tree/master/projects/sqlalchemy