Sync Schema Between Application and Data Lake
Question
How can I keep a single canonical schema or model and apply it to multiple database drivers (for example, an application Postgres database and a data-lake store) so the two remain in sync?
Answer
Maintaining one canonical model for multiple drivers is a robust way to ensure schema parity between your application and downstream stores, such as a data lake.
Some of the benefits of having a single source of truth include:
- Avoiding drift between application and analytical stores
- Simplifying migrations, auditing, and testing
- Enabling deterministic schema generation for environments, test fixtures, and data pipelines
The recommended (and practical) approach for this with Atlas is to:
- Define your model using your application's ORM (e.g., SQLAlchemy, GORM, etc.) to create a single source of truth
- Download the corresponding Atlas ORM provider so Atlas can read your model
- Configure your
atlas.hcl
file to use the ORM model (adata.external_schema
data source) forsrc
and to pointmigration.dir
andurl
to each target driver. - Run
atlas migrate diff
for each target to produce migration files tailored to each driver (see "Generating Migrations").
Example: Using SQLAlchemy
If your application defines the model with SQLAlchemy, use the Atlas SQLAlchemy provider to load the models and generate migrations for each target.
Below is an example atlas.hcl
file for a multi-dialect setup using an external SQLAlchemy provider program and per-dialect migration directories.
variable "dialect" {
type = string
}
locals {
dev_url = {
mysql = "docker://mysql/8/dev"
postgresql = "docker://postgres/15"
sqlite = "sqlite://?mode=memory&_fk=1"
mssql = "docker://sqlserver/2022-latest"
clickhouse = "docker://clickhouse/23.11/dev"
}[var.dialect]
}
data "external_schema" "sqlalchemy" {
program = [
"atlas-provider-sqlalchemy",
"--path", "app",
"--dialect", var.dialect,
]
}
env "sqlalchemy" {
src = data.external_schema.sqlalchemy.url
dev = local.dev_url
migration {
dir = "file://migrations/${var.dialect}"
}
}
Make sure atlas-provider-sqlalchemy
is installed and accessible in PATH (see the SQLAlchemy guide for provider setup and installation). The --path
argument should point to the Python package/module where your SQLAlchemy models live (e.g., app
or app.models
).
Run atlas migrate diff --env sqlalchemy --var dialect=postgresql
(or set the --var dialect=...
for other engines) to generate a migration tailored to the selected dialect and stored under migrations/<dialect>/
. See Generating Migrations for more.
Review and commit the generated migration files (including atlas.sum
), then apply them to the appropriate targets (for example with atlas migrate apply --env sqlalchemy --var "dialect=postgresql" --url "$APP_DATABASE_URL"
).
Example project: https://github.com/ariga/atlas-showcase/tree/master/projects/sqlalchemy