Environments: Running Datoria in Staging

A staging environment is a full-fledged execution space—parallel to production—where transformations run separately, source data can be different, and results are stored independently.

This allows teams to:

  • Validate changes at scale before rolling them out to production.
  • Integrate with staging data sources instead of reading from production.
  • Verify entire workflows, including downstream consumers and BI tooling.

Datoria provides first-class support for these environments via a simple mechanism: rewriting dataset coordinates while running migrations and transformations.


Deploying to a Staging Environment

Datoria’s migrations update table definitions, views, and routines in the specified dataset. To migrate changes to staging, simply run:

datoria migrate --dataset projectid.staging_dataset

This ensures that:

  • All managed tables, views, and routines are created in projectid.staging_dataset.
  • Nothing is modified in production.

Once the staging environment is up to date, you can run transformations against it.


Running Transformations in a Staging Environment

To execute transformations in the staging environment, use:

datoria run --dataset projectid.staging_dataset

This ensures that:

  • All transformations write results to projectid.staging_dataset.
  • Partition dependencies and invalidation rules are followed as usual.

For example, if production tables are in projectid.analytics_prod, running with --dataset projectid.analytics_staging means:

  • Any newly computed data will go to projectid.analytics_staging.
  • Any source tables not managed by Datoria will still be read from production (for now).
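The migrate and run steps above can be chained so that transformations start only once the staging dataset is up to date. The following is a minimal sketch, not an official wrapper: the function name refresh_staging is hypothetical, and it assumes that datoria is on your PATH and returns a non-zero exit status when a step fails.

```shell
# Hypothetical helper: bring a staging dataset up to date, then run
# transformations in it. Uses only the two commands shown in this guide.
# Assumption: `datoria` is on PATH and exits non-zero when a step fails.
refresh_staging() {
  # $1 is the target dataset, e.g. projectid.analytics_staging
  datoria migrate --dataset "$1" && datoria run --dataset "$1"
}
```

Calling refresh_staging projectid.analytics_staging would run the migration first and start the transformations only if the migration succeeded.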

Staging Environments Still Read from Production Sources (For Now)

Datoria’s rewrite mechanism currently leaves source tables unchanged. This means:

  • If a table is not managed by Datoria (i.e., it's an external source table), it will still be read from production.
  • Only transformed data is redirected to the specified --dataset.

For example, if you run:

datoria run --dataset projectid.analytics_staging

Then:

  • Managed tables are rewritten to use projectid.analytics_staging.
  • 🚧 Source tables still point to production (for now!)
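To make the rule concrete, here is a small shell model of the behavior described above. It is an illustration only and not Datoria's implementation: the function rewrite_coordinate, the MANAGED_TABLES list, and the table and dataset names (daily_revenue, user_sessions, raw_events.clicks) are all hypothetical.

```shell
# Hypothetical model of the rewrite rule; not Datoria's actual code.
# Space-separated names of tables managed by Datoria (made up for this example).
# Any table not listed here is treated as an external source table.
MANAGED_TABLES="daily_revenue user_sessions"

# rewrite_coordinate <project.dataset.table> <target project.dataset>
rewrite_coordinate() {
  ref="$1"
  target="$2"
  table="${ref##*.}"  # table name: everything after the last dot
  case " $MANAGED_TABLES " in
    *" $table "*) echo "$target.$table" ;;  # managed: redirected to the target dataset
    *)            echo "$ref" ;;            # unmanaged source: coordinate left unchanged
  esac
}
```

In this model, rewrite_coordinate projectid.analytics_prod.daily_revenue projectid.analytics_staging prints projectid.analytics_staging.daily_revenue, while rewrite_coordinate projectid.raw_events.clicks projectid.analytics_staging prints the input unchanged, mirroring how source tables keep pointing to production.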

We are actively working on improving this so that environments can be defined in source code. In the future, instead of rewriting source table names, you will be able to specify environment-specific coordinates directly in the table definition, allowing full staging environments with separate upstream data.


When to Use a Staging Environment

Use Case | Sandbox | Staging Environment
Testing one or a few jobs in isolation
Ensuring real production data is used for upstream sources
Running all transformations as a parallel pipeline
Running entire BI dashboards / downstream processes
Long-lived staging environments for integration testing

A staging environment makes sense when you want a longer-running, parallel execution environment for verifying end-to-end workflows—not just testing individual jobs.

Sandboxes are much more lightweight and focused on fast iteration with minimal setup.


Next Steps

  • If you’re working in sandboxes, check out Sandboxes.
  • If you're ready to deploy your changes, continue to Migrations.