Run Jobs for production

After running migrations, you're ready to import and transform your data.

What Is an Execution?

In Datoria, each execution is the process of importing one partition for a specific job.
Because jobs can have multiple partitions, you may end up with many executions running in parallel when you perform a single Datoria run.
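Conceptually, an execution is just a (job, partition) pair. As a minimal sketch (the names here are illustrative, not Datoria's actual internals):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Execution:
    """One unit of work: importing a single partition of a job."""
    job: str
    partition: date

# A run over a day-partitioned job fans out into one execution per partition,
# and those executions are independent of each other.
partitions = [date(2024, 1, d) for d in (1, 2, 3)]
executions = [Execution("import_orders", p) for p in partitions]
print(len(executions))  # one execution per partition
```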

Plan

Before you kick off a transformation run, you can list the job executions that are scheduled to run by passing the --plan flag:

datoria run --plan

Behind the scenes, Datoria syncs partition metadata from your data warehouse.
This can take a few seconds initially, but after that, you’ll see exactly which partitions (and therefore executions) need refreshing—almost instantly.

Run

When you’re confident everything looks good (or you just want to go for it), run:

datoria run

Datoria will use its synchronized partition metadata to import or transform only what’s required.
If you run datoria run again soon afterward, and nothing has changed, your second run will finish almost instantly—incurring zero extra cloud costs.

Why is it so fast?

Partition Metadata & Deep Invalidation

Datoria leverages partitioning in BigQuery (or other compatible data stores), along with metadata—most critically, a lastUpdated timestamp—to know exactly which partitions need re-importing.
This “deep invalidation” means only out-of-date data is refreshed instead of reprocessing everything.
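The idea can be sketched as a per-partition timestamp comparison (a simplified illustration, not Datoria's actual code; the lastUpdated field is taken from the text above, everything else is hypothetical):

```python
from datetime import datetime, timezone

# Partition metadata synced from the warehouse: partition -> lastUpdated.
source_meta = {
    "2024-01-01": datetime(2024, 1, 2, tzinfo=timezone.utc),
    "2024-01-02": datetime(2024, 1, 3, tzinfo=timezone.utc),
}
# When each destination partition was last imported.
dest_meta = {
    "2024-01-01": datetime(2024, 1, 2, tzinfo=timezone.utc),  # up to date
    "2024-01-02": datetime(2024, 1, 2, tzinfo=timezone.utc),  # stale
}

def outdated_partitions(source, dest):
    """Deep invalidation: re-import only partitions whose source data
    changed after the destination copy was written."""
    return [p for p, ts in source.items() if p not in dest or dest[p] < ts]

print(outdated_partitions(source_meta, dest_meta))  # ['2024-01-02']
```

Everything else falls out of this check: if no source partition is newer than its destination copy, the list is empty and there is nothing to run.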

Partition-Aware Transformations

All transformation queries are partition-aware.
If a single partition is outdated, Datoria re-imports just that partition.
Partitions that are still up to date are skipped, which keeps both your runtime and your compute bills to a minimum.
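In BigQuery terms, a partition-aware transformation restricts each query to the one partition being rebuilt, typically via a filter on the partitioning column so the engine prunes everything else. A hypothetical sketch (table and column names are made up):

```python
def partition_scoped_query(partition_date: str) -> str:
    """Build a transformation query that touches only one day partition,
    so the warehouse scans (and bills) only that partition's bytes."""
    return (
        "SELECT customer_id, SUM(amount) AS total\n"
        "FROM `project.dataset.orders`\n"
        f"WHERE order_date = DATE '{partition_date}'\n"
        "GROUP BY customer_id"
    )

print(partition_scoped_query("2024-01-02"))
```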

Parallel Executions for Speed

Because each execution is an import for a single partition, Datoria can parallelize these executions as much as you need.
Rather than processing partitions sequentially (which can be very slow), Datoria’s concurrency model allows you to scale horizontally and process multiple partitions at once, leading to:

  • Dramatic runtime reductions
  • Efficient utilization of available resources
  • A formidable speedup over traditional sequential tools
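The concurrency model can be approximated with a worker pool that runs one import per partition (a simplified sketch; import_partition stands in for whatever work a real execution performs):

```python
from concurrent.futures import ThreadPoolExecutor

def import_partition(partition: str) -> str:
    # Placeholder for the real import/transform work of one execution.
    return f"imported {partition}"

partitions = [f"2024-01-{d:02d}" for d in range(1, 8)]

# Each execution covers exactly one partition, so executions are independent
# and can run in parallel up to whatever concurrency limit you choose.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(import_partition, partitions))

print(len(results), "partitions imported")
```

Because the units of work never overlap, raising the worker count scales throughput without coordination overhead.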

Instant Dependency Resolution

By combining partition metadata, partition lineage, and our dependency/invalidation model, Datoria can quickly determine which partitions to (re-)import. After the initial metadata sync, it knows which parts of your warehouse are up to date.

  1. On the first run, it identifies all outdated partitions and processes them.
  2. On subsequent runs, if nothing has changed, it effectively does nothing—completing almost instantly and incurring no extra cloud costs.
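Combining lineage with invalidation can be modeled as propagating staleness through a partition-level dependency graph (an illustrative sketch, not Datoria's implementation; the partition names are made up):

```python
from graphlib import TopologicalSorter

# Partition-level lineage: each downstream partition lists the
# upstream partitions it is derived from.
lineage = {
    "raw/2024-01-02": [],
    "clean/2024-01-02": ["raw/2024-01-02"],
    "report/2024-01-02": ["clean/2024-01-02"],
}

def plan(stale: set) -> list:
    """Walk partitions in dependency order; a partition must be re-imported
    if it is stale itself or depends on one that is being re-imported."""
    to_run = []
    for node in TopologicalSorter(lineage).static_order():
        if node in stale or any(dep in to_run for dep in lineage[node]):
            to_run.append(node)
    return to_run

print(plan({"raw/2024-01-02"}))  # staleness cascades downstream
print(plan(set()))               # nothing changed -> empty plan, instant run
```

The empty-plan case is what makes a repeated `datoria run` effectively free: the walk completes, finds nothing stale, and schedules no executions.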

The result?
A system that feels magical for anyone used to hours-long data pipeline jobs.


In the next sections—and in some accompanying videos—we’ll show you real-world examples of how partition-lineage works in action, and how you can tune concurrency to speed up large-scale transformations.
Once you’ve experienced partition-based parallelism, you’ll never want to go back to monolithic, sequential tools.