Sandbox: Run transformations in isolation
Developing data transformations shouldn't feel like working without a safety net. One wrong move in a shared environment can create chaos. With Datoria's Sandboxes, you get a fully isolated, temporary environment where you can materialize and inspect data—without risk, without clutter, and without disrupting shared workflows.
Why Use Sandbox?
Traditional dev environments are a headache. They're messy, shared, and unpredictable. Keeping test data up to date is nearly impossible, and one misstep can break something for the whole team.
Sandboxes change the game:
- Your modified tables and jobs are fully recreated in an isolated environment.
- Upstream dependencies remain untouched and are read from a real environment—typically production, unless you explicitly choose a different source.
- Everything downstream of your changes—including dependent tables, views, and routines—is also recreated in the temporary environment, ensuring a consistent, testable flow.
This lets you work with real data in a controlled setting, ensuring accuracy while staying completely safe.
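The split described above (modified nodes and everything downstream go into the sandbox, direct upstream inputs are read from a real environment) can be illustrated with a small graph traversal. This is only a sketch of the idea, not Datoria's implementation: the function name `plan_sandbox`, the `edges` dictionary shape, and the table names are all hypothetical.

```python
from collections import deque


def plan_sandbox(edges, modified):
    """Split a dependency graph into sandboxed nodes and source-read nodes.

    edges:    dict mapping a node to the nodes that read from it (its downstream).
    modified: set of nodes changed on the branch.
    (Both shapes are assumptions made for this illustration.)
    """
    # Breadth-first walk: every node reachable downstream of a change
    # is recreated in the sandbox.
    sandboxed = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in sandboxed:
                sandboxed.add(child)
                queue.append(child)

    # Untouched nodes that feed a sandboxed node are read from the
    # real environment (for example, production).
    read_from_source = {
        parent
        for parent, children in edges.items()
        if parent not in sandboxed and any(c in sandboxed for c in children)
    }
    return sandboxed, read_from_source


# Hypothetical graph: only clean_orders was modified on the branch.
edges = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["daily_revenue"],
    "raw_customers": ["daily_revenue"],
    "daily_revenue": ["revenue_view"],
    "raw_events": ["sessions"],
}
sandboxed, read_from_source = plan_sandbox(edges, {"clean_orders"})
print(sorted(sandboxed))         # ['clean_orders', 'daily_revenue', 'revenue_view']
print(sorted(read_from_source))  # ['raw_customers', 'raw_orders']
```

In this example, `raw_events` and `sessions` are unrelated to the change, so they appear in neither set and are left alone.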
How It Works
- Detects Your Changes: The Sandbox first calculates a diff between your branch and the main/master branch, identifying all modified transformations and dependencies.
- Creates an Isolated Execution Environment: A temporary database schema (e.g., a dataset in BigQuery) is spun up, named based on your user and branch. You can even set how long it lives before auto-cleanup.
- Rewrites the Graph for Isolation: Every modified table and job is recreated in the temporary environment, while anything upstream is still read from the real environment. Everything downstream (tables, views, and routines) is also recreated to keep the flow consistent.
- Applies Necessary Migrations: Any new or modified routines, views, and tables that interact with your changes are automatically recreated in the isolated environment.
- Runs Your Transformations: You decide which partitions to execute—Datoria ensures dependencies are met, just like in production.
- Downsamples Reads (Optional): A LIMIT clause can be applied to every table read within a transformation. This slashes costs dramatically when testing multi-step transformations. The deeper the dependency chain, the more you save, not just on writes, but on expensive reads too.
- Share & Validate: Once materialized, results are available for review. Share them with colleagues or explore them in the UI.
- Automatic Cleanup: No need to worry about cleanup—your temporary environment expires based on retention settings you control.
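Two of the steps above, naming the temporary dataset after the user and branch and pointing each table read at either the sandbox or the real environment with an optional LIMIT, can be sketched in a few lines. Everything here is an assumption made for illustration: the `sandbox_` prefix, the sanitising rule, and the helper names `sandbox_dataset_name` and `rewrite_read` are hypothetical and do not describe Datoria's actual naming scheme or query rewriter.

```python
import re


def sandbox_dataset_name(user, branch, prefix="sandbox"):
    """Build a dataset-safe name from user and branch (hypothetical scheme)."""
    raw = f"{prefix}_{user}_{branch}".lower()
    # Replace anything that is not a lowercase letter, digit, or
    # underscore, so names like "feature/new-metric" become valid.
    return re.sub(r"[^a-z0-9_]+", "_", raw).strip("_")


def rewrite_read(table, sandboxed, source_dataset, sandbox_dataset, limit=None):
    """Return the SQL fragment a transformation should read from.

    Tables recreated in the sandbox are read from the sandbox dataset;
    all others are read from the source dataset. With `limit` set, the
    read is wrapped in a subquery that applies a LIMIT clause.
    """
    dataset = sandbox_dataset if table in sandboxed else source_dataset
    ref = f"`{dataset}.{table}`"
    if limit is not None:
        return f"(SELECT * FROM {ref} LIMIT {limit})"
    return ref


dataset = sandbox_dataset_name("Ada", "feature/new-metric")
print(dataset)  # sandbox_ada_feature_new_metric

sandboxed = {"clean_orders"}
print(rewrite_read("raw_orders", sandboxed, "prod", dataset))
# `prod.raw_orders`
print(rewrite_read("clean_orders", sandboxed, "prod", dataset, limit=1000))
# (SELECT * FROM `sandbox_ada_feature_new_metric.clean_orders` LIMIT 1000)
```

A LIMIT without an ORDER BY returns an arbitrary subset of rows, which is usually acceptable for smoke-testing a pipeline but not for validating aggregates. How much a LIMIT reduces read cost also depends on the warehouse and table layout.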
Running Sandboxes
Just run:
datoria sandbox <jobname>
This launches the graphical interface, where you can:
- Select exactly which transformations to run
- Set partition execution options
- Enable or disable downsampling
- Adjust retention settings for the temporary environment
- Preview exactly which partitions will be filled with data before running
What Happens Next?
Once you're satisfied with the results, you can:
- Share the temporary environment with stakeholders for review
- Promote the changes to a dev/stage environment for broader testing. Learn about Dev/Stage Environments
- Deploy to production by running migrations and executing the new graph. Read about Deploying to Production
Conclusion
Datoria's sandboxes are a game-changer for data engineers. They make testing transformations fast, safe, and repeatable, without affecting shared environments or production data. Whether you're tweaking a single job or testing large, multi-layered transformations, Sandbox gives you the confidence to iterate quickly while keeping everything under control.