Lovable testing of SQL

Datoria turns SQL testing into a smooth, intuitive experience, even when your pipelines are deep and complex.

You define your tests declaratively, and you won't even have to run them yourself!

Enjoy an effortless testing experience that empowers you to build robust data pipelines with joy and confidence.

The Datoria Advantage

  • Test Without Creation: Define and test deep data pipelines without altering the database.

  • Instant Feedback: Any errors are flagged immediately, so you can iterate quickly.

  • SQL Rewriting & Caching: Your SQL is transformed into an independent query that executes once on the real database and caches its results indefinitely, keeping tests up to date without manual re-execution.

  • No transpilation: Your queries run on the real database. This means you can leverage all the database features, including the latest syntax (like piped SQL).

  • Schema-Aware Validation: With all schemas in memory, Datoria automatically parses JSON inputs into the correct types and fills in default values where data is omitted, even within nested structures.

  • Fantastic debuggability: Datoria unit tests are ultimately SQL queries. When something fails, you can copy/paste that SQL query into your database and debug it there.


Real-World Examples

Example 1: Testing a Routine (UDF)


Definition Under Test
import { BQRoutine, BQType, BQUdf, sql } from "@datoria/sdk";
import { analyticsDataset } from "../../core-concepts/tables-and-partitioning.md";

export const plus: BQUdf.Persistent = BQUdf.Persistent({
  dataset: analyticsDataset,
  name: "plus",
  params: [
    BQRoutine.Param("one", BQType.Integer),
    BQRoutine.Param("two", BQType.Integer),
  ],
  body: BQUdf.Body.Sql(sql`one + two`),
  returns: BQType.Integer,
});
Unit Test Code
import { UnitTest } from "@datoria/sdk";

export const plusTest = UnitTest({
  testSubject: plus,
  description: "should add two numbers",
  routineArgs: [1, 2],
  assertEquals: 3,
});

Example 2: Testing a Routine with Table Data


Numbers Table Definition
import {
  BQField,
  BQFieldType,
  BQPartitioning,
  BQSourceTable,
} from "@datoria/sdk";
import { analyticsDataset } from "../../core-concepts/tables-and-partitioning.md";

export const numbers = BQSourceTable({
  dataset: analyticsDataset,
  name: "numbers",
  location: "EU",
  partitioning: BQPartitioning.Unpartitioned,
  schema: [BQField("num", BQFieldType.Integer, "REQUIRED")],
});
Definition Under Test (UDF)
import { BQRoutine, BQType, BQUdf, sql } from "@datoria/sdk";

export const superplus: BQUdf.Persistent = BQUdf.Persistent({
  dataset: analyticsDataset,
  name: "superplus",
  params: [BQRoutine.Param("two", BQType.Integer)],
  returns: BQType.Integer,
  body: BQUdf.Body.Sql(sql`(select max(x.num) from ${numbers} x) + two`),
});
Unit Test Code
import { TestRelation, UnitTest } from "@datoria/sdk";

export const superplusTest = UnitTest({
  testSubject: superplus,
  testRelations: [TestRelation(numbers, { num: 42 }, { num: 43 })],
  routineArgs: [1],
  assertEquals: 44,
});

Example 3: Testing Jobs

This example demonstrates a job that processes performance data from a karaoke table and calculates the maximum stage presence per artist by day.

Karaoke Table Definition (Scenario Background)
import {
  BQField,
  BQFieldType,
  BQPartitioning,
  BQSourceTable,
} from "@datoria/sdk";

export const karaokePerformance = BQSourceTable({
  dataset: analyticsDataset,
  name: "karaoke-performance",
  location: "EU",
  partitioning: BQPartitioning.Day("started_at"),
  schema: [
    BQField("started_at", BQFieldType.Timestamp, "REQUIRED"),
    BQField("artist_id", BQFieldType.Integer, "REQUIRED"),
    BQField.struct("data", "REQUIRED")(
      BQField("song_id", BQFieldType.Integer, "REQUIRED"),
      BQField("seconds_of_applause", BQFieldType.Float, "REQUIRED"),
      BQField("judge_rating", BQFieldType.Float, "REQUIRED", {
        description: "scale 1-10",
      }),
    ),
  ],
});
Job Definition
import { BQPartition, BQTable, TransformationJob } from "@datoria/sdk";

export const stagePresenceJob = TransformationJob({
  name: "max-stage-presence-by-date",
  query: sql`
    select
      DATE(k.started_at) as date,
      k.artist_id,
      max((k.data.seconds_of_applause * k.data.judge_rating) / 10.0) as max_stage_presence
    from ${karaokePerformance.atExecutionDate()} k
    group by DATE(k.started_at), k.artist_id
  `,
  destination: BQTable({
    dataset: analyticsDataset,
    name: "max-stage-presence-by-date",
    partitioning: BQPartitioning.Day("date"),
    schema: [
      BQField("date", BQFieldType.Date, "REQUIRED"),
      BQField("artist_id", BQFieldType.Integer, "REQUIRED"),
      BQField("max_stage_presence", BQFieldType.Float, "REQUIRED"),
    ],
  }),
  startPartition: BQPartition.Day(2025, 1, 1),
});
Unit Test Code
import { TestRelation, UnitTest } from "@datoria/sdk";

export const stagePresenceTest: UnitTest = UnitTest({
  testSubject: stagePresenceJob,
  testRelations: [
    TestRelation(
      karaokePerformance,
      {
        started_at: "2025-01-01 12:00:00",
        artist_id: 1,
        data: { seconds_of_applause: 12, judge_rating: 5.5 },
      },
      {
        started_at: "2025-01-01 21:00:00",
        artist_id: 2,
        data: { seconds_of_applause: 15, judge_rating: 8.5 },
      },
      {
        started_at: "2025-01-01 23:00:00",
        artist_id: 2,
        data: { seconds_of_applause: 3, judge_rating: 1.5 },
      },
    ),
  ],
  assertEquals: [
    { artist_id: 1, date: "2025-01-01", max_stage_presence: 6.6 },
    { artist_id: 2, date: "2025-01-01", max_stage_presence: 12.75 },
  ],
});
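As a sanity check on the expected rows above, the stage-presence formula can be evaluated directly in plain TypeScript (independent of the SDK; `stagePresence` is a helper introduced here for illustration):

```typescript
// Stage presence for one performance: (seconds_of_applause * judge_rating) / 10.
const stagePresence = (applause: number, rating: number): number =>
  (applause * rating) / 10.0;

// The three test rows above, grouped per artist for 2025-01-01.
const artist1 = [stagePresence(12, 5.5)]; // 6.6
const artist2 = [stagePresence(15, 8.5), stagePresence(3, 1.5)]; // 12.75, 0.45

const max1 = Math.max(...artist1); // 6.6
const max2 = Math.max(...artist2); // 12.75
console.log(max1, max2);
```

This reproduces the two `assertEquals` rows: artist 1 peaks at 6.6, artist 2 at 12.75 (the 0.45 performance is discarded by `max`).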

Reference

Shorthand Syntax: JSON Format

When specifying test inputs with shorthand syntax:

  • Input as JSON: Write values in JSON, and our system parses them according to the expected schema.
  • Type Conversion: For example, a JSON string representing a date or timestamp is automatically converted to the correct literal when sent to BigQuery.
  • Specify only what you care about: Any column not specified, including inside structs, is filled with a default, empty-ish value.
  • Arbitrary SQL Values: You can use the sql interpolator to insert arbitrary SQL expressions where necessary, offering flexibility beyond standard JSON values.

Example JSON Input:

{
  "started_at": "2025-01-01 12:00:00",
  "artist_id": sql`1 + 1`,
  "data": {
    "seconds_of_applause": 12,
    "judge_rating": 5.5
  }
}

Will end up looking like this when parsed with the schema from the karaoke table in example 3:

(select TIMESTAMP('2025-01-01 12:00:00') as started_at, 1 + 1 as artist_id, struct((select 0) as song_id, 12 as seconds_of_applause, 5.5 as judge_rating) as data)
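The default-filling behavior can be sketched as a small standalone function (plain TypeScript; `fillDefaults` and the schema model are hypothetical illustrations, not part of the SDK):

```typescript
// Minimal schema model mirroring the field kinds used in this page's examples.
type Field =
  | { name: string; kind: "integer" | "float" | "string" }
  | { name: string; kind: "struct"; fields: Field[] };

// Fill omitted fields with empty-ish defaults, recursing into structs.
function fillDefaults(
  schema: Field[],
  input: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const f of schema) {
    const given = input[f.name];
    if (f.kind === "struct") {
      out[f.name] = fillDefaults(f.fields, (given as Record<string, unknown>) ?? {});
    } else {
      out[f.name] = given ?? (f.kind === "string" ? "" : 0);
    }
  }
  return out;
}

// The karaoke "data" struct with song_id omitted, as in the example above.
const dataSchema: Field[] = [
  { name: "song_id", kind: "integer" },
  { name: "seconds_of_applause", kind: "float" },
  { name: "judge_rating", kind: "float" },
];
const filled = fillDefaults(dataSchema, { seconds_of_applause: 12, judge_rating: 5.5 });
console.log(filled); // song_id defaults to 0, matching the `(select 0) as song_id` above
```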

Using TestRelation

TestRelation is a variadic function designed for specifying the data state of a table (or job output) during tests. Its usage is as follows:

  • First Parameter: A reference to the table (or job) whose data you want to specify.
  • Subsequent Parameters: Each object represents a row of data. Fields you omit in any row are automatically populated with default values, even within nested structures or structs.

Example Usage:

TestRelation(
  numbers, // Reference to the table or job
  { num: 42 }, // First row: only 'num' specified
  { num: 43 }, // Second row: only 'num' specified
);

Matchers

Datoria provides two matchers to validate test outputs:

  1. assertEquals:
  • Exact Match Requirement: Provide a JSON object (or a JavaScript array of objects for multiple rows) that exactly matches the returned row(s).
  • Usage Scenario: Ideal for cases where the entire output must be precisely defined.
  2. assertMatches:
  • Subset Matching: Provide a subset of the expected output. This matcher supports partial matching, including subsets within nested structures or structs.
  • Usage Scenario: Useful when only specific key values or parts of the output need to be verified.
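The subset semantics of assertMatches can be sketched as a small standalone function (illustrative only; `isSubset` is not the SDK's implementation):

```typescript
// True if every key in `expected` exists in `actual` with a matching value,
// recursing into nested objects (structs); extra keys in `actual` are ignored.
function isSubset(expected: unknown, actual: unknown): boolean {
  if (typeof expected !== "object" || expected === null) return expected === actual;
  if (typeof actual !== "object" || actual === null) return false;
  return Object.entries(expected as Record<string, unknown>).every(([k, v]) =>
    isSubset(v, (actual as Record<string, unknown>)[k]),
  );
}

// A returned row with a nested struct, checked against a partial expectation.
const row = {
  artist_id: 2,
  date: "2025-01-01",
  data: { song_id: 7, judge_rating: 8.5 },
};
console.log(isSubset({ artist_id: 2, data: { judge_rating: 8.5 } }, row)); // true
console.log(isSubset({ artist_id: 1 }, row)); // false
```

With assertEquals, by contrast, the first expectation would fail because `date` and `data.song_id` are left unspecified.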