Skip to main content

Routines

Beyond tables, views, and jobs, Datoria supports user-defined routines that extend your data platform's capabilities. The most common type is the User-Defined Function (UDF), which lets you encapsulate reusable logic directly within your database.

User-Defined Functions (UDFs)

BigQuery supports two types of UDFs: temporary and persistent. Datoria allows you to define both types using a consistent, declarative approach.

Temporary UDFs

Temporary UDFs exist only for the duration of a query. They're defined inline and aren't stored in the database:

import { BQType, BQUdf, BQRoutine, sql } from "@datoria/sdk";

export const calculateDiscountTemporary = BQUdf.Temporary({
name: "calculateDiscountTemporary",
params: [
BQRoutine.Param("price", BQType.Float),
BQRoutine.Param("discountPercent", BQType.Float),
],
body: BQUdf.Body.Sql(sql`ROUND(price * (1 - discountPercent / 100), 2)`),
returns: BQType.Float,
});

Persistent UDFs

Persistent UDFs are stored in the database and can be referenced across multiple queries:

import { analyticsDataset } from "../../core-concepts/tables-and-partitioning.md";

export const formatCurrencyPersistent = BQUdf.Persistent({
dataset: analyticsDataset,
name: "formatCurrencyPersistent",
params: [
BQRoutine.Param("amount", BQType.Float),
BQRoutine.Param("currencyCode", BQType.String),
],
body: BQUdf.Body.Sql(sql`
CASE
WHEN currencyCode = 'USD' THEN CONCAT('$', FORMAT('%.2f', amount))
WHEN currencyCode = 'EUR' THEN CONCAT('€', FORMAT('%.2f', amount))
WHEN currencyCode = 'GBP' THEN CONCAT('£', FORMAT('%.2f', amount))
ELSE CONCAT(currencyCode, ' ', FORMAT('%.2f', amount))
END
`),
returns: BQType.String,
});

The key difference is that persistent UDFs require a dataset property to specify where they should be stored.

Using UDFs in Queries

Once defined, UDFs can be referenced by their TypeScript name in your SQL queries:

// We're referencing the UDFs defined above
export const calculationExample = sql`
SELECT
${calculateDiscountTemporary.call(100.0, 15.0)} AS discounted_price_temp,
${formatCurrencyPersistent.call(sql`120.50`, sql`'EUR'`)} AS formatted_price_persist
`;

Note that you can pass both literal values and SQL fragments as arguments to UDFs.

note

Temporary UDFs will only work in transformation jobs. Use persistent UDFs in views.

Complex UDFs

UDFs can encapsulate more complex logic to standardize calculations across your organization:

Aggregation Helper

export const percentile90 = BQUdf.Persistent({
dataset: analyticsDataset,
name: "percentile_90",
params: [BQRoutine.Param("values", BQType.Array(BQType.Float))],
body: BQUdf.Body.Sql(sql`
(SELECT
PERCENTILE_CONT(value, 0.9) OVER()
FROM UNNEST(values) AS value
LIMIT 1)
`),
returns: BQType.Float,
});

Business Logic Standardization

export const calculateLTV = BQUdf.Persistent({
dataset: analyticsDataset,
name: "calculate_customer_ltv",
params: [
BQRoutine.Param("total_purchases", BQType.Float),
BQRoutine.Param("avg_order_value", BQType.Float),
BQRoutine.Param("first_purchase_date", BQType.Date),
BQRoutine.Param("latest_purchase_date", BQType.Date),
],
body: BQUdf.Body.Sql(sql`
CASE
WHEN total_purchases = 0 THEN 0
WHEN DATE_DIFF(latest_purchase_date, first_purchase_date, DAY) < 30 THEN avg_order_value * 1.5
ELSE avg_order_value * total_purchases * (1 + 0.1 * (total_purchases - 1))
END
`),
returns: BQType.Float,
});

UDFs with Table Dependencies

UDFs can also reference tables, creating dependencies that Datoria tracks and validates:

import { BQField, BQFieldType, BQPartitioning, BQTable } from "@datoria/sdk";

// Define a reference table for region codes
const regionCodesTable = BQTable({
dataset: analyticsDataset,
name: "region_codes",
partitioning: BQPartitioning.Unpartitioned,
schema: [
BQField("region_code", BQFieldType.String, "REQUIRED"),
BQField("name", BQFieldType.String, "REQUIRED"),
],
});

export const getRegionName = BQUdf.Persistent({
dataset: analyticsDataset,
name: "get_region_name",
params: [BQRoutine.Param("code", BQType.String)],
returns: BQType.String,
body: BQUdf.Body.Sql(sql`
(SELECT name
FROM ${regionCodesTable.allPartitions()}
WHERE region_code = code
LIMIT 1)
`),
});

Advanced Routine Types

TVFs (Table-Valued Functions)

BigQuery also supports Table-Valued Functions (TVFs), which return entire tables rather than scalar values:

import { BQTvf } from "@datoria/sdk";

// Define a table of user events
const userEventsTable = BQTable({
dataset: analyticsDataset,
name: "user_events",
partitioning: BQPartitioning.Day("date"),
schema: [
BQField("user_id", BQFieldType.String, "REQUIRED"),
BQField("event_type", BQFieldType.String, "REQUIRED"),
BQField("date", BQFieldType.Date, "REQUIRED"),
],
});

export const getUserEventsForPeriod = BQTvf({
dataset: analyticsDataset,
name: "get_user_events_for_period",
params: [
BQRoutine.Param("user_id_param", BQType.String),
BQRoutine.Param("start_date", BQType.Date),
BQRoutine.Param("end_date", BQType.Date),
],
result: [
BQField("user_id", BQFieldType.String, "NULLABLE"),
BQField("event_type", BQFieldType.String, "NULLABLE"),
BQField("date", BQFieldType.Date, "NULLABLE"),
],
query: sql`
SELECT user_id, event_type, date
FROM ${userEventsTable.allPartitions()}
WHERE user_id = user_id_param
AND date BETWEEN start_date AND end_date
`,
});

Testing UDFs

Datoria's testing framework makes it easy to validate UDFs:

import { UnitTest } from "@datoria/sdk";

export const formatCurrencyTest = UnitTest({
testSubject: formatCurrencyPersistent,
description: "should format currency values correctly",
routineArgs: [42.5, "USD"],
assertEquals: "$42.50",
});

For UDFs that reference tables, you can provide test data:

import { TestRelation } from "@datoria/sdk";

export const regionNameTest = UnitTest({
testSubject: getRegionName,
testRelations: [
TestRelation(
regionCodesTable,
{ region_code: "US", name: "United States" },
{ region_code: "CA", name: "Canada" },
),
],
routineArgs: ["US"],
assertEquals: "United States",
});

See the Unit Testing section for more details.

UDF Best Practices

When to Use UDFs

  • Standardize calculations - Ensure consistent business logic across queries
  • Simplify complex expressions - Make SQL more readable by encapsulating complexity
  • Implement reusable patterns - Reduce duplication in your SQL codebase

Performance Considerations

UDFs in BigQuery have some performance implications:

  1. Prefer persistent UDFs for production use
  2. Minimize table references within UDFs when possible
  3. Balance complexity and reuse - Very simple functions might not justify the UDF overhead

Common UDF Patterns

String Formatting

export const formatName = BQUdf.Persistent({
dataset: analyticsDataset,
name: "format_name",
params: [
BQRoutine.Param("first_name", BQType.String),
BQRoutine.Param("last_name", BQType.String),
],
body: BQUdf.Body.Sql(sql`
CONCAT(
INITCAP(first_name),
' ',
INITCAP(last_name)
)
`),
returns: BQType.String,
});

Date Manipulation

export const firstDayOfMonth = BQUdf.Persistent({
dataset: analyticsDataset,
name: "first_day_of_month",
params: [BQRoutine.Param("date", BQType.Date)],
body: BQUdf.Body.Sql(sql`
DATE(EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date), 1)
`),
returns: BQType.Date,
});

Business Metrics Calculation

export const calculateROI = BQUdf.Persistent({
dataset: analyticsDataset,
name: "calculate_roi",
params: [
BQRoutine.Param("revenue", BQType.Float),
BQRoutine.Param("cost", BQType.Float),
],
body: BQUdf.Body.Sql(sql`
CASE
WHEN cost = 0 THEN NULL
ELSE (revenue - cost) / cost * 100
END
`),
returns: BQType.Float,
});

Using UDFs in Jobs and Views

In Jobs

import { TransformationJob } from "@datoria/sdk";

// Define a marketing campaigns table
const marketingCampaignsTable = BQTable({
dataset: analyticsDataset,
name: "marketing_campaigns",
partitioning: BQPartitioning.Unpartitioned,
schema: [
BQField("campaign_id", BQFieldType.String, "REQUIRED"),
BQField("campaign_name", BQFieldType.String, "REQUIRED"),
BQField("revenue", BQFieldType.Float, "REQUIRED"),
BQField("cost", BQFieldType.Float, "REQUIRED"),
],
});

// Define a destination table for ROI calculations
const marketingROITable = BQTable({
dataset: analyticsDataset,
name: "marketing_roi",
partitioning: BQPartitioning.Unpartitioned,
schema: [
BQField("campaign_id", BQFieldType.String, "REQUIRED"),
BQField("campaign_name", BQFieldType.String, "REQUIRED"),
BQField("roi", BQFieldType.Float, "NULLABLE"),
],
});

export const marketingROIJob = TransformationJob({
name: "marketing_roi",
destination: marketingROITable,
query: sql`
SELECT
campaign_id,
campaign_name,
${calculateROI.call(sql`revenue`, sql`cost`)} as roi
FROM ${marketingCampaignsTable.allPartitions()}
`,
});

In Views

import { BQView } from "@datoria/sdk";

export const marketingROIView = BQView({
dataset: analyticsDataset,
name: "marketing_roi_view",
query: sql`
SELECT
campaign_id,
campaign_name,
${calculateROI.call(sql`revenue`, sql`cost`)} as roi
FROM ${marketingCampaignsTable.allPartitions()}
`,
schema: [
BQField("campaign_id", BQFieldType.String, "NULLABLE"),
BQField("campaign_name", BQFieldType.String, "NULLABLE"),
BQField("roi", BQFieldType.Float, "NULLABLE"),
],
});

Migrating UDFs

When you run datoria migrate, Datoria will:

  1. Identify new or changed UDFs
  2. Create or update them in your database
  3. Ensure they have the correct parameters and return types

No special handling is needed for routine migrations—they're treated like any other component in your data platform.

Conclusion

Routines, particularly UDFs, are a powerful way to encapsulate logic and standardize calculations in your data platform. By defining them declaratively in Datoria, you ensure they're consistent, testable, and maintainable across your data workflows.