Routines
Beyond tables, views, and jobs, Datoria supports user-defined routines that extend your data platform's capabilities. The most common type is the User-Defined Function (UDF), which lets you encapsulate reusable logic directly within your database.
User-Defined Functions (UDFs)
BigQuery supports two types of UDFs: temporary and persistent. Datoria allows you to define both types using a consistent, declarative approach.
Temporary UDFs
Temporary UDFs exist only for the duration of a query. They're defined inline and aren't stored in the database:
import { BQType, BQUdf, BQRoutine, sql } from "@datoria/sdk";
export const calculateDiscountTemporary = BQUdf.Temporary({
name: "calculateDiscountTemporary",
params: [
BQRoutine.Param("price", BQType.Float),
BQRoutine.Param("discountPercent", BQType.Float),
],
body: BQUdf.Body.Sql(sql`ROUND(price * (1 - discountPercent / 100), 2)`),
returns: BQType.Float,
});
Persistent UDFs
Persistent UDFs are stored in the database and can be referenced across multiple queries:
import { analyticsDataset } from "../../core-concepts/tables-and-partitioning.md";
export const formatCurrencyPersistent = BQUdf.Persistent({
dataset: analyticsDataset,
name: "formatCurrencyPersistent",
params: [
BQRoutine.Param("amount", BQType.Float),
BQRoutine.Param("currencyCode", BQType.String),
],
body: BQUdf.Body.Sql(sql`
CASE
WHEN currencyCode = 'USD' THEN CONCAT('$', FORMAT('%.2f', amount))
WHEN currencyCode = 'EUR' THEN CONCAT('€', FORMAT('%.2f', amount))
WHEN currencyCode = 'GBP' THEN CONCAT('£', FORMAT('%.2f', amount))
ELSE CONCAT(currencyCode, ' ', FORMAT('%.2f', amount))
END
`),
returns: BQType.String,
});
The key difference is that persistent UDFs require a dataset
property to specify where they should be stored.
Using UDFs in Queries
Once defined, UDFs can be referenced by their TypeScript name in your SQL queries:
// We're referencing the UDFs defined above
export const calculationExample = sql`
SELECT
${calculateDiscountTemporary.call(100.0, 15.0)} AS discounted_price_temp,
${formatCurrencyPersistent.call(sql`120.50`, sql`'EUR'`)} AS formatted_price_persist
`;
Note that you can pass both literal values and SQL fragments as arguments to UDFs.
Temporary UDFs will only work in transformation jobs. Use persistent UDFs in views.
Complex UDFs
UDFs can encapsulate more complex logic to standardize calculations across your organization:
Aggregation Helper
export const percentile90 = BQUdf.Persistent({
dataset: analyticsDataset,
name: "percentile_90",
params: [BQRoutine.Param("values", BQType.Array(BQType.Float))],
body: BQUdf.Body.Sql(sql`
(SELECT
PERCENTILE_CONT(value, 0.9) OVER()
FROM UNNEST(values) AS value
LIMIT 1)
`),
returns: BQType.Float,
});
Business Logic Standardization
export const calculateLTV = BQUdf.Persistent({
dataset: analyticsDataset,
name: "calculate_customer_ltv",
params: [
BQRoutine.Param("total_purchases", BQType.Float),
BQRoutine.Param("avg_order_value", BQType.Float),
BQRoutine.Param("first_purchase_date", BQType.Date),
BQRoutine.Param("latest_purchase_date", BQType.Date),
],
body: BQUdf.Body.Sql(sql`
CASE
WHEN total_purchases = 0 THEN 0
WHEN DATE_DIFF(latest_purchase_date, first_purchase_date, DAY) < 30 THEN avg_order_value * 1.5
ELSE avg_order_value * total_purchases * (1 + 0.1 * (total_purchases - 1))
END
`),
returns: BQType.Float,
});
UDFs with Table Dependencies
UDFs can also reference tables, creating dependencies that Datoria tracks and validates:
import { BQField, BQFieldType, BQPartitioning, BQTable } from "@datoria/sdk";
// Define a reference table for region codes
const regionCodesTable = BQTable({
dataset: analyticsDataset,
name: "region_codes",
partitioning: BQPartitioning.Unpartitioned,
schema: [
BQField("region_code", BQFieldType.String, "REQUIRED"),
BQField("name", BQFieldType.String, "REQUIRED"),
],
});
export const getRegionName = BQUdf.Persistent({
dataset: analyticsDataset,
name: "get_region_name",
params: [BQRoutine.Param("code", BQType.String)],
returns: BQType.String,
body: BQUdf.Body.Sql(sql`
(SELECT name
FROM ${regionCodesTable.allPartitions()}
WHERE region_code = code
LIMIT 1)
`),
});
Advanced Routine Types
TVFs (Table-Valued Functions)
BigQuery also supports Table-Valued Functions (TVFs), which return entire tables rather than scalar values:
import { BQTvf } from "@datoria/sdk";
// Define a table of user events
const userEventsTable = BQTable({
dataset: analyticsDataset,
name: "user_events",
partitioning: BQPartitioning.Day("date"),
schema: [
BQField("user_id", BQFieldType.String, "REQUIRED"),
BQField("event_type", BQFieldType.String, "REQUIRED"),
BQField("date", BQFieldType.Date, "REQUIRED"),
],
});
export const getUserEventsForPeriod = BQTvf({
dataset: analyticsDataset,
name: "get_user_events_for_period",
params: [
BQRoutine.Param("user_id_param", BQType.String),
BQRoutine.Param("start_date", BQType.Date),
BQRoutine.Param("end_date", BQType.Date),
],
result: [
BQField("user_id", BQFieldType.String, "NULLABLE"),
BQField("event_type", BQFieldType.String, "NULLABLE"),
BQField("date", BQFieldType.Date, "NULLABLE"),
],
query: sql`
SELECT user_id, event_type, date
FROM ${userEventsTable.allPartitions()}
WHERE user_id = user_id_param
AND date BETWEEN start_date AND end_date
`,
});
Testing UDFs
Datoria's testing framework makes it easy to validate UDFs:
import { UnitTest } from "@datoria/sdk";
export const formatCurrencyTest = UnitTest({
testSubject: formatCurrencyPersistent,
description: "should format currency values correctly",
routineArgs: [42.5, "USD"],
assertEquals: "$42.50",
});
For UDFs that reference tables, you can provide test data:
import { TestRelation } from "@datoria/sdk";
export const regionNameTest = UnitTest({
testSubject: getRegionName,
testRelations: [
TestRelation(
regionCodesTable,
{ region_code: "US", name: "United States" },
{ region_code: "CA", name: "Canada" },
),
],
routineArgs: ["US"],
assertEquals: "United States",
});
See the Unit Testing section for more details.
UDF Best Practices
When to Use UDFs
- Standardize calculations - Ensure consistent business logic across queries
- Simplify complex expressions - Make SQL more readable by encapsulating complexity
- Implement reusable patterns - Reduce duplication in your SQL codebase
Performance Considerations
UDFs in BigQuery have some performance implications:
- Prefer persistent UDFs for production use
- Minimize table references within UDFs when possible
- Balance complexity and reuse - Very simple functions might not justify the UDF overhead
Common UDF Patterns
String Formatting
export const formatName = BQUdf.Persistent({
dataset: analyticsDataset,
name: "format_name",
params: [
BQRoutine.Param("first_name", BQType.String),
BQRoutine.Param("last_name", BQType.String),
],
body: BQUdf.Body.Sql(sql`
CONCAT(
INITCAP(first_name),
' ',
INITCAP(last_name)
)
`),
returns: BQType.String,
});
Date Manipulation
export const firstDayOfMonth = BQUdf.Persistent({
dataset: analyticsDataset,
name: "first_day_of_month",
params: [BQRoutine.Param("date", BQType.Date)],
body: BQUdf.Body.Sql(sql`
DATE(EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date), 1)
`),
returns: BQType.Date,
});
Business Metrics Calculation
export const calculateROI = BQUdf.Persistent({
dataset: analyticsDataset,
name: "calculate_roi",
params: [
BQRoutine.Param("revenue", BQType.Float),
BQRoutine.Param("cost", BQType.Float),
],
body: BQUdf.Body.Sql(sql`
CASE
WHEN cost = 0 THEN NULL
ELSE (revenue - cost) / cost * 100
END
`),
returns: BQType.Float,
});
Using UDFs in Jobs and Views
In Jobs
import { TransformationJob } from "@datoria/sdk";
// Define a marketing campaigns table
const marketingCampaignsTable = BQTable({
dataset: analyticsDataset,
name: "marketing_campaigns",
partitioning: BQPartitioning.Unpartitioned,
schema: [
BQField("campaign_id", BQFieldType.String, "REQUIRED"),
BQField("campaign_name", BQFieldType.String, "REQUIRED"),
BQField("revenue", BQFieldType.Float, "REQUIRED"),
BQField("cost", BQFieldType.Float, "REQUIRED"),
],
});
// Define a destination table for ROI calculations
const marketingROITable = BQTable({
dataset: analyticsDataset,
name: "marketing_roi",
partitioning: BQPartitioning.Unpartitioned,
schema: [
BQField("campaign_id", BQFieldType.String, "REQUIRED"),
BQField("campaign_name", BQFieldType.String, "REQUIRED"),
BQField("roi", BQFieldType.Float, "NULLABLE"),
],
});
export const marketingROIJob = TransformationJob({
name: "marketing_roi",
destination: marketingROITable,
query: sql`
SELECT
campaign_id,
campaign_name,
${calculateROI.call(sql`revenue`, sql`cost`)} as roi
FROM ${marketingCampaignsTable.allPartitions()}
`,
});
In Views
import { BQView } from "@datoria/sdk";
export const marketingROIView = BQView({
dataset: analyticsDataset,
name: "marketing_roi_view",
query: sql`
SELECT
campaign_id,
campaign_name,
${calculateROI.call(sql`revenue`, sql`cost`)} as roi
FROM ${marketingCampaignsTable.allPartitions()}
`,
schema: [
BQField("campaign_id", BQFieldType.String, "NULLABLE"),
BQField("campaign_name", BQFieldType.String, "NULLABLE"),
BQField("roi", BQFieldType.Float, "NULLABLE"),
],
});
Migrating UDFs
When you run datoria migrate
, Datoria will:
- Identify new or changed UDFs
- Create or update them in your database
- Ensure they have the correct parameters and return types
No special handling is needed for routine migrations—they're treated like any other component in your data platform.
Conclusion
Routines, particularly UDFs, are a powerful way to encapsulate logic and standardize calculations in your data platform. By defining them declaratively in Datoria, you ensure they're consistent, testable, and maintainable across your data workflows.