In the old world of data engineering, you wrote ETL jobs. Someone sat down, opened a notebook or IDE, and hand-coded every transformation step. If the schema changed, you fixed the code. If the business logic shifted, you refactored. If you needed a new pipeline, you started from scratch.
That works fine when humans are driving. But we’re not anymore — or at least, we shouldn’t be.
Today’s AI agents need to build, modify, and optimize data pipelines on their own. Not by writing Python or Scala, but by declaring what they want to happen in a format they can reason about, version, test, and iterate on. That’s where declarative AI transformations come in.
The Problem With Code-First Pipelines
Traditional ETL/ELT tools force a choice: either keep it simple (SQL) or get powerful (code). Simple tools can’t express complex business logic. Powerful tools require humans to read, review, and maintain the code.
For AI agents, this becomes a bottleneck. Here’s why:
- Agents think in steps, not in code. They reason about what data needs to transform, why, and how to validate it. They don’t naturally think in for-loops and SQL joins.
- Auditability is harder. When an agent modifies a Python pipeline, you have to review the code to understand what changed. With declarations, the diff is transparent.
- Composability breaks down. Agents can’t easily combine partial transformations or create variants of a pipeline without rewriting logic.
- Version control becomes messy. Code-based pipelines treat transformations as monolithic units. Declarative pipelines can be diffed, merged, and composed like configuration.
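To make the diffing point concrete, here's a minimal Python sketch of a structural diff between two versions of a declaration loaded as plain dictionaries. The field names (version, rules, onError) mirror the example declaration later in this post; diff_declarations is a hypothetical helper written for illustration, not part of Datris Platform.

```python
from typing import Any


def diff_declarations(old: Any, new: Any, path: str = "") -> list[str]:
    """Return one readable line per change between two nested declarations."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            if key not in old:
                changes.append(f"+ {child}: {new[key]!r}")
            elif key not in new:
                changes.append(f"- {child}: {old[key]!r}")
            else:
                changes.extend(diff_declarations(old[key], new[key], child))
        return changes
    if isinstance(old, list) and isinstance(new, list):
        changes = []
        for i in range(max(len(old), len(new))):
            child = f"{path}[{i}]"
            if i >= len(old):
                changes.append(f"+ {child}: {new[i]!r}")
            elif i >= len(new):
                changes.append(f"- {child}: {old[i]!r}")
            else:
                changes.extend(diff_declarations(old[i], new[i], child))
        return changes
    # Leaf values: report only when they differ.
    return [] if old == new else [f"~ {path}: {old!r} -> {new!r}"]


v1 = {"version": "1.0", "rules": [{"type": "validation", "onError": "log"}]}
v2 = {"version": "1.1", "rules": [{"type": "validation", "onError": "fail"}]}
changes = diff_declarations(v1, v2)
for line in changes:
    print(line)
```

Each output line names the exact field that changed, which is the kind of review surface a code diff rarely gives you.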
Enter declarative transformations.
What Are Declarative AI Transformations?
A declarative AI transformation describes what a data transformation should do — not how to do it. It specifies:
- Input schema: What columns and types the transformation expects
- Output schema: What the transformation should produce
- Transformation rules: The logic applied to each row or batch
- Validation: How to detect and handle errors
- Optimization hints: Partitioning, caching, parallel execution
The platform — in this case, Datris Platform — interprets the declaration and figures out the most efficient execution strategy.
Here’s a simple example:
name: "calculate_portfolio_returns"
version: "1.0"
input:
  schema:
    - {name: "portfolio_id", type: "string"}
    - {name: "start_value", type: "decimal"}
    - {name: "end_value", type: "decimal"}
    - {name: "cash_flows", type: "array<decimal>"}
output:
  schema:
    - {name: "portfolio_id", type: "string"}
    - {name: "return_pct", type: "decimal"}
    - {name: "twr", type: "decimal"}  # Time-weighted return
    - {name: "calculation_date", type: "timestamp"}
rules:
  - type: "aiTransform"
    description: "Calculate simple return and time-weighted return"
    model: "claude-3.5-sonnet"
    instruction: |
      Given start_value, end_value, and cash_flows, calculate:
      1. Simple return: (end_value - start_value) / start_value
      2. Time-weighted return using modified Dietz method
      Return as JSON with return_pct and twr fields.
  - type: "validation"
    rule: "return_pct must be between -1 and 10"
    onError: "log"  # or "skip" or "fail"
  - type: "metadata"
    timestamp_field: "calculation_date"
    partition_by: "portfolio_id"
An agent can understand this. It can see exactly what’s being calculated. It can modify the instruction given to the AI model. It can add new rules or swap the validation logic. It can version it, test it, and hand it off to another agent downstream.
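One way an agent or a reviewer could test this declaration is to compare the model's answers against a deterministic reference. Here's a minimal Python sketch under stated assumptions. Modified Dietz weights each cash flow by how long it was invested, and the example schema carries no flow dates, so the sketch assumes every flow lands mid-period (weight 0.5). The meaning given to each onError mode is also an assumption, since the declaration only names the modes.

```python
from decimal import Decimal


def simple_return(start_value: Decimal, end_value: Decimal) -> Decimal:
    """Rule 1 from the instruction: (end_value - start_value) / start_value."""
    return (end_value - start_value) / start_value


def dietz_return(start_value, end_value, cash_flows, weight=Decimal("0.5")):
    """Dietz-style return with every flow given the same time weight.

    Assumption: weight=0.5 treats all flows as mid-period, because the
    example schema has no flow dates to compute per-flow weights from.
    """
    net_flow = sum(cash_flows, Decimal("0"))  # an empty list sums to 0
    gain = end_value - start_value - net_flow
    average_capital = start_value + weight * net_flow
    return gain / average_capital


def check_return(return_pct: Decimal, on_error: str = "log") -> bool:
    """Apply 'return_pct must be between -1 and 10'. True means keep the row."""
    if Decimal("-1") <= return_pct <= Decimal("10"):
        return True
    message = f"return_pct {return_pct} is outside [-1, 10]"
    if on_error == "fail":
        raise ValueError(message)       # assumed: abort the run
    if on_error == "log":
        print(f"WARNING: {message}")    # assumed: keep the row, record the issue
        return True
    return False                        # assumed "skip": drop the row


row = {
    "start_value": Decimal("1000"),
    "end_value": Decimal("1100"),
    "cash_flows": [Decimal("50")],
}
return_pct = simple_return(row["start_value"], row["end_value"])
twr = dietz_return(row["start_value"], row["end_value"], row["cash_flows"])
keep = check_return(return_pct, on_error="log")
```

A test harness could run the aiTransform on the same rows and flag any result that drifts from this reference by more than a chosen tolerance.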
How Agents Use This
Here’s a real workflow:
Step 1: Agent discovers data quality issue
An agent scanning portfolio data notices that return calculations are sometimes missing when the cash_flows array is empty. Instead of filing a ticket or waiting for a human, it:
- Reads the existing transformation declaration
- Identifies the gap: “cash_flows array is empty in 2.3% of records”
- Proposes a modification: Add a rule to handle empty cash flows explicitly
- Writes a new version of the transformation YAML with the fix
- Tests it against historical data
- If validation passes, promotes it to the next environment
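The steps above can be sketched in a few lines of Python. This is an illustration, not Datris code: the declaration is a trimmed dictionary, the fix is expressed as an extra sentence in the aiTransform instruction (a separate rule would work the same way), and stub_run is a hypothetical stand-in for actually executing the patched transformation.

```python
import copy
from typing import Callable


def bump_minor(version: str) -> str:
    """'1.0' -> '1.1'. Assumes a two-part MAJOR.MINOR version string."""
    major, minor = version.split(".")
    return f"{major}.{int(minor) + 1}"


def patch_instruction(declaration: dict, extra_line: str) -> dict:
    """Return a new version with extra_line appended to each aiTransform rule."""
    patched = copy.deepcopy(declaration)  # never mutate the deployed version
    patched["version"] = bump_minor(declaration["version"])
    for rule in patched["rules"]:
        if rule["type"] == "aiTransform":
            rule["instruction"] = rule["instruction"].rstrip() + "\n" + extra_line
    return patched


def coverage(records: list[dict], run: Callable[[dict], dict]) -> float:
    """Fraction of historical records for which run() yields a return_pct."""
    produced = sum(1 for r in records if run(r).get("return_pct") is not None)
    return produced / len(records)


def stub_run(record: dict) -> dict:
    # Hypothetical stand-in for executing the patched declaration.
    change = record["end_value"] - record["start_value"]
    return {"return_pct": change / record["start_value"]}


declaration = {
    "version": "1.0",
    "rules": [{"type": "aiTransform",
               "instruction": "Calculate simple return and time-weighted return.\n"}],
}
patched = patch_instruction(
    declaration, "If cash_flows is empty, treat the net cash flow as 0.")
history = [
    {"start_value": 100.0, "end_value": 110.0, "cash_flows": []},
    {"start_value": 200.0, "end_value": 190.0, "cash_flows": [5.0]},
]
promote = coverage(history, stub_run) == 1.0  # promote only on full coverage
```

Because the patch is built on a deep copy, the deployed version stays untouched until the backtest clears.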
Step 2: Agent optimizes performance
A monitoring agent notices the transformation is running slowly on large portfolios. It:
- Analyzes the declaration and sees it’s not partitioned
- Adds partition hints based on typical query patterns
- Adds caching rules for intermediate calculations
- Reruns benchmark tests
- If latency improves, increments the version and redeploys
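Here's a rough Python sketch of that loop, assuming the benchmark numbers have already been collected. The cache_intermediate key and the 10% improvement threshold are hypothetical choices made for illustration; only partition_by appears in the example declaration.

```python
import copy
from statistics import median


def add_hints(declaration: dict, partition_by: str, cache: bool) -> dict:
    """Return a copy of the declaration with hints set on its metadata rule."""
    tuned = copy.deepcopy(declaration)
    for rule in tuned["rules"]:
        if rule["type"] == "metadata":
            rule["partition_by"] = partition_by
            rule["cache_intermediate"] = cache  # hypothetical key
    return tuned


def is_faster(baseline_ms: list[float], candidate_ms: list[float],
              min_gain: float = 0.10) -> bool:
    """True if the candidate's median latency beats baseline by min_gain."""
    return median(candidate_ms) <= median(baseline_ms) * (1 - min_gain)


declaration = {
    "version": "1.1",
    "rules": [{"type": "metadata", "timestamp_field": "calculation_date"}],
}
candidate = add_hints(declaration, partition_by="portfolio_id", cache=True)

# Latency samples in milliseconds from repeated benchmark runs (made up here).
if is_faster([120.0, 130.0, 125.0], [90.0, 95.0, 100.0]):
    candidate["version"] = "1.2"
    deployed = candidate
else:
    deployed = declaration  # keep the current version if there is no clear win
```

Comparing medians rather than single runs keeps one noisy benchmark from triggering a redeploy.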
Step 3: Agent adapts to schema changes
When a data source upstream changes its schema, a schema-monitoring agent:
- Detects the mismatch against the transformation’s input schema
- Proposes adapter logic: “Map old field ‘nav’ to new field ‘net_asset_value’”
- Tests the adapter with sample data
- Rolls it into the transformation as a preprocessing step
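A minimal Python sketch of the detect-and-adapt step follows. It assumes the transformation's input schema still expects nav while the upstream source now sends net_asset_value, so the adapter renames the incoming field back to the declared one. The two-column schema and the sample record are made up for illustration.

```python
def schema_mismatch(input_schema: list[dict], upstream_columns: set[str]) -> dict:
    """Compare declared input fields with the columns upstream actually sends."""
    expected = {field["name"] for field in input_schema}
    return {
        "missing": sorted(expected - upstream_columns),
        "unexpected": sorted(upstream_columns - expected),
    }


def adapt(record: dict, rename: dict[str, str]) -> dict:
    """Preprocessing step: rename upstream fields to the declared names."""
    return {rename.get(key, key): value for key, value in record.items()}


# Hypothetical input schema that still uses the old field name.
input_schema = [
    {"name": "portfolio_id", "type": "string"},
    {"name": "nav", "type": "decimal"},
]
upstream_record = {"portfolio_id": "P-1", "net_asset_value": "1050.25"}

gap = schema_mismatch(input_schema, set(upstream_record))
rename = {"net_asset_value": "nav"}  # the adapter the agent proposes
adapted = adapt(upstream_record, rename)
remaining = schema_mismatch(input_schema, set(adapted))
```

An empty remaining report on sample data is the signal that the adapter can be rolled into the declaration.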
None of these steps waits on a human to write code. The agent reasons about the data, makes changes it can justify, tests them, and deploys.
Integration With Datris Platform
On Datris Platform, these transformations live in the config-driven pipeline layer. They work seamlessly with:
- MCP Server: Agents access transformations via the Datris MCP server’s datris.transformation.create, datris.transformation.test, and datris.transformation.deploy tools.
- Data Quality Rules: AI transformations can be paired with aiRule-based validation to catch issues in real time.
- Multi-Destination Routing: A transformation can be declared once and routed to multiple destinations (data warehouse, vector DB, real-time sink) based on the output schema.
- Docker-Native Deployment: Each transformation spins up as a containerized executor, making it trivial to scale or isolate.
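For a sense of what an agent sends over the wire, here's a small Python sketch that builds a Model Context Protocol tools/call request for one of the tools named above. The JSON-RPC envelope follows the MCP convention for tool calls. The argument names (name, version) are assumptions for illustration, since this post doesn't document each tool's input schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids per session


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical arguments: the real tool may expect different keys.
message = mcp_tool_call(
    "datris.transformation.test",
    {"name": "calculate_portfolio_returns", "version": "1.1"},
)
print(message)
```

In practice an MCP client library would handle this envelope, and the agent would only supply the tool name and arguments.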
Why This Matters Now
The shift to AI-first data systems isn’t just about using LLMs to query data. It’s about letting AI autonomously operate the data layer. Declarative transformations are the interface that makes that possible.
In financial services especially, this matters. You can’t have agents rewriting your ETL code without audit trails. You can have agents modifying transformation declarations — the changes are clear, testable, and versionable.
Getting Started
If you’re building agent-native data infrastructure, start here:
- Audit your existing transformations. What patterns repeat? What could be expressed as rules instead of code?
- Export as declarations. Map your current logic to transformation YAML or JSON.
- Integrate with an MCP server. Give your agents tools to read, propose, and test changes.
- Add validation gates. Not every agent change goes live — some need human review first.
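A validation gate can start as a simple policy over which fields an agent changed. The Python sketch below assumes changes arrive as dotted paths such as rules[0].instruction, and it auto-approves only a short allowlist of low-risk fields. Both the path format and the allowlist are hypothetical, so tune them to your own risk tolerance.

```python
# Hypothetical allowlist: fields an agent may change without human sign-off.
AUTO_APPROVE_FIELDS = {"version", "partition_by", "timestamp_field"}


def needs_human_review(changed_paths: list[str]) -> bool:
    """True if any changed field falls outside the auto-approve allowlist."""
    for path in changed_paths:
        field = path.split(".")[-1]  # 'rules[2].partition_by' -> 'partition_by'
        if field not in AUTO_APPROVE_FIELDS:
            return True
    return False


tuning_change = ["version", "rules[2].partition_by"]
logic_change = ["version", "rules[0].instruction"]

print(needs_human_review(tuning_change))  # partition tweak: auto-approved
print(needs_human_review(logic_change))   # instruction edit: goes to a human
```

Defaulting to review for anything not on the allowlist means a new field type is gated until someone decides it is safe.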
Datris Platform already supports this workflow out of the box. Check out the GitHub repo for examples, templates, and best practices for declarative transformations.
The future of data engineering isn’t about writing more ETL code. It’s about describing what you want, letting agents optimize it, and staying in control through declarations.
Todd Fearn is the founder of Datris.ai and has been building AI solutions and data infrastructure for financial services for 25+ years, including at Goldman Sachs, Bridgewater Associates, Deutsche Bank, Freddie Mac and others.
Explore Datris Platform: github.com/datris/datris-platform-oss