Which AI Agents Should You Build for Data Analysis?

TL;DR

Agent	Verdict	Why
Semantic-layer-backed natural-language analytics	Build now	The right abstraction; durable; auditable
Anomaly-detection and alerting agent	Build now	Pattern matching is the agent’s strong suit
Data-quality monitoring agent	Build now	Replaces a dashboard nobody reads with a queue someone acts on
Documentation and lineage agent	Build second	Auto-documents schemas; reduces tribal-knowledge debt
Direct natural-language to SQL on raw warehouse	Don’t build	The known failure mode
Auto-decision agent on top of dashboards	Hold	Wrong abstraction; humans still own the call
Auto-summary of every dashboard	Hold	Most summaries go unread but still eat compute budget
Auto-modeling / auto-ML agent	Hold	Better than nothing for teams without a data scientist; worse than a competent data scientist

The architectural call: build the semantic layer first, then deploy the agent on top. The companies that built NL-to-SQL directly onto raw schemas in 2023 are the ones that ripped it out in 2025.

“Natural language to SQL” is the wrong abstraction. The right one is a semantic layer the agent queries. NL-to-SQL on raw schemas fails in production, and it fails in ways that are predictable and severe.

The NL-to-SQL pitch dominates the conversation about AI for data analysis. Vendor demos look magical, but production deployments rarely survive 12 months. The reason isn’t AI capability. NL-to-SQL is deployed at the wrong layer. The right architecture (semantic layer plus agent) is the one most teams skip because it’s slower to demo.

This piece covers the architectural distinction, the four agents that work in production, and the ones to refuse.

The frame: semantic layer first, agent second

An analytical system has three layers between a question and its answer:

1. Raw data (warehouse tables, with whatever schema and naming conventions accumulated over the years).
2. Semantic layer (defined business concepts: what is “active customer,” what is “monthly recurring revenue,” what does “churn” mean in our model).
3. Question layer (what the analyst or executive is actually asking).

NL-to-SQL agents try to bridge layer 3 directly to layer 1. This breaks in production for predictable reasons:

Schema sprawl. A typical mid-sized company has 200–2,000 tables with naming conventions accumulated over a decade. The agent doesn’t know whether users_v2 is the right table. Asking it to figure that out means asking it to reconstruct tribal knowledge no one wrote down.
Semantic drift. “Revenue” means something different to the finance team than to the sales team. The agent gives a confident answer using one definition, and the executive interprets it using another. The number is wrong for the question being asked, and neither party can tell.
The audit problem. When the executive asks “how did the agent get this number,” the answer is a 400-line SQL query. That’s not auditable in any practical sense — and finance, ops, and product leaders increasingly need their numbers to be defensible.

The semantic-layer architecture solves all three. The agent doesn’t write SQL against users_v2; it queries the semantic layer’s active_customer definition, which is human-curated, version-controlled, and defended by the data team. The output is auditable (every number traces to a defined metric), the semantic drift is governed (one definition, used everywhere), and the agent’s accuracy improves dramatically because the surface it’s reasoning over is bounded and labeled.
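Here is a minimal sketch of that idea. It assumes a hypothetical in-house metric format, which is not MetricFlow’s, Cube’s, or LookML’s syntax. The definition of active_customer lives in code, and the agent can supply only a metric name and a date range. The table and filters come from the curated definition, so the agent never has to guess which table is the right one. `Metric`, `CATALOG`, `compile_metric`, `analytics.orders`, and the `status = 'paid'` filter are all illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A governed business definition (hypothetical schema, not any vendor's format)."""
    name: str
    description: str
    table: str          # the one physical table the data team has blessed
    time_column: str
    expression: str     # aggregate SQL expression
    filters: tuple[str, ...] = ()
    version: str = "1.0.0"


# Hypothetical catalog entry; in practice this would live in version control and be PR-reviewed.
CATALOG = {
    "active_customer": Metric(
        name="active_customer",
        description="Distinct customers with at least one paid order in the period",
        table="analytics.orders",
        time_column="order_date",
        expression="COUNT(DISTINCT customer_id)",
        filters=("status = 'paid'",),
        version="2.1.0",
    ),
}


def compile_metric(name: str, start: str, end: str) -> tuple[str, dict]:
    """Turn a metric name plus a date range into parameterized SQL.

    The agent supplies only the name and the dates; table choice and filters
    come from the curated definition, never from the model.
    """
    m = CATALOG[name]  # an undefined metric raises KeyError: nothing is improvised
    where = [f"{m.time_column} >= :start", f"{m.time_column} < :end", *m.filters]
    sql = (
        f"-- {m.name} v{m.version}: {m.description}\n"
        f"SELECT {m.expression} AS {m.name}\n"
        f"FROM {m.table}\n"
        f"WHERE {' AND '.join(where)}"
    )
    return sql, {"start": start, "end": end}
```

The dates are returned as named placeholders (`:start`, `:end`) instead of being interpolated, so the query stays parameterizable. The version comment on the first line lets a number be traced back to the exact definition that produced it.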

The companies that built NL-to-SQL in 2023 without a semantic layer are the ones rebuilding in 2025. The companies that built the semantic layer first are the ones whose AI investment compounds.

The four agents that fit the architecture

1. Semantic-layer-backed natural-language analytics (build now)

What it does: business user asks a question in natural language → agent translates to a query against the semantic layer (not raw SQL) → returns the answer with the metric definition cited and the confidence level explicit.

Why it works: the surface the agent reasons over is small (hundreds of metric definitions, not thousands of tables). The output is auditable (every metric is defined and version-controlled). Hallucinations are bounded (the agent can refuse to answer when the question doesn’t map to a defined metric, instead of inventing one).

Realistic ROI: 40–70% of routine analytical questions deflect from the data team to self-service. The second-order benefit, a step-change in answer consistency across the org, is what drives the real ROI.

Build cost: heavy, but the heavy part is the semantic layer (which you should be building anyway). The agent on top is medium cost. Hosted alternatives include dbt’s MetricFlow + Coalesce, Cube, and AtScale. The native AI features in Looker, Tableau, and ThoughtSpot are closing the gap but vary in quality.
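The following sketch shows one way to bound what this agent can say. Instead of emitting free-form SQL, the model emits a small structured request, and that request is checked against the semantic layer’s surface before anything runs. `ALLOWED`, `MetricRequest`, and `validate` are hypothetical names, and the metric and dimension lists are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical semantic-layer surface: each defined metric and the dimensions it may be sliced by.
ALLOWED: dict[str, set[str]] = {
    "active_customer": {"region", "plan", "month"},
    "monthly_recurring_revenue": {"region", "plan", "month"},
    "churn_rate": {"plan", "month"},
}


@dataclass
class MetricRequest:
    """What the model is asked to emit instead of SQL (hypothetical shape)."""
    metric: str
    group_by: list[str] = field(default_factory=list)


def validate(req: MetricRequest) -> list[str]:
    """Return the problems with a request; an empty list means it maps onto defined metrics."""
    if req.metric not in ALLOWED:
        return [f"unknown metric '{req.metric}'"]
    return [f"'{req.metric}' cannot be grouped by '{d}'"
            for d in req.group_by if d not in ALLOWED[req.metric]]
```

A request that fails validation is the cue for the agent to refuse or ask a clarifying question instead of inventing a metric.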

2. Anomaly-detection and alerting agent (build now)

What it does: monitors key business metrics, detects anomalies (spikes, drops, drifts), distinguishes signal from noise (seasonality, planned campaigns), drafts the explanation and routes the alert to the right team.

Why it works: pattern matching is the agent’s strong suit. The agent’s job isn’t to make the call; it’s to surface the unusual to the human who makes the call.

Realistic ROI: catches 60–80% of meaningful anomalies before they’re noticed downstream. For a typical mid-sized business, that’s 1–4 incident-class events per quarter caught hours or days earlier than they otherwise would be.

Build cost: medium. Most modern observability platforms (Monte Carlo, Anomalo, Sifflet) include this. The real build question is whether to write a thin wrapper around one of them or adopt the platform as-is.
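Here is a deliberately simple sketch of the signal-versus-noise step. It assumes daily metric values keyed by date. Today’s value is compared with the same weekday in prior weeks, which is a crude way to absorb weekly seasonality. Days with a planned campaign are suppressed, and anything beyond a z-score threshold is routed to the owning channel. Commercial platforms use more sophisticated models. `check_metric`, `route`, the `OWNERS` channels, and the default thresholds are hypothetical.

```python
from __future__ import annotations

import statistics
from datetime import date, timedelta

# Hypothetical routing table: which team owns which metric's alerts.
OWNERS = {"daily_signups": "#growth-oncall", "daily_orders": "#ops-oncall"}


def check_metric(series: dict[date, float], day: date, planned: set[date],
                 weeks: int = 6, threshold: float = 3.0) -> str | None:
    """Return an alert message if `day` is unusual, else None."""
    if day in planned or day not in series:
        return None  # planned campaigns are expected movement, not anomalies
    # Same weekday over the previous `weeks` weeks, skipping missing days.
    history = [series[day - timedelta(weeks=k)] for k in range(1, weeks + 1)
               if day - timedelta(weeks=k) in series]
    if len(history) < 3:
        return None  # too little baseline to judge
    mean = statistics.mean(history)
    spread = statistics.stdev(history) or 1e-9  # avoid dividing by zero on a flat history
    z = (series[day] - mean) / spread
    if abs(z) < threshold:
        return None
    kind = "spike" if z > 0 else "drop"
    return f"{kind} on {day}: {series[day]:.0f} vs ~{mean:.0f} typical ({z:+.1f} sd)"


def route(metric: str, message: str) -> tuple[str, str]:
    """Pick the owning channel; unowned metrics fall back to the data team."""
    return OWNERS.get(metric, "#data-team"), f"[{metric}] {message}"
```

The agent’s job here ends at the routed message. Deciding what the spike means stays with the human who receives it.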

3. Data-quality monitoring agent (build now)

What it does: continuously checks data-quality conditions (freshness, completeness, schema drift, value distributions). Routes failures to a queue someone owns; doesn’t block production except for severity-1 failures.

Why it works: the dashboard-with-red-lights pattern most teams have doesn’t work because no one watches it. The queue-with-an-owner pattern works because someone is accountable.

Realistic ROI: meaningful reduction in downstream-of-data-quality bugs (wrong dashboards, wrong reports, wrong forecasts). The compound effect is trust — when the data team can confidently say “the data is good,” every downstream conversation goes faster.

Build cost: medium. Same vendor list as anomaly detection; usually the same platform.
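Here is a sketch of the queue-with-an-owner pattern. It assumes table metadata (last load time, null rate, column names) has already been collected. The sketch covers three of the four check types from the description: freshness, completeness, and schema drift. Value-distribution checks are omitted for brevity. Each check produces failures with a severity, every failure goes to its owner’s queue, and only severity-1 failures block production. The thresholds (24 hours, 5% nulls), the severity assignments, and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Failure:
    table: str
    check: str
    severity: int  # 1 blocks production; anything higher only goes to the queue
    detail: str


def run_checks(table: str, last_loaded: datetime, now: datetime, null_rate: float,
               columns: set[str], expected_columns: set[str]) -> list[Failure]:
    """Evaluate freshness, completeness, and schema drift for one table."""
    failures = []
    if now - last_loaded > timedelta(hours=24):
        failures.append(Failure(table, "freshness", 2, f"last load {last_loaded:%Y-%m-%d %H:%M}"))
    if null_rate > 0.05:
        failures.append(Failure(table, "completeness", 3, f"null rate {null_rate:.1%}"))
    missing = expected_columns - columns
    if missing:
        failures.append(Failure(table, "schema_drift", 1, f"missing columns {sorted(missing)}"))
    return failures


def triage(failures: list[Failure],
           owners: dict[str, str]) -> tuple[list[tuple[str, Failure]], bool]:
    """Assign every failure to an owning queue; report whether production should be blocked."""
    queue = [(owners.get(f.table, "data-platform"), f) for f in failures]
    block = any(f.severity == 1 for f in failures)
    return queue, block
```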

4. Documentation and lineage agent (build second)

What it does: scans the warehouse, the dbt project, and the BI tool. Documents what each table and column represents. Tracks lineage (which models feed which dashboards). Answers questions like “what does this column mean” and “what dashboards use this metric” without a human having to chase it.

Why it works: the documentation is the unloved chore that determines team velocity. An agent that does it continuously eliminates the tribal-knowledge debt that slows every new hire and every cross-team request.

Realistic ROI: hard to quantify directly; the second-order effect is large. New analysts onboard in days, not weeks. Cross-team requests are answered in minutes, not days.

Build cost: light to medium. Most data catalogs (Collibra, Alation, Atlan) are adding this; build only for unusual environments.
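The question “what dashboards use this metric” reduces to a reachability query over a lineage graph. The minimal sketch below assumes upstream-to-downstream edges have already been harvested from the warehouse, the dbt project, and the BI tool. dbt, for instance, writes dependency information to its manifest.json. The node names and the `metric:` and `dashboard:` prefixes are hypothetical conventions.

```python
from __future__ import annotations

from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream, downstream).
EDGES = [
    ("raw.orders", "stg_orders"),
    ("stg_orders", "fct_revenue"),
    ("fct_revenue", "metric:monthly_recurring_revenue"),
    ("metric:monthly_recurring_revenue", "dashboard:Exec KPIs"),
    ("metric:monthly_recurring_revenue", "dashboard:Board Pack"),
    ("stg_orders", "dashboard:Ops Daily"),
]


def downstream(node: str, edges: list[tuple[str, str]]) -> set[str]:
    """Breadth-first search for every node reachable from `node`."""
    graph: dict[str, list[str]] = defaultdict(list)
    for up, down in edges:
        graph[up].append(down)
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def dashboards_using(node: str, edges: list[tuple[str, str]]) -> list[str]:
    """Names of the dashboards that depend, directly or indirectly, on `node`."""
    return sorted(n.removeprefix("dashboard:")
                  for n in downstream(node, edges) if n.startswith("dashboard:"))
```

Starting from a raw table instead, the same traversal answers the impact question before a schema change. Everything downstream of `raw.orders` is what could break.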

The agents to refuse (or hold)

Direct natural-language to SQL on raw warehouse (don’t build). The known failure pattern. Vendors will demo it; production will reject it within a year. Build the semantic layer first. Always.

Auto-decision agent on top of dashboards (hold). The pitch is to “let the agent take action on the metric.” The right action layer is rarely the dashboard layer; it’s a workflow layer (CRM, marketing automation, ops system). Build the workflow agents in those systems instead.

Auto-summary of every dashboard (hold). Some vendors are pitching agents that auto-summarize every dashboard daily. Most of these summaries no one reads; the cost (compute and attention) outweighs the benefit. Build summarization for the 5 dashboards leadership actually reads, not the 500 in the BI tool.

Auto-modeling / auto-ML agent (hold). Better than nothing for teams without a data scientist; worse than a competent one for teams that have one. Most mid-sized companies are better served by hiring a senior analyst than by deploying an auto-ML agent. The exception: well-defined, high-volume modeling tasks (lead scoring, churn prediction, returns prediction) where the auto-ML approach is genuinely competitive with a human.

The architectural decision under all of this

If you’re building any of the four agents, three commitments matter.

1. The semantic layer is owned and version-controlled. This is a real engineering investment. The metrics, dimensions, and definitions are code, reviewed in PRs, with tests. If your “metrics” are spread across 50 dashboards with 50 implicit definitions, the agent’s accuracy ceiling is the worst of those.

2. Every agent answer cites its sources. The metric definition, the time range, the filters applied. This is the auditability commitment that makes the answers defensible.

3. The agent has an “I don’t know” mode. Most NL-to-SQL agents are tuned to always produce an answer. The right behavior is to refuse when the question doesn’t map cleanly to a defined metric, and to suggest the closest defined metric instead.
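Commitments 2 and 3 can be sketched together, assuming a small hypothetical catalog and an approved synonym list. A term either resolves to a defined metric or produces a refusal. A resolved metric’s answer is rendered with its definition version, time range, and filters. A refusal names the closest defined metric. The closest-match step uses Python’s standard `difflib.get_close_matches`, a plain string-similarity heuristic chosen for illustration. `Definition`, `CitedAnswer`, `Refusal`, `resolve`, and the example values are hypothetical.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    name: str
    version: str
    text: str


# Hypothetical governed catalog plus business synonyms the data team has approved.
CATALOG = {
    "active_customer": Definition("active_customer", "2.1.0",
                                  "Distinct customers with a paid order in the period"),
    "monthly_recurring_revenue": Definition("monthly_recurring_revenue", "1.4.0",
                                            "Sum of normalized monthly subscription value"),
    "churn_rate": Definition("churn_rate", "1.0.0",
                             "Customers lost in the period / customers at period start"),
}
SYNONYMS = {"mrr": "monthly_recurring_revenue", "churn": "churn_rate",
            "active customers": "active_customer"}


@dataclass(frozen=True)
class CitedAnswer:
    value: float
    definition: Definition
    time_range: tuple[str, str]
    filters: tuple[str, ...] = ()

    def render(self) -> str:
        """Every answer carries its metric definition, time range, and filters."""
        start, end = self.time_range
        return "\n".join([
            f"{self.definition.name} = {self.value:,.0f}",
            f"  definition v{self.definition.version}: {self.definition.text}",
            f"  time range: {start} to {end} (end exclusive)",
            f"  filters: {', '.join(self.filters) or 'none'}",
        ])


@dataclass(frozen=True)
class Refusal:
    message: str
    suggestion: str | None  # closest defined metric, if any is similar enough


def resolve(term: str) -> Definition | Refusal:
    """Map a user's term to a defined metric, or refuse and suggest the nearest one."""
    key = " ".join(term.lower().split())
    name = SYNONYMS.get(key, key.replace(" ", "_"))
    if name in CATALOG:
        return CATALOG[name]
    close = difflib.get_close_matches(name, list(CATALOG), n=1, cutoff=0.6)
    return Refusal(f"'{term}' does not map to a defined metric.", close[0] if close else None)
```

For example, “recurring revenue” is not a defined term here. `resolve` returns a refusal that suggests `monthly_recurring_revenue` instead of guessing.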

The counter-argument

A reasonable head of data will push back: “Building a semantic layer is a 12-month engineering project. The business needs answers now.”

Two things to know.

First, the semantic layer is a project you’re already running, just not formalizing. The “what does revenue mean” debate happens in every quarterly close meeting. Formalizing it as a layer accelerates every meeting that follows. The cost is structural; it’s already being paid in dashboard inconsistency and meeting time.

Second, you don’t need to define every metric on day one. Define the 20 metrics leadership actually uses. Deploy the agent over those. Expand from there. Most teams that get stuck in the 12-month build are trying to define 500 metrics; the right approach is to define 20 well and grow.

What to do this quarter

1. Audit your metric definitions. Pull the 20 metrics leadership uses. Write down what each one means. Half of them won’t have a single agreed-upon definition. That’s the work to do first.
2. Build or buy the semantic layer. dbt MetricFlow, Cube, or platform-native (Looker LookML, ThoughtSpot). Start with the 20 metrics from step 1.
3. Deploy the analytics agent only after the semantic layer covers the leadership-question surface. Not before.
4. Refuse the NL-to-SQL pilot if it’s pitched without a semantic layer. It’s the failure pattern. The pitch deck is convincing; the production reality is not.
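Part of the first step can be mechanized. The sketch assumes you can export, for each dashboard, the metric label it shows and the formula behind it. The export format, dashboards, and formulas here are hypothetical. Grouping by normalized label surfaces every metric that has more than one live definition.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical export: (dashboard, metric label, formula it actually uses).
FOUND = [
    ("Exec KPIs", "Revenue", "SUM(amount) WHERE status = 'paid'"),
    ("Sales Pipeline", "Revenue", "SUM(amount) WHERE stage = 'closed_won'"),
    ("Finance Close", "revenue", "sum(amount) where status = 'paid'"),
    ("Growth", "Active customers", "COUNT(DISTINCT customer_id)"),
]


def normalize(text: str) -> str:
    """Case- and whitespace-insensitive form, so trivial differences don't count as conflicts."""
    return " ".join(text.lower().split())


def audit(found: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return only the metric labels that have more than one distinct formula."""
    defs: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for dashboard, label, formula in found:
        defs[normalize(label)][normalize(formula)].append(dashboard)
    return {label: {f: sorted(ds) for f, ds in by_formula.items()}
            for label, by_formula in defs.items() if len(by_formula) > 1}
```

String normalization only catches textual differences. Two formulas that are written differently but compute the same thing still need a human to confirm they match.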

The data orgs that win the AI cycle won’t be the ones who deployed NL-to-SQL fastest. They’ll be the ones whose semantic layer was clean enough that the AI on top of it was actually accurate.

FAQ

Why doesn’t natural-language-to-SQL work in production? Three failure modes. Schema sprawl: the agent doesn’t know which of your 500 tables is the right one for a given question. Semantic drift: what “revenue” means in finance is different from sales, and the agent picks one without knowing. Auditability: a 400-line SQL query is not a defensible audit trail. The semantic-layer architecture solves all three.

What’s a semantic layer and how is it different from a dashboard? A semantic layer is a code-defined catalog of business concepts (metrics, dimensions, definitions). A dashboard is a visualization on top of those concepts. The semantic layer is the source of truth; dashboards consume it. Most companies have dashboards without a semantic layer underneath, which is why their numbers don’t agree.

How long does it take to build a useful semantic layer? For the top 20 metrics: 4–8 weeks of focused work, plus ongoing maintenance. For comprehensive coverage of an enterprise’s metric surface: 12–24 months. Don’t wait for comprehensive — the top 20 unlock most of the AI value.

Which BI tools have good native AI features? The picture changes quarterly. As of mid-2026, ThoughtSpot Sage and the newer Looker AI features are leading on quality of natural-language analytics; Tableau Pulse is improving; Mode and Hex are strong on the analyst-augmentation side. None substitute for a working semantic layer; they all benefit from one.

Should we hire data scientists or deploy AI agents? Both, but with role clarity. AI agents handle the routine analytical questions (what happened, when, by how much). Data scientists handle the unusual questions (why, what now, how do we model this). The agent’s deflection of routine work makes the data scientist’s time more valuable, not less.

Working with JAIN on AI for data and analytics? We help heads of data sequence the semantic-layer build before any agent goes near a query. Book a 30-minute call.
