AI Build vs Buy and the Tooling Decisions That Matter

TL;DR

Most companies are over-building infrastructure and under-buying capability. The 2026 working pattern:

Buy: foundation models (gateway), commodity AI products, eval/audit platforms, governance tooling.
Build: differentiating use cases, specialized agents, the data flywheel, your specific tool catalog.
Decisions that matter: model gateway, eval platform, foundation model strategy, agent framework, MCP/tool catalog, observability, vendor diligence.

The tooling decisions that matter and the build/buy choices that come up repeatedly. Most companies get the balance wrong by over-building infrastructure.

The build vs buy conversation gets stuck on big-picture strategy and misses the operational tooling decisions that actually shape AI capability. Each of these decisions has working defaults; deviating without reason creates cost without benefit. This piece is the working frame for the tooling and sourcing decisions executive teams face.

The infrastructure vs capability split

Two categories of AI investment.

Infrastructure (mostly buy)

The platform layer that all agents depend on:

Foundation model access (gateway / routing).
Eval and audit platform.
Observability and tracing.
Governance tooling (policy enforcement, approval workflow).
Tool catalog and MCP infrastructure.

This is shared infrastructure. Mostly buy from established vendors. Build only when you have specific needs vendors don’t address.

Capability (mostly build for differentiation)

The use cases and agents that solve specific business problems:

Customer-facing AI features.
Internal workflow agents specific to your business.
The data flywheel that powers your moat.
Specialized domain agents.

Build for the differentiating capabilities. Buy for the commodity ones.

The 70/30 default from The Build vs Buy Decision for AI applies: most enterprises should be ~70% buy / 30% build.

The seven tooling decisions that matter

Decision 1: Model gateway / routing

What it is: the layer that routes AI requests to the right foundation model. Handles authentication, rate limiting, cost management, fallback.

Buy default: off-the-shelf gateways such as LiteLLM, OpenRouter, or Vercel AI Gateway, or a cloud-native service (Azure AI Studio, Amazon Bedrock). Building your own reproduces commodity capability.

Build only when: you have specific compliance requirements vendors don’t meet, or you’re operating at scale where the build becomes economical.
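To make the fallback responsibility concrete, here is a minimal sketch of priority-ordered routing. It is an illustration under stated assumptions, not how any particular gateway product works: the `Provider` type, `route_with_fallback`, and the two toy providers are hypothetical names, and real gateways add authentication, rate limiting, and cost tracking on top.

```python
from typing import Callable, Sequence

# Assumption: a provider is modeled as a plain callable (prompt in, text out).
# Real gateways wrap vendor SDKs; these callables are hypothetical stand-ins.
Provider = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the route has failed."""


def route_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, completion)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Toy provider that always fails, simulating an outage."""
    raise TimeoutError("simulated outage")


def stable_secondary(prompt: str) -> str:
    """Toy provider that echoes the prompt."""
    return f"echo: {prompt}"


used, text = route_with_fallback(
    "hello", [("primary", flaky_primary), ("secondary", stable_secondary)]
)
print(used, text)
```

Even this small version shows why the layer is commodity: the logic is generic across companies, and the hard parts (provider quirks, quotas, billing) are what vendors already maintain.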

Decision 2: Foundation model strategy

What it is: which foundation models you use, in what mix, with what fallback.

Default: multi-model from day one. Don’t lock to one provider; the cost and capability dynamics change too fast.

Watch for: vendor pressure to commit exclusively to a single provider. This usually hands the vendor leverage and costs you flexibility.

Covered in detail in MCP and Multi-Model Strategy.

Decision 3: Eval platform

What it is: the platform that runs evals against your agents (hand-labeled sets, regression tests, quality monitors).

Buy default: Braintrust, Langfuse, LangSmith, and others. The space is maturing, and commodity capability is available.

Build only when: you have specialized eval needs vendors don’t address, or the cost at your scale is prohibitive.
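For readers who have not seen one, a regression eval against a hand-labeled set can be sketched in a few lines. Everything below is hypothetical: the labeled examples, the `keyword_agent` stand-in, and the 2-point tolerance are assumptions chosen for illustration, and commercial platforms add dataset versioning, graders, and dashboards around this core loop.

```python
from typing import Callable

# Hand-labeled examples: (input, expected label). Purely illustrative data.
LABELED_SET = [
    ("refund request for order 1001", "billing"),
    ("cannot log in to my account", "access"),
    ("invoice shows the wrong amount", "billing"),
    ("password reset link expired", "access"),
]


def pass_rate(agent: Callable[[str], str], labeled_set: list[tuple[str, str]]) -> float:
    """Fraction of labeled examples where the agent output matches the label."""
    hits = sum(1 for text, expected in labeled_set if agent(text) == expected)
    return hits / len(labeled_set)


def regression_check(
    candidate_rate: float, baseline_rate: float, tolerance: float = 0.02
) -> bool:
    """Pass if the candidate is no more than `tolerance` below the baseline."""
    return candidate_rate >= baseline_rate - tolerance


def keyword_agent(text: str) -> str:
    """Toy stand-in for a real agent call."""
    return "billing" if ("refund" in text or "invoice" in text) else "access"


rate = pass_rate(keyword_agent, LABELED_SET)
print(rate, regression_check(rate, baseline_rate=0.95))
```

The loop itself is simple; what you are buying from a platform is the surrounding workflow, which is why building only makes sense for specialized needs.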

Decision 4: Observability and tracing

What it is: production observability for AI agents — request tracing, latency, error rates, cost monitoring.

Buy default: same vendors as eval (often combined offering) plus general-purpose observability with AI extensions (Datadog, New Relic, others).

Build only when: you need integration with internal observability that vendors don’t support.
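As a rough sketch of what this layer records, the wrapper below captures latency, success, and cost per call. The names (`TraceRecord`, `traced`, `TRACES`) are hypothetical, and the flat per-call cost charged only on success is an assumption; real tools typically derive cost from token counts and export spans to a backend rather than a list.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TraceRecord:
    name: str
    latency_s: float
    ok: bool
    cost_usd: float
    error: Optional[str] = None


TRACES: list[TraceRecord] = []  # stand-in for an export to a tracing backend


def traced(
    name: str, cost_per_call_usd: float, fn: Callable[[str], str], prompt: str
) -> Optional[str]:
    """Run fn(prompt), recording latency, success, and an assumed flat cost."""
    start = time.perf_counter()
    try:
        result = fn(prompt)
    except Exception as exc:
        # Assumption: failed calls are recorded with zero cost.
        TRACES.append(TraceRecord(name, time.perf_counter() - start, False, 0.0, str(exc)))
        return None
    TRACES.append(TraceRecord(name, time.perf_counter() - start, True, cost_per_call_usd))
    return result


def error_rate(traces: list[TraceRecord]) -> float:
    """Share of recorded calls that failed."""
    return sum(1 for t in traces if not t.ok) / len(traces)


def total_cost(traces: list[TraceRecord]) -> float:
    """Sum of recorded per-call costs in USD."""
    return sum(t.cost_usd for t in traces)


def failing_agent(prompt: str) -> str:
    """Toy agent that always fails."""
    raise RuntimeError("tool timeout")


traced("summarize", 0.002, str.upper, "hello")
traced("summarize", 0.002, failing_agent, "hello")
print(error_rate(TRACES), total_cost(TRACES))
```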

Decision 5: Agent framework

What it is: the framework you build agents in. LangChain, LlamaIndex, Pydantic AI, custom, etc.

Default: don’t over-commit early. Many teams start with a framework, hit its limitations, and replace it. The right pattern is often light frameworks plus custom code.

Watch for: framework dependency that’s hard to remove; over-investment in framework-specific patterns.

Decision 6: MCP / tool catalog infrastructure

What it is: how your agents access tools (databases, APIs, workflows).

Default: an MCP-compatible architecture, even if you have not adopted MCP fully yet. This positions you for standardization of the tool ecosystem.
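One way to read "MCP-compatible" in practice: describe every internal tool with a name, a description, and a JSON Schema for its inputs, which is the shape MCP uses for tool descriptors. The sketch below is a hypothetical in-process catalog, not an MCP server; `Tool`, `ToolCatalog`, and the `lookup_order` example are assumptions for illustration, and the argument check covers only required keys rather than full schema validation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict[str, Any]  # JSON Schema describing the arguments
    handler: Callable[..., Any]


class ToolCatalog:
    """Hypothetical in-process registry; not an MCP server implementation."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self._tools[tool.name] = tool

    def list_tools(self) -> list[dict[str, Any]]:
        """Descriptors in the name/description/inputSchema shape."""
        return [
            {"name": t.name, "description": t.description, "inputSchema": t.input_schema}
            for t in self._tools.values()
        ]

    def call(self, name: str, arguments: dict[str, Any]) -> Any:
        tool = self._tools[name]
        # Minimal check: only required keys, not full JSON Schema validation.
        missing = [k for k in tool.input_schema.get("required", []) if k not in arguments]
        if missing:
            raise ValueError(f"missing arguments for {name}: {missing}")
        return tool.handler(**arguments)


catalog = ToolCatalog()
catalog.register(
    Tool(
        name="lookup_order",  # hypothetical tool
        description="Return the status of an order by ID.",
        input_schema={
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
        handler=lambda order_id: {"order_id": order_id, "status": "shipped"},
    )
)
print(catalog.list_tools()[0]["name"])
print(catalog.call("lookup_order", {"order_id": "A-17"}))
```

Keeping descriptors in this shape means the catalog can later be exposed through a real MCP server without redefining each tool.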

Covered in What Is MCP for Business Leaders.

Decision 7: Vendor diligence and procurement

What it is: how you evaluate AI vendors before procurement.

Default: structured 8-week playbook with seven evaluation dimensions. See AI Vendor Selection: A Procurement Playbook.

Watch for: compressed timelines under sales pressure, which lead to wrong vendor selections.

What to build (the differentiating layer)

Three categories where building usually pays off.

Category 1: Differentiating use cases

The 1–3 specific agents that change your unit economics or product position. These should be built — they’re your moat.

Category 2: The data flywheel infrastructure

The systems that turn customer interactions into AI improvements. Your customer data combined with your iteration produces the flywheel; the infrastructure for this should be in-house.

Category 3: Specialized integrations

Integrations with your specific systems that vendors don’t have. Your specific CRM, your specific workflows, your specific data warehouse. Build these.

The build/buy decision matrix

For each AI capability, ask:

Question	Buy if	Build if
Is it commodity?	Yes	No
Does it touch differentiation?	No	Yes
Vendor exists?	Yes (good vendor)	No good vendor
Strategic asset?	No	Yes
Build cost vs. buy cost (3y TCO)?	Buy cheaper	Build cheaper

Most decisions have clear answers. The hard cases are where decisions split — strategic asset but vendor exists, or commodity but no good vendor.
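The matrix can be applied mechanically as a first pass. The sketch below assumes one equal-weight vote per row and flags any 3-to-2 split for review; the equal weighting, the `clear_margin` threshold, and the example figures are illustrative assumptions rather than part of the framework.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    is_commodity: bool
    touches_differentiation: bool
    good_vendor_exists: bool
    is_strategic_asset: bool
    build_tco_3y: float  # three-year total cost of ownership if built
    buy_tco_3y: float    # three-year total cost of ownership if bought


def votes(cap: Capability) -> tuple[int, int]:
    """Return (buy_votes, build_votes), one vote per row of the matrix."""
    buy_signals = [
        cap.is_commodity,
        not cap.touches_differentiation,
        cap.good_vendor_exists,
        not cap.is_strategic_asset,
        cap.buy_tco_3y < cap.build_tco_3y,  # a TCO tie counts toward build
    ]
    buy = sum(buy_signals)
    return buy, len(buy_signals) - buy


def recommend(cap: Capability, clear_margin: int = 3) -> str:
    """'buy' or 'build' when the rows mostly agree, otherwise 'review'."""
    buy, build = votes(cap)
    if buy - build >= clear_margin:
        return "buy"
    if build - buy >= clear_margin:
        return "build"
    return "review"


# Hypothetical examples with made-up cost figures.
gateway = Capability("model gateway", True, False, True, False, 900_000, 300_000)
flywheel = Capability("data flywheel", False, True, True, True, 1_200_000, 800_000)

print(recommend(gateway), votes(gateway))
print(recommend(flywheel), votes(flywheel))
```

The second example lands in "review" because it is a strategic asset with a viable, cheaper vendor, which is exactly the kind of split case that needs judgment rather than a score.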

What’s getting standardized

Three areas where the buy option is consolidating rapidly.

1. Foundation models

Three to five major providers (OpenAI, Anthropic, Google, plus regional/specialized). Use multiple; benchmark continuously.

2. Eval platforms

Two to four major eval/observability vendors. Pick based on integration with your stack.

3. Customer support AI

A handful of leaders (Intercom Fin, Decagon, Sierra, others). Buy from this market; don’t build.

What’s still fragmented

Areas where building remains common because the buy market is shallow.

Specialized industry agents (legal, healthcare, finance vertical apps).
Agent orchestration for complex workflows.
Domain-specific evals.
Internal-tool agents specific to your business.

What to do this quarter

Audit your current build/buy split. Most enterprises are over-building infrastructure.
Make the seven tooling decisions explicitly. Document your default and exceptions.
Plan the rebalance if you’re heavy-build. Migration takes 12–18 months.
Validate that your wedges, the differentiating use cases, are in the build column. The build investment should concentrate there.

FAQ

Should we lock to one cloud’s AI offering? Mostly no. Cloud-native AI offerings improve fast but lock you to that cloud. Use cloud-native for some workloads; maintain flexibility on others.

What about AI infrastructure cost? Foundation model cost is dropping ~50% per year. Infrastructure cost (compute, storage, networking) is more stable. Plan for the cost mix to shift.
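A short projection shows how quickly the mix shifts. The starting figures ($600k model spend, $400k infrastructure spend) are made up, and the sketch assumes flat usage; growth in request volume would offset some of the per-unit decline.

```python
def cost_mix(
    model_cost: float,
    infra_cost: float,
    years: int,
    model_decline: float = 0.5,  # ~50% per year, per the estimate above
    infra_change: float = 0.0,   # assumption: infrastructure cost stays flat
) -> list[dict[str, float]]:
    """Project yearly model and infrastructure cost and the model share of total."""
    rows = []
    for year in range(years + 1):
        m = model_cost * (1 - model_decline) ** year
        i = infra_cost * (1 + infra_change) ** year
        rows.append({"year": year, "model": m, "infra": i, "model_share": m / (m + i)})
    return rows


rows = cost_mix(model_cost=600_000, infra_cost=400_000, years=3)
for row in rows:
    print(
        f"year {row['year']}: model ${row['model']:,.0f}, "
        f"infra ${row['infra']:,.0f}, model share {row['model_share']:.0%}"
    )
```

Under these assumptions the model share of spend falls from 60% to roughly 16% in three years, so infrastructure becomes the dominant line item even though it never grew.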

How does this work for data residency requirements? For EU, India, and China data residency, build the infrastructure that supports it. This may tilt you toward in-region cloud-native services or on-premises deployment.

What about AI on-premises? Mostly a regulated-industry concern. Self-hosted foundation models (open weights) are real options. Cost is higher; control is meaningful.

Should we use foundation models or fine-tune our own? For 95% of use cases: foundation models with good prompting and RAG. Fine-tuning is for specific situations where prompting can’t get the quality needed.

Working with JAIN on AI build/buy and tooling? We help executive teams make the seven tooling decisions and rebalance build/buy portfolios. Book a 30-minute call.

Related reading:

AI Vendor Selection: A Procurement Playbook