The AI Stack Most Enterprises Should Run

Q: What about RAG-specific infrastructure?

Vector databases, retrieval frameworks. Buy from established vendors (Pinecone, Weaviate, Qdrant) or use cloud-native vector services.

Q: How does this map to cloud-provider AI offerings?

Cloud-provider offerings cover several layers. Use selectively; don't lock everything to one cloud.

TL;DR

The reference enterprise AI stack in 2026:

Layer	Components	Build/Buy
Foundation models	Multi-model (closed + open)	Buy
Model gateway	Routing, auth, rate limit	Buy
Tool catalog	MCP-compatible	Build
Agent framework	Light libraries + custom orchestration	Mostly buy
Eval and observability	Vendor platforms	Buy
Audit and governance	Specialized + custom	Mix
Productivity AI	Vendor products	Buy
Customer-facing AI	Wedge-specific (build differentiating, buy commodity)	Mix
Industry-specific AI	Vendor products + custom	Mix

The shape: heavy buy at infrastructure layers, custom work concentrated at differentiating capabilities and tool catalog.

The reference stack for mid-large enterprises in 2026. The shape: heavy buy at the bottom, build for differentiation at the top.

The “what AI stack should we run” question gets answered differently by every consultant. The 2026 working pattern at companies executing AI well is consistent enough to describe. This piece is the reference architecture with the build/buy distribution.

The reference stack

Layer 1: Foundation models

Multi-model from day one. Mix of:

2–3 frontier closed models (Claude, GPT, Gemini).
1–2 open models for specific use cases.
Specialized smaller models for narrow tasks (embeddings, classification).

Cost: scales with usage. For typical mid-large enterprise: $1M–$10M annually in foundation model spend.

Layer 2: Model gateway and routing

The layer that connects agents to foundation models. Functions:

Authentication.
Rate limiting and cost management.
Model routing per use case.
Fallback handling.
Usage tracking.

Buy: LiteLLM, OpenRouter, Vercel AI Gateway, or cloud-native (Bedrock, Azure AI Studio).

Cost: $50K–$500K annually depending on scale and vendor.

Layer 3: Tool catalog and MCP infrastructure

Tools that agents can invoke. MCP-compatible architecture for future-proofing.

Build: most tool integrations. Internal data and systems are specific.

Buy: MCP server frameworks; vendor MCP servers for common systems.

Cost: 1–3 FTEs to build and maintain catalog.

Layer 4: Agent framework and orchestration

The framework agents are built in. The 2026 working pattern: light libraries + custom orchestration, not heavy frameworks.

Buy: model SDKs (OpenAI, Anthropic SDKs), Pydantic, structured-output libraries.

Build: custom orchestration following internal patterns.

Cost: included in AI engineering team capacity.

Layer 5: Eval and observability

Eval platform plus production observability. Often combined vendor offering.

Buy: Braintrust, Langfuse, LangSmith, or similar.

Cost: $100K–$500K annually for typical enterprise.

Layer 6: Audit and governance

Audit logging + governance tooling. Specific to your regulatory and operational needs.

Mix: vendor for audit platform (or build for specialized needs); custom integration with internal governance processes.

Cost: 1–2 FTEs operating; $50K–$200K platform cost.

Layer 7: Productivity AI

Tools your employees use for productivity. Coding assistants, writing assistants, research tools.

Buy: vendor products. Don’t build.

Cost: $20–$50/user/month for major productivity AI tools. For 10K-employee enterprise: $2.4M–$6M annually.

Layer 8: Customer-facing AI

AI in your products. Mix of differentiating (build) and commodity (buy).

Build: the 1–3 wedges that differentiate your product.

Buy: commodity capabilities (customer support, generic personalization, generic search).

Cost: highly variable; dominant cost item for AI-active enterprises.

Layer 9: Industry-specific AI

For regulated and specialized use cases (clinical AI, banking-specific AI, etc.).

Mix: industry vendor products + custom integration + governance overlay.

Cost: industry-specific.

The total cost picture

For a typical 10K-employee mid-large enterprise in 2026:

Layer	Annual cost (range)
Foundation models	$1M–$10M
Gateway/routing	$50K–$500K
Tool catalog	$300K–$1M (people)
Agent framework	included
Eval/observability	$100K–$500K
Audit/governance	$200K–$700K
Productivity AI	$2M–$6M
Customer-facing AI	$2M–$20M+
Industry-specific AI	varies
AI engineering team	$3M–$15M

Total annual AI spend: $10M–$50M+ for a mid-large enterprise. Scales with strategic ambition and customer-facing AI investment.

What’s commonly wrong

Three mistakes I see often.

Mistake 1: Heavy investment in custom infrastructure

Building custom gateways, custom eval platforms, custom observability. Vendor solutions are mature; building reproduces commodity capability.

Mistake 2: Light investment in tool catalog

Treating tool integration as one-off engineering work rather than strategic infrastructure. Result: agents can’t access the tools they need; new agent development is slow.

Mistake 3: Single framework lock-in

Heavy commitment to one agent framework that becomes a constraint as use cases mature.

What to do this quarter

Audit your stack against the reference. Where does your stack diverge?
Identify infrastructure over-investment. Custom builds that should be replaced by vendor.
Identify tool catalog under-investment. Probably the biggest gap at most enterprises.
Plan the rebalance. Migrations take 12–18 months.

How the stack will evolve through 2027

Three predictions.

1. Vendor consolidation at infrastructure layers. 2–4 major vendors per layer; less fragmentation.

2. Standardization of MCP and tool catalog. Tool ecosystem stabilizes; build effort focuses on internal tools.

3. Open models in more layers. Self-hosted or third-party-hosted open models in routing alongside closed.

FAQ

What about RAG-specific infrastructure? Vector databases, retrieval frameworks. Significant for some use cases; lighter for others. Buy from established vendors (Pinecone, Weaviate, Qdrant, etc.) or use cloud-native (Vertex, Bedrock vector services).

What about fine-tuning infrastructure? For most enterprises, foundation model providers’ fine-tuning offerings are sufficient. Self-hosted fine-tuning infrastructure rarely justified.

How does this differ for AI-native vs. traditional companies? AI-native companies build deeper at most layers. Traditional companies should buy more aggressively. The reference is calibrated to traditional enterprise.

What about regulated industries? Add compliance overlays at audit/governance and industry-specific layers. May reduce buy options at some layers.

How does this map to cloud-provider AI offerings? Cloud-provider offerings cover several layers (gateway, eval in some cases, foundation models). Use selectively; don’t lock everything to one cloud.

Working with JAIN on AI stack architecture? We help executive teams design and execute the reference stack with appropriate customization. Book a 30-minute call.

Related reading: