The AI Black Book · Book Five

Data Strategy in the Age of AI

Building the Retrieval-Ready Enterprise

Why enterprise data keeps failing AI systems, and the discipline that fixes it.

Published 2026 · Kindle & paperback

About the book

What this book is about.

Four books into this series, the architectural vocabulary is settled, the program-design discipline is written down, the security surface is mapped, and the business-model question has been posed at board altitude. A reader who has worked through those volumes can reason about agent design, cascade a step-gain roll-out, defend the production estate, and frame the reinvention question for a board that is ready to hear it. The program is running. The strategic framing has landed. And then, reliably, the data layer breaks the program anyway.

Most enterprise AI failures in 2024 and 2025 resolved to the data layer rather than to the model layer. The public record names a tribunal judgement against an airline whose chatbot fabricated policy, a municipal AI assistant advising small-business owners to break the law, a drive-through order-taking AI rolled back after three years of ambient-noise misorders, and a catalogue of pilot rollbacks inside firms whose BI dashboards had been reporting health for a decade. In each case the model was not obviously defective. The substrate was.

The gap between the BI regime the enterprise optimised for and the retrieval regime the AI program now demands is the subject of this book. The gap has a name, data debt, and it has a set of disciplines that reduce it. Those disciplines are the book's content.

Who it is for

The primary reader is the exec who owns the data layer under an AI program. The CDO who has to decide what to fund and in what order. The CIO whose data estate is about to be read by a consumer that will not protect itself with aggregation. The CTO whose production AI systems are the first to break when the data layer misses.

The secondary reader is the exec whose remit intersects the data layer without owning it centrally. The Chief AI Officer whose pilot portfolio is stalling on signals that point back at the substrate. The Chief Transformation Officer whose program timeline is hostage to a cleanup that solves the previous decade's problem. The CFO who is about to approve another tranche of data-platform spend against metrics that will not survive a model-switch.

The book assumes its readers know what a large language model is, have read at least one board memo on AI, and have a working vocabulary for terms like agent, production gate, and shadow AI. It does not assume deep technical expertise.

What it is not

This is not a data-engineering textbook. No rehash of Kimball dimensional modeling, no deep lakehouse-versus-warehouse taxonomy. The engineering literature is deep, and the reader of this book is assumed to be supervising ETL jobs rather than writing them.

This is not a vector-database product guide. Retrieval chapters keep the layer abstract; product comparisons live in Book 6, Buying and Building AI. This is not an ML-training-data textbook either; labeling and augmentation appear only where they cross into the generative and retrieval-first regime.

This is not a GDPR or HIPAA compliance checklist. Regulation appears as a governance overlay mapped to the controls introduced earlier in the book; checklists age and the book is written to survive two model generations.

What survives

Vector databases will consolidate between the writing and the reading. The retrieval-eval tooling most enterprises adopt in 2026 will have been displaced by the tooling of 2028. Foundation-model capabilities will close many of the gaps that look durable today, and open new ones that look unfamiliar now.

What survives is the reasoning. Whether a firm's data is retrieval-ready is not a function of which vector store it chose; it is a function of whether the capture regime, the quality discipline, the lineage architecture, and the measurement surface were built for a probabilistic consumer. A reader who internalises the reasoning and adapts the examples will still be making sensible data calls three vendor cycles from now.