The AI Governance Framework to Put in Place Before You Scale

TL;DR

Five artifacts in priority order:

AI policy (week 1–2)
Eval and audit standard (week 3–6)
Autonomy approval frame (week 6–8)
Incident-response playbook (week 8–10)
Disparate-impact testing protocol (week 10–12, regulated only)

Most teams build them in the wrong order, starting with values statements and ethics committees instead of the working artifacts. The result: six months in, the program looks impressive on slides and has produced nothing operational.

The five artifacts every governance program should have, and the order to build them in.

The pattern at most companies starting an AI governance program: month 1 publishes principles, month 2 forms an ethics committee, month 3 commissions a vendor questionnaire, and by month 6 nothing operational has shipped. The team has invested time, yet the agents in production are still ungoverned. This piece lays out the right sequence: the artifacts that actually gate AI deployments, in the order to build them.

The five artifacts in detail

Week 1–2: AI policy

What it is: a 1,000-word document that says what AI use is approved, what isn’t, what data is OK to share, and how decisions get made. Specific examples; no abstract principles.

Why first: every other artifact references it. Without the policy, the eval standard, the autonomy frame, and the playbook have no foundation to build on.

How to ship in 2 weeks: copy a working template (this article links to one), customize for your industry and jurisdictions, get sign-off from legal and security, publish.

Failure mode: trying to write a perfect policy. Ship the working version; revise quarterly.

Week 3–6: Eval and audit standard

What it is: the standards every agent in production must meet. Hand-labeled eval set, queryable audit log, drift monitoring, quarterly review cadence.

Why second: agents that go live without evals drift silently. The standard is the platform every subsequent agent builds on.

How to ship: define the minimum bar, build a reference eval harness for one agent, document the pattern. Don’t try to retrofit every existing agent in 4 weeks; ship the standard and migrate over the next 2 quarters.

Failure mode: building too elaborate a standard. The minimum-viable version is enough; iterate.
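As a rough illustration of how small a minimum-viable reference harness can be, here is a Python sketch. It assumes an agent can be called as a function from text to text, that "pass" means an exact match against the hand-labeled answer, and that drift means the score falling more than a fixed tolerance below a recorded baseline. The names (EvalCase, run_eval, has_drifted, toy_agent), the 0.90 baseline, and the 0.05 tolerance are all illustrative, not part of any standard.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One hand-labeled example: an input and the answer a reviewer approved."""
    prompt: str
    expected: str


def run_eval(agent: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Return the fraction of cases where the agent's output matches the label."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        1 for c in cases
        if agent(c.prompt).strip().lower() == c.expected.strip().lower()
    )
    return passed / len(cases)


def has_drifted(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag drift when the score falls more than `tolerance` below the baseline."""
    return (baseline - current) > tolerance


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call."""
    return "refund" if "money back" in prompt else "other"


# Tiny hand-labeled eval set (illustrative data only).
cases = [
    EvalCase("I want my money back", "refund"),
    EvalCase("Where is my order?", "other"),
    EvalCase("Please give me my money back now", "refund"),
    EvalCase("Cancel my subscription", "cancel"),
]

score = run_eval(toy_agent, cases)
drifted = has_drifted(baseline=0.90, current=score)
print(f"eval score: {score:.2f}, drifted: {drifted}")
```

Exact-match scoring is the simplest possible grader; agents with free-form output would need a different comparison, but the shape (labeled cases, one score, one drift check) can stay the same.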

Week 6–8: Autonomy approval frame

What it is: which autonomy levels are approved for which use cases, with what conditions. Covered in detail in The Autonomy Spectrum.

Why third: now that you have the eval standard, you can define what “ready for Level 3” actually means. Without the eval standard first, the autonomy frame is theoretical.

How to ship: document the levels, define the readiness criteria for each, get sign-off from CTO + CISO + relevant function leads.

Failure mode: making the levels too granular. Five levels is enough; more becomes bureaucracy.
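One way to make readiness criteria concrete is to encode them as a small check that returns whatever is still blocking approval. The Python sketch below is an assumption-laden example: the five levels come from this article, but the thresholds, the field names, and the rule that drift monitoring is required from Level 3 upward are hypothetical placeholders for whatever your own frame defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Readiness:
    """Evidence collected for one agent (field names are illustrative)."""
    eval_score: float          # latest score on the hand-labeled eval set
    audit_log_queryable: bool  # audit log exists and can be queried
    drift_monitoring: bool     # drift monitoring is switched on


# Hypothetical minimum eval score per autonomy level; set your own bar.
MIN_EVAL_SCORE = {1: 0.0, 2: 0.80, 3: 0.90, 4: 0.95, 5: 0.98}


def unmet_criteria(level: int, r: Readiness) -> list[str]:
    """Return the criteria blocking approval at `level` (empty means ready)."""
    if level not in MIN_EVAL_SCORE:
        raise ValueError(f"unknown autonomy level: {level}")
    gaps = []
    if r.eval_score < MIN_EVAL_SCORE[level]:
        gaps.append(
            f"eval score {r.eval_score:.2f} below {MIN_EVAL_SCORE[level]:.2f}"
        )
    # Assumed rule: audit logging is required from Level 2 upward.
    if level >= 2 and not r.audit_log_queryable:
        gaps.append("audit log not queryable")
    # Assumed rule: drift monitoring is required from Level 3 upward.
    if level >= 3 and not r.drift_monitoring:
        gaps.append("drift monitoring not enabled")
    return gaps


agent = Readiness(eval_score=0.92, audit_log_queryable=True, drift_monitoring=False)
print(unmet_criteria(2, agent))  # []
print(unmet_criteria(3, agent))  # ['drift monitoring not enabled']
```

Returning the list of gaps, rather than a bare yes or no, gives the sign-off group something specific to act on.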

Week 8–10: Incident-response playbook

What it is: the 7-step playbook from The AI Incident Response Playbook.

Why fourth: once agents are deploying at higher autonomy, incidents become a realistic prospect. The playbook is your incident-response readiness.

How to ship: write the playbook, run a tabletop exercise, document the gaps, fix them.

Failure mode: writing a generic playbook. The playbook needs your specific names, contacts, systems, and procedures to be useful.

Week 10–12: Disparate-impact testing protocol

What it is: the quarterly testing protocol for any agent in regulated functions (HR, lending, healthcare, insurance). Includes the test methodology, data requirements, named owners, escalation procedure.

Why fifth: only relevant if you have regulated deployments. For non-regulated companies, this artifact may be deferred or skipped.

How to ship: align with the regulatory frame for your industry, document the test methodology, run the first test, schedule the recurring cadence.

Failure mode: treating this as a one-time exercise. It’s a recurring obligation; the protocol has to be sustainable.
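This article does not prescribe a test methodology. As one example of what a recurring test can compute, the Python sketch below applies the four-fifths rule from the US EEOC Uniform Guidelines on Employee Selection Procedures: a group is flagged when its favorable-outcome rate is below 80% of the highest group's rate. Treat it as a screening heuristic under stated assumptions, not a legal determination; lending, healthcare, and insurance may call for different tests. The function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def selection_rates(outcomes: Iterable[Tuple[str, bool]]) -> dict[str, float]:
    """Compute the favorable-outcome rate per group from (group, favorable) pairs."""
    totals: Counter = Counter()
    favorable: Counter = Counter()
    for group, ok in outcomes:
        totals[group] += 1
        if ok:
            favorable[group] += 1
    return {g: favorable[g] / totals[g] for g in totals}


def adverse_impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Divide each group's rate by the highest group rate."""
    top = max(rates.values())
    if top == 0:
        raise ValueError("no favorable outcomes in any group")
    return {g: r / top for g, r in rates.items()}


def flagged_groups(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return groups whose ratio falls below the four-fifths threshold."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Illustrative data: 100 decisions per group, 50 favorable for A, 30 for B.
decisions = (
    [("A", True)] * 50 + [("A", False)] * 50
    + [("B", True)] * 30 + [("B", False)] * 70
)

rates = selection_rates(decisions)
ratios = adverse_impact_ratios(rates)
flagged = flagged_groups(ratios)
print(rates, ratios, flagged)
```

With this sample, group B's rate (0.3) is 60% of group A's (0.5), so B is flagged for escalation to the named owner.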

The 12-week timeline

Most companies that follow this sequence have a working governance baseline 12 weeks after start. Unlike the output of the 6-month “values statements and committees” approach, that baseline already gates real deployments.

Specific milestones:

End of week 4: AI policy published, eval standard drafted.
End of week 8: eval standard implemented for first agent, autonomy approval frame in place.
End of week 12: incident playbook tested, disparate-impact protocol running for regulated agents.

After the 12 weeks: maintenance and expansion. Quarterly review of each artifact. Annual update.

What you don’t need to build first

Three things that often consume early governance program time and shouldn’t.

An AI ethics committee. Useful eventually; not necessary in the first 12 weeks. Without working artifacts, the committee has nothing to review.

Comprehensive AI training for all employees. Useful but lower priority. Specific role-based training (engineers, ops, legal) matters more than a one-size-fits-all module.

A custom governance platform. Most companies need a working baseline first. Tooling investments make sense after 6+ months of operating the manual version.

What to do this quarter

Audit your current artifacts. Of the five, how many exist and are usable?
Pick the missing ones in priority order. Most companies need 2–4 of the five.
Set the 12-week timeline with a named owner. Without an owner, the work doesn’t happen.
Defer the optional artifacts. No ethics committee, no comprehensive training, no platform investment in the first 12 weeks.

FAQ

Can we shortcut the timeline? Modestly. With strong sponsorship and a senior owner, 8 weeks is achievable. Below that, the artifacts ship as theatre, not as working tools.

What if we already have a values-statement-style program? Keep what works. The five artifacts in this article are operational — they don’t replace values, they implement them. Add the operational layer to your existing program.

Who should own the governance program? The AI program lead, with an explicit reporting line to the CTO or CIO. Reporting to legal or risk often produces compliance-focused governance that lacks operational teeth.

How do we know our governance is working? Three signals. (1) New agent deployments hit the readiness bar before going live. (2) Existing agents in production show stable eval scores and clean audit logs. (3) When something goes wrong, the incident playbook actually gets used.

Should we publish our governance framework externally? Increasingly, yes, for B2B companies. Customers ask; investors ask; regulators are starting to ask. A summary of your governance posture (without proprietary details) can be a marketing asset.
