Draft - Agentic Governance

The Agent Needs a Manager

The next enterprise AI problem is not whether agents can do more work. It is whether companies know how to delegate, supervise, evaluate, and recover that work when agents start acting across real systems.

June 6, 2026

The Agentic Practice

A human operator managing AI agent runs through scoped permissions, approval gates, evaluation signals, audit trails, rollback controls, and ownership markers.

Agents need job descriptions, not just prompts.

Approval is not the same as governance.

The companies that win will manage delegated work better, not merely deploy more agents.

The first phase of enterprise AI was about assistance.

Ask a question. Summarize a document. Draft an email. Generate a campaign idea. Rewrite a paragraph. Search a knowledge base. Explain a policy. Help me think.

That phase mattered because it made AI feel useful. It put intelligence close to everyday work.

But assistance is not the end state.

The next phase is delegation.

The difference is simple:

  • An assistant helps you think.
  • An agent does work on your behalf.

That one shift changes the operating model. The problem is no longer just prompt quality, model capability, or content retrieval. It becomes a management problem.

Who gave the agent authority? What was it asked to accomplish? What systems could it touch? What evidence did it use? What decisions did it make? Where did it stop? Who reviewed the result? What happens if the result is wrong?

Enterprise AI is moving from software you use to software you manage.

And if software starts behaving like a worker, then the agent needs a manager.

Chat Made AI Accessible. Action Makes It Organizational.

A conversational AI interface expanding into connected business actions, records, tasks, approvals, and system nodes.

Chat was the right starting point.

It gave people a familiar way to interact with models. It let teams explore use cases without changing the underlying systems of record. It made AI feel low-risk because the model mostly produced text and the human decided what to do next.

That created a useful illusion: AI was powerful, but contained.

Agents break that illusion.

An agent can plan, call tools, retrieve information, update records, create tickets, route approvals, draft responses, schedule actions, compare vendors, assemble deliverables, monitor changes, and continue across multiple steps.

That is not just a better chat experience. That is a form of delegated labor.

The moment an agent can change something outside the conversation, the design question changes.

It is no longer enough to ask:

  • Can the model answer?
  • Can the agent call a tool?
  • Can the workflow complete?

The better questions are:

  1. What job is this agent actually responsible for?
  2. What authority does it have?
  3. What evidence must it collect before acting?
  4. What decisions require escalation?
  5. What gets logged?
  6. What can be undone?
  7. Who owns the outcome?

Those are management questions.

Approval Is Not Governance.

A single approval checkpoint expanding into a governed workflow with evidence, authority, audit, escalation, evaluation, and rollback signals.

Most companies will try to solve this with approval buttons.

That is understandable. Approval feels familiar. It gives a human a checkpoint before the system acts. It creates a sense of control.

But approval alone is not governance.

A human can approve a bad recommendation if the evidence is incomplete. A manager can rubber-stamp a workflow if the request looks routine. A reviewer can miss a material risk if the agent's reasoning is hidden inside a long transcript. A team can approve an action without knowing what context the agent ignored.

Approval is a moment.

Governance is a system.

A governed agentic workflow needs more than a yes-or-no checkpoint. It needs the conditions that make review meaningful:

  • Scope: the agent should know what work it is allowed to perform.
  • Identity: the system should know which human, team, or policy delegated the work.
  • Authority: permissions should match the task, risk, and business object.
  • Evidence: the agent should show the facts, sources, assumptions, and missing context behind the recommendation.
  • Evaluation: the work should be checked against objective criteria, not only human vibes.
  • Escalation: uncertainty, policy conflicts, high-risk actions, and low-confidence outputs should route to the right person.
  • Audit: the system should preserve what happened, what changed, why it changed, and who approved it.
  • Recovery: mistakes should have rollback paths, exception handling, and post-run review.

Without those pieces, approval becomes theater.

It makes people feel like the system is controlled while leaving the real risk untouched.
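One way to make those conditions concrete is to check them before a reviewer ever sees the approve button. The sketch below is illustrative only: `ApprovalRequest`, `review_gaps`, and every field name are hypothetical, not taken from any product or framework. It covers only the conditions that can be expressed as simple presence checks, and leaves escalation and audit out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ApprovalRequest:
    """Hypothetical package an agent submits alongside a proposed action."""
    action: str
    in_scope: bool = False                               # Scope
    delegated_by: Optional[str] = None                   # Identity
    permission_granted: bool = False                     # Authority
    evidence: List[str] = field(default_factory=list)    # Evidence
    evaluation_passed: Optional[bool] = None             # Evaluation (None = never run)
    rollback_plan: Optional[str] = None                  # Recovery


def review_gaps(req: ApprovalRequest) -> List[str]:
    """Return the governance conditions this request does not yet satisfy."""
    gaps = []
    if not req.in_scope:
        gaps.append("scope")
    if req.delegated_by is None:
        gaps.append("identity")
    if not req.permission_granted:
        gaps.append("authority")
    if not req.evidence:
        gaps.append("evidence")
    if req.evaluation_passed is not True:
        gaps.append("evaluation")
    if req.rollback_plan is None:
        gaps.append("recovery")
    return gaps


# A request that arrives with nothing but the action itself.
routine = ApprovalRequest(action="refresh competitor pricing brief")
print(review_gaps(routine))
```

A request with any gaps could still be shown to a human, but the gaps would be shown with it, so the reviewer knows what the yes-or-no decision is missing.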

Interactive · The management loop

Run the same delegation both ways.

task: refresh competitor pricing brief
  1. Request: “Refresh the competitor pricing brief.”
  2. Approve: A human clicks yes. This is the only control.
  3. Agent runs: Hours of unobserved work. No checkpoints, no review.
  4. Output lands: Quality unknown. Sources unknown. Nobody watched the middle.


Agents Need Job Descriptions.

A structured AI agent role card surrounded by scoped authority, trusted inputs, evidence standards, evaluation, escalation, and ownership markers.

A human worker does not start with unlimited authority.

They get a role, a manager, a team, a set of responsibilities, access to systems, review norms, escalation paths, performance expectations, and consequences for mistakes.

Agents need the same kind of structure.

Not because agents are people. They are not.

Because delegated work requires boundaries.

An enterprise agent should have something closer to a job description than a prompt. That job description should define:

  1. Purpose: what outcome the agent exists to produce.
  2. Scope: which tasks, objects, systems, and decisions are in bounds.
  3. Authority: what it can do automatically, what it can recommend, and what it can never do.
  4. Inputs: which context sources, records, policies, and tools it can use.
  5. Evidence standard: what must be shown before a recommendation or action is valid.
  6. Evaluation standard: how success, quality, compliance, and completeness are measured.
  7. Escalation path: when the agent should stop and involve a human.
  8. Owner: who is accountable for maintaining the agent and the business outcome it supports.
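The eight fields above map naturally onto a small data structure. This is a minimal sketch, assuming the job description lives in code or configuration. `AgentJobDescription`, `Authority`, and the example pricing agent are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Authority(Enum):
    AUTOMATIC = "automatic"   # the agent may act without review
    RECOMMEND = "recommend"   # the agent may only propose the action
    NEVER = "never"           # the agent may not do this at all


@dataclass
class AgentJobDescription:
    purpose: str
    scope: List[str]
    authority: Dict[str, Authority]
    inputs: List[str]
    evidence_standard: str
    evaluation_standard: str
    escalation_path: str
    owner: str

    def authority_for(self, action: str) -> Authority:
        # Assumption: anything not explicitly granted is treated as forbidden.
        return self.authority.get(action, Authority.NEVER)


pricing_agent = AgentJobDescription(
    purpose="Keep the competitor pricing brief current",
    scope=["competitor pricing brief"],
    authority={
        "draft_brief": Authority.AUTOMATIC,
        "publish_brief": Authority.RECOMMEND,
        "change_list_prices": Authority.NEVER,
    },
    inputs=["public vendor pricing pages", "internal pricing policy"],
    evidence_standard="Every price cites a source and a retrieval date",
    evaluation_standard="All tracked competitors covered; no uncited figures",
    escalation_path="pricing-team lead",
    owner="pricing-team",
)

print(pricing_agent.authority_for("publish_brief"))
print(pricing_agent.authority_for("delete_records"))
```

The deny-by-default lookup is the design choice that matters here: an action nobody thought to list falls outside the role rather than inside it.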

This is where many agent pilots will break.

They will be designed as demos instead of roles.

The demo question is: can the agent perform the task?

The role question is: can the agent perform the task repeatedly, within policy, under changing conditions, with a clear owner, reliable evidence, and recoverable failures?

Those are very different questions.

Agent Sprawl Is the New SaaS Sprawl.

Many AI agents scattered across enterprise systems with overlapping permissions, fragmented ownership, and operational risk signals.

Every company has seen this movie before.

First, a new tool is exciting. Then every team adopts its own version. Then the company wakes up with overlapping systems, inconsistent permissions, duplicated workflows, unclear ownership, and fragmented data.

That was SaaS sprawl.

Agent sprawl will be worse.

SaaS apps usually wait for humans to act. Agents can act across systems. They can generate work, trigger workflows, route information, modify records, and create downstream effects.

That means the cost of sprawl is not only subscription waste or messy administration. It is operational confusion.

Imagine a large company with hundreds or thousands of agents:

  • A sales agent updates opportunity notes.
  • A support agent drafts customer responses.
  • A finance agent flags payment anomalies.
  • A legal agent redlines clauses.
  • A marketing agent generates campaign variants.
  • A procurement agent compares vendors.
  • A product agent summarizes customer feedback.
  • A security agent opens incidents.

Individually, each agent may seem useful.

Collectively, the company now has a new workforce layer.

If that layer has no registry, no lifecycle, no evaluation history, no owner map, no permission model, no incident process, and no shared audit trail, the company will not know what it has deployed.

The agent registry becomes the new org chart.

Not because agents replace employees, but because agents become delegated work capacity. They need to be discoverable, governed, measured, improved, and sometimes retired.
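A registry does not have to be elaborate to be useful. The sketch below assumes an in-memory registry and a deliberately small record. `AgentRegistry`, `AgentRecord`, and the agent and system names are hypothetical. It answers two of the questions sprawl makes hard: which active agents have no owner, and which systems more than one agent can write to.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AgentRecord:
    name: str
    owner: Optional[str]
    write_access: Set[str] = field(default_factory=set)
    status: str = "active"          # "active" or "retired"


class AgentRegistry:
    def __init__(self) -> None:
        self._agents: Dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        if record.name in self._agents:
            raise ValueError(f"agent already registered: {record.name}")
        self._agents[record.name] = record

    def retire(self, name: str) -> None:
        self._agents[name].status = "retired"

    def active(self) -> List[AgentRecord]:
        return [a for a in self._agents.values() if a.status == "active"]

    def unowned(self) -> List[str]:
        """Active agents with no accountable owner."""
        return sorted(a.name for a in self.active() if not a.owner)

    def overlapping_writers(self) -> Dict[str, List[str]]:
        """Systems that more than one active agent is allowed to modify."""
        writers = defaultdict(list)
        for agent in self.active():
            for system in agent.write_access:
                writers[system].append(agent.name)
        return {s: sorted(names) for s, names in writers.items() if len(names) > 1}


registry = AgentRegistry()
registry.register(AgentRecord("sales-notes", owner="revops", write_access={"crm"}))
registry.register(AgentRecord("support-drafts", owner=None, write_access={"helpdesk", "crm"}))
registry.register(AgentRecord("payment-anomalies", owner="finance-ops", write_access={"ledger"}))

print(registry.unowned())
print(registry.overlapping_writers())
```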

The Management Layer Is the Product.

A governed AI operations workspace showing agent directories, run timelines, permission scopes, evidence panels, approvals, exceptions, and rollback controls.

This is why the next enterprise AI interface will not just be a better chat box.

It will be a management layer.

That layer will let teams see:

  • What agents exist.
  • What jobs they perform.
  • What systems they can touch.
  • What data they can access.
  • What permissions they have.
  • What runs are in progress.
  • What decisions are pending.
  • What evidence supports each recommendation.
  • What evaluations passed or failed.
  • What risks were escalated.
  • What changed after approval.
  • What needs rollback, retraining, or retirement.

This is where agentic products will start to separate.

The weak version is a sidebar with tool calls.

The strong version is a governed work surface where humans manage delegated outcomes.

That surface will need patterns that feel more like operations management than chat:

  • Agent directories.
  • Run timelines.
  • Permission scopes.
  • Evidence panels.
  • Evaluation scorecards.
  • Exception queues.
  • Approval chains.
  • Rollback controls.
  • Change logs.
  • Human ownership.
  • Policy-aware routing.

In other words, the product surface has to answer the management questions that appear once AI starts acting.
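Change logs and rollback controls depend on the same underlying record: what a run changed, what the value was before, and who approved it. The following is a rough sketch under simplifying assumptions. `ChangeLog` and `Change` are hypothetical names, the system of record is a plain dictionary, and rollback naively restores prior values without checking whether a later run touched the same field.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

_MISSING = object()  # marks an attribute that did not exist before the change


@dataclass(frozen=True)
class Change:
    run_id: str
    record: str
    attribute: str
    before: Any
    after: Any
    approved_by: str


class ChangeLog:
    def __init__(self, store: Dict[str, Dict[str, Any]]) -> None:
        self.store = store              # stand-in for a real system of record
        self.entries: List[Change] = []

    def apply(self, run_id: str, record: str, attribute: str,
              value: Any, approved_by: str) -> None:
        """Write a value and remember what it replaced and who approved it."""
        before = self.store[record].get(attribute, _MISSING)
        self.store[record][attribute] = value
        self.entries.append(Change(run_id, record, attribute, before, value, approved_by))

    def rollback(self, run_id: str) -> int:
        """Undo one run's changes, newest first. Returns how many were undone."""
        undone = [c for c in reversed(self.entries) if c.run_id == run_id]
        for change in undone:
            if change.before is _MISSING:
                self.store[change.record].pop(change.attribute, None)
            else:
                self.store[change.record][change.attribute] = change.before
        return len(undone)


crm = {"opp-17": {"stage": "discovery"}}
log = ChangeLog(crm)
log.apply("run-1", "opp-17", "stage", "proposal", approved_by="a.manager")
log.apply("run-1", "opp-17", "notes", "Pricing sent", approved_by="a.manager")

undone_count = log.rollback("run-1")
print(undone_count, crm)
```

The entries stay in the log after a rollback, so the audit trail still shows what the run did even once its effects are reversed.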

Evaluation Becomes Part of the Workflow.

An agentic workflow with evaluation checkpoints embedded across input, planning, evidence, tool use, decisions, outcomes, risk handling, and recovery.

A lot of teams still treat evaluation as something that happens before deployment.

That will not be enough for agents.

Agents operate in changing environments. They receive new context. They encounter edge cases. They call tools. They make intermediate decisions. They interact with systems that may have stale data, missing permissions, ambiguous policy, or conflicting instructions.

For agentic work, evaluation has to become part of the workflow itself.

A useful evaluation layer should check:

  1. Input quality: did the agent receive enough context to act?
  2. Plan quality: did the proposed steps match the goal and policy?
  3. Evidence quality: were claims supported by reliable sources?
  4. Tool use: were the right systems called in the right order?
  5. Decision quality: did the agent make or recommend the right judgment?
  6. Outcome quality: did the final work meet the business standard?
  7. Risk handling: did the agent escalate at the right time?
  8. Recovery: if something failed, did the system preserve enough state to fix it?

This is the shift from model evaluation to work evaluation.

The model matters. But the work matters more.

A mediocre model inside a well-scoped, well-evaluated, well-governed workflow may be safer and more useful than a powerful model operating with vague authority and no reliable review path.
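To illustrate evaluation living inside the workflow rather than before it, here is a minimal sketch of staged checkpoints that stop a run at the first failure. Only two of the eight checks are implemented, and the run format, `check_input`, `check_evidence`, and `run_with_checkpoints` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Check = Callable[[Dict], Tuple[bool, str]]


@dataclass
class CheckResult:
    stage: str
    passed: bool
    note: str


def check_input(run: Dict) -> Tuple[bool, str]:
    """Input quality: is the context the agent needs actually present?"""
    missing = [key for key in ("goal", "policy") if key not in run.get("context", {})]
    if missing:
        return False, "missing context: " + ", ".join(missing)
    return True, "ok"


def check_evidence(run: Dict) -> Tuple[bool, str]:
    """Evidence quality: does every claim cite a source?"""
    unsupported = [c["text"] for c in run.get("claims", []) if not c.get("source")]
    if unsupported:
        return False, "unsupported claims: " + "; ".join(unsupported)
    return True, "ok"


CHECKS: List[Tuple[str, Check]] = [
    ("input quality", check_input),
    ("evidence quality", check_evidence),
]


def run_with_checkpoints(run: Dict, checks: List[Tuple[str, Check]]) -> List[CheckResult]:
    """Apply each stage check in order and stop at the first failure."""
    results = []
    for stage, check in checks:
        passed, note = check(run)
        results.append(CheckResult(stage, passed, note))
        if not passed:
            break  # in a real workflow this is where escalation would start
    return results


run = {
    "context": {"goal": "refresh pricing brief", "policy": "cite every price"},
    "claims": [
        {"text": "Vendor A raised prices", "source": "vendor A pricing page"},
        {"text": "Vendor B will follow", "source": None},
    ],
}
results = run_with_checkpoints(run, CHECKS)
for result in results:
    print(result.stage, result.passed, result.note)
```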

The Human Role Moves Upstream.

A team defining outcomes, trusted sources, evidence standards, escalation gates, and ownership before agent workflows execute downstream.

The lazy version of the agent story says humans approve outputs.

That is too small.

The more important human role is upstream.

Humans define the work.

They decide which outcomes matter, what good looks like, which tradeoffs are acceptable, which risks require escalation, which sources are trusted, which systems can be changed, and which exceptions need judgment.

Then they manage the system as it runs.

That changes the skill profile of knowledge work.

The valuable worker is not only the person who can perform the task. It is the person who can design the journey, encode the context, define the evidence standard, supervise the agent, interpret exceptions, and own the result.

This is why "agent manager" should not be read only as a new job title.

It is a new layer of work.

Managers, operators, analysts, marketers, sellers, engineers, lawyers, finance teams, support leaders, and product owners will all need versions of this skill.

The question will not be:

Did you use AI?

It will be:

Did you know how to delegate the work safely?

The Companies That Win Will Not Have the Most Agents.

A contrast between scattered unmanaged agent nodes and a smaller governed delegated-work operating model producing reliable outcomes.

It will be tempting to measure progress by agent count.

How many agents did we deploy? How many workflows did we automate? How many tasks did we remove? How many hours did we save?

Those questions are useful, but they can push teams toward the wrong behavior.

The goal is not more agents.

The goal is better delegated work.

The companies that win will know which work should be delegated, which work should be augmented, which work should stay human, and which work should not happen at all.

They will build a management layer before agent sprawl becomes unmanageable.

They will give agents clear job descriptions, scoped authority, reliable context, observable runs, meaningful evaluations, recoverable failure paths, and accountable owners.

They will treat governance as part of the product experience, not a compliance document added after deployment.

And they will understand the deeper shift:

AI is not only changing how work gets done.

It is changing how work gets managed.

That is the next enterprise AI problem.

Not just better prompts.

Not just smarter models.

Not just more copilots.

The agent needs a manager.
