The Council Pattern

You have a schema migration staged. The agent wrote it, reviewed its own work, and says it is safe. Before you run it against production, you pipe the diff to three independent models. One says yes. One flags a tradeoff. One says no: this drops a column with data in it.

That is the council pattern in one sentence: ask models that were not involved in writing the change to vote on whether it should ship.

Claude Code — Council review of a schema migration

/home/user/api-service $ llm-council estimate --mode review --diff "Is this migration safe to run?"

Estimated cost: $0.031 across 3 peers
Tokens: ~8,400 input, ~1,200 output
Continue? (y/n): y

/home/user/api-service $ llm-council run --mode review --diff "Is this migration safe to run?"

Reading diff (git diff --staged)...

Peer 1 (claude)
RECOMMENDATION: yes
Reasoning: Column rename is backward-compatible. Application code updated in same commit. Index preserved.

Peer 2 (codex)
RECOMMENDATION: tradeoff
Reasoning: Safe if deployed atomically with app code. Rolling deploy without feature flag risks 503s during transition window.

Peer 3 (agy)
RECOMMENDATION: no
Reasoning: Line +14 drops `user_legacy_id`. Column is referenced in audit_log trigger (not in this diff). Data loss on next migration run.

─────────────────────────────────────
Verdict summary: 1 yes / 1 tradeoff / 1 no
Council did not reach consensus. Investigate the dissent before committing.
Run `llm-council last` to view full transcript.

/home/user/api-service $

One peer caught a trigger reference that was not in the diff. The agent that wrote the migration missed it, not because it is a bad model, but because a model reviewing its own work brings the same blind spots to the review.

Why independent review works

A single model reviewing its own output has a structural problem: it inherits its own reasoning path. If it decided early that a column was unused, it will tend to re-confirm that belief when asked to double-check. This is not hallucination — it is the same confirmation bias that makes self-review weaker than peer review in any engineering process.

Independent models bring different training distributions, different prompt paths, and different tendencies toward caution. The research signal on multi-model deliberation is consistent: council-style review is reported to cut hallucinated or incorrect claims by roughly 36% in relative terms compared with single-model self-review. The practical sweet spot is three to seven reviewers over roughly two rounds. Beyond seven peers or three rounds, marginal accuracy gains flatten while cost keeps rising linearly. Two peers can be enough for a quick sanity check; one is not a council.

This matches how consequential decisions work in every high-stakes domain: peer review, second opinions, red teams. The pattern is not new. What is new is that it is now automatable in a pre-commit hook.

When to convene — and when not to

Convene a council when the cost of being confidently wrong is high:

Schema migrations — dropped columns, renamed foreign keys, changed constraints
Auth and security diffs — anything that touches tokens, session handling, permissions, or secrets
Architecture decisions — new service dependencies, breaking API changes, deployment topology shifts
Irreversible operations — data backfills, purges, vendor lock-in adoption
“The agent is confidently insisting” moments — when the model has argued itself into a corner and is pushing hard for a direction you cannot fully evaluate

Do not convene a council for:

Routine edits — a council on every commit is ceremony, not engineering
Style and formatting changes
Documentation updates
Green-path feature work where a simple test covers correctness
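
The two lists above can be turned into a rough path filter for a pre-commit hook. This is a minimal sketch, not part of llm-council: the function name needs_council and the path patterns are hypothetical and should be adapted to your repository layout.

```shell
# Hypothetical helper: decide whether a staged change looks risky enough
# to justify a council run. The path patterns are illustrative assumptions.

needs_council() {
  # Reads file paths on stdin, one per line.
  # Returns 0 (convene) if any path looks high-risk, 1 otherwise.
  grep -Eiq '(^|/)(migrations?|auth|security|secrets?)(/|$)|\.sql$'
}

# Typical use inside a pre-commit hook:
#   if git diff --staged --name-only | needs_council; then
#     llm-council run --mode review --diff "Is this change safe to ship?"
#   fi
```

A path filter only approximates the judgment call. It will not detect an agent that is confidently insisting, so treat it as a default, not a rule.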

The investment framing matters. A review mode council costs roughly $0.02–$0.10 in tokens and two to three minutes of wall time. A missed schema bug that reaches production can cost days of recovery, data repair, and incident retrospectives. The council is not overhead — it is insurance with a known premium and a calculable expected value.
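
To make the expected-value claim concrete, here is a small worked calculation. It takes the upper-end premium of $0.10 from the paragraph above and a hypothetical incident cost of $5,000 (an assumption for illustration, not a figure from this lesson), and it optimistically assumes the council always catches the bug.

```shell
# break_even_probability PREMIUM_USD INCIDENT_COST_USD
# Prints the per-change bug probability above which a council run pays for
# itself, assuming the council always catches the bug.
break_even_probability() {
  awk -v premium="$1" -v incident="$2" 'BEGIN { printf "%.6f\n", premium / incident }'
}

# $0.10 premium against a hypothetical $5,000 incident:
break_even_probability 0.10 5000   # prints 0.000020
```

Under those assumptions the run breaks even if roughly one risky change in 50,000 hides a production bug. A council that catches only some bugs raises that threshold proportionally.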

ℹ Industry signal

Perplexity shipped Model Council as a mainstream user feature in February 2026. Council-style code review circulates as a first-class pattern in the Claude Code ecosystem. This course’s own production workflow uses multi-model deliberation before non-trivial changes merge to main. The pattern has left early-adopter territory.

Hands-on: running a diff review with llm-council

llm-council (MIT license, built and maintained by this course’s author, Intellimetrics) is one concrete implementation of the council pattern. It convenes your locally installed CLI tools — claude, codex, agy — and optionally OpenRouter or Ollama models as independent peers. The tool is the demo; the pattern is the lesson. Everything in this section is grounded in the README at github.com/Intellimetrics/llm-council.

Install

# Recommended
uv tool install --force git+https://github.com/Intellimetrics/llm-council.git

# Alternative
pipx install --force git+https://github.com/Intellimetrics/llm-council.git

After install, run setup and diagnostics:

llm-council setup --yes --preset auto   # detect installed CLIs, configure peers
llm-council doctor                       # verify all peers are reachable

The auto preset discovers which CLIs you have installed. Other presets (tri-cli, openrouter, local-private) are covered below.

The estimate-before-running habit

Before any council run, get a cost estimate:

llm-council estimate --mode review --diff "Is this migration safe to run?"

This shows token count and estimated USD across all peers without sending the diff. Build this into your workflow: estimate, decide whether the change is worth the cost, then run. A consensus mode estimate on a large diff might come back at $0.40 — that is the signal to either trim the diff or decide the change warrants the spend.
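
The estimate-then-decide habit can be scripted. The sketch below assumes the estimate output contains a line shaped like "Estimated cost: $0.031 across 3 peers", as in the sample session at the top of this lesson; the real output format may differ, and the helper names estimated_cost and within_budget are hypothetical.

```shell
# estimated_cost: read estimate output on stdin and print the dollar figure.
# Assumes a line shaped like "Estimated cost: $0.031 ..." (an assumption
# taken from the sample session, not a documented format).
estimated_cost() {
  sed -n 's/.*Estimated cost: \$\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# within_budget COST BUDGET: succeed only when COST is present and <= BUDGET.
within_budget() {
  [ -n "$1" ] || return 1   # no parsed cost: fail closed
  awk -v cost="$1" -v budget="$2" 'BEGIN { exit !(cost + 0 <= budget + 0) }'
}

sample='Estimated cost: $0.031 across 3 peers'
cost=$(printf '%s\n' "$sample" | estimated_cost)
if within_budget "$cost" 0.10; then
  echo "run the council"
else
  echo "trim the diff first"
fi
# prints: run the council
```

Failing closed on an unparsed cost is a deliberate choice here: if the output format changes, the script refuses to proceed instead of silently treating the cost as zero.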

Review mode: the standard pre-commit gate

Stage your changes, then:

llm-council run --mode review --diff "Is this migration safe to run?"

--diff reads from git diff --staged automatically. Each peer independently reads the diff and returns one of three structured verdicts:

RECOMMENDATION: yes        # safe to proceed
RECOMMENDATION: no         # stop — major issues detected
RECOMMENDATION: tradeoff   # plausible, but critical trade-offs noted

Vague answers are rejected at the protocol level. A peer that hedges without a verdict label is treated as a non-response. This forces peers to commit to a position — which is exactly what you want from a reviewer.
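
If you want to post-process a transcript yourself, the verdict labels are easy to count. A minimal sketch, assuming each peer's verdict sits on its own line in the RECOMMENDATION: <label> form shown above; tally_verdicts is a hypothetical helper, not an llm-council command.

```shell
# tally_verdicts: read a transcript on stdin and print "yes=N tradeoff=N no=N".
# Only lines of the exact form "RECOMMENDATION: <label>" are counted, so a
# hedge without a label contributes nothing, mirroring the non-response rule.
tally_verdicts() {
  awk '
    /^[ \t]*RECOMMENDATION:[ \t]+(yes|no|tradeoff)[ \t]*$/ { count[$2]++ }
    END {
      printf "yes=%d tradeoff=%d no=%d\n", count["yes"], count["tradeoff"], count["no"]
    }
  '
}

printf '%s\n' \
  'RECOMMENDATION: yes' \
  'RECOMMENDATION: tradeoff' \
  'Probably fine, but hard to say.' \
  'RECOMMENDATION: no' | tally_verdicts
# prints: yes=1 tradeoff=1 no=1
```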

After the run, the full transcript is available:

llm-council last

Reading a split verdict

A split result — the scenario in the terminal above — is not a failure. It is the council doing its job. When you get 1 yes / 1 tradeoff / 1 no, the correct response is not to average the votes. It is to read the no peer’s reasoning first. A single dissenting peer that names a specific artifact you did not consider has more informational value than two agreeing peers who did not catch it.

Weight dissent over agreement. Investigate every no and every tradeoff before proceeding. If the no peer is wrong, you will know after thirty seconds of reading — and you will have documented why you shipped anyway.
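
The "weight dissent over agreement" rule can be encoded as a gate that never averages votes. This is a sketch under stated assumptions: council_gate is a hypothetical function that takes the three verdict counts as arguments, and the exit codes are this example's own convention.

```shell
# council_gate YES TRADEOFF NO
# Exit status: 0 = unanimous yes, 1 = dissent to investigate,
#              2 = no usable verdicts at all.
council_gate() {
  if [ "$(( $1 + $2 + $3 ))" -eq 0 ]; then
    echo "no verdicts returned; treat as a failed run"
    return 2
  fi
  if [ "$3" -gt 0 ] || [ "$2" -gt 0 ]; then
    echo "dissent present: read every no and tradeoff before proceeding"
    return 1
  fi
  echo "unanimous yes"
  return 0
}

# Two yes votes and one no: majority approval still blocks.
if council_gate 2 0 1 >/dev/null; then
  echo "proceed"
else
  echo "investigate first"
fi
# prints: investigate first
```

The gate only decides when a human must read the reasoning. Overriding a dissent after reading it remains a human decision.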

Trust properties

Read-only enforcement — the honest version

When peers run, write tools are disabled at invocation for claude and codex — hard-enforced at the API level. Those peers can read your diff and reason about it but cannot modify files, run commands, or make external requests.

For the agy (Antigravity) peer, the read-only constraint is enforced at the prompt level, not the API level. The practical risk is low for a focused diff review, but it is not the same guarantee. The README states this explicitly, and so does this lesson. If your codebase requires a hard write-isolation guarantee for all peers, use a preset that excludes agy from the council.

Secret scanning

Before any diff leaves your machine, the tool scans for common secret patterns — API keys, tokens, private key blocks. If a match is found, the run is blocked and you are shown which lines triggered the scan. This is a pre-flight check, not a substitute for .gitignore hygiene and a secrets manager, but it catches the most common accidental-commit scenario.
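
To illustrate what a pre-flight scan of this kind looks like, here is a rough sketch using grep. It is not llm-council's scanner: the function name scan_diff_for_secrets and the three patterns (private key blocks, AWS-style access key IDs, and generic key/secret/token assignments) are assumptions chosen for illustration.

```shell
# scan_diff_for_secrets: read a unified diff on stdin and print added lines
# (with their line numbers in the diff) that match common secret shapes.
# Returns 0 when something is found, 1 when the diff looks clean.
# Illustrative patterns only; real scanners use far larger rule sets.
scan_diff_for_secrets() {
  grep -niE '^\+.*(-----BEGIN [A-Z ]*PRIVATE KEY-----|AKIA[0-9A-Z]{16}|(api[_-]?key|secret|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]{8,})'
}

printf '%s\n' \
  '+ALTER TABLE users DROP COLUMN user_legacy_id;' \
  '+API_KEY = "example-not-a-real-key"' | scan_diff_for_secrets
# prints: 2:+API_KEY = "example-not-a-real-key"
```

Only added lines (those starting with +) are checked, since removed lines are already in the repository history and are not newly exposed by this diff.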

Cost caps

Add --max-cost-usd to any run to enforce a hard ceiling:

llm-council run --mode consensus --diff --max-cost-usd 0.50 "Should we merge this auth rewrite?"

If the estimated cost exceeds the cap, the run is blocked. Useful in CI environments where you want council coverage but not unbounded spend.

💡 Sensitive codebases: local-only option

If your code cannot leave your machine — classified systems, HIPAA-regulated data, proprietary algorithms — use the local-private preset, which routes all peers through Ollama models running locally. No tokens leave the machine. Quality is lower than frontier models, but the pattern still works: independent local models catch different failure modes than the single agent that wrote the change.

Setup: install Ollama and pull models from at least two different model families, then run llm-council setup --yes --preset local-private.

Consensus mode: when yes/no is not enough

Review mode gives you a verdict. Consensus mode gives you a debate.

llm-council run --mode consensus --diff "Should we merge this auth rewrite?"

In consensus mode, peers are assigned opposing stances at the start — one argues for merging, one argues against, one plays skeptic. They exchange structured responses over approximately two rounds. The output is not a single verdict but a transcript of positions and rebuttals, surfacing the strongest arguments on each side. Use it when you need to understand the shape of the risk, not just whether to proceed: architecture decisions, security rewrites, vendor adoptions where the correct answer is genuinely uncertain.

Consensus mode costs more — typically two to four times a review run. Estimate first.

As an MCP server

If you run llm-council as an MCP server, your main agent can call it mid-session without leaving the terminal:

council_run       — run a council synchronously
council_estimate  — estimate cost before running
council_recommend — get a single aggregated recommendation
council_doctor    — verify peer connectivity
council_last_transcript — retrieve the previous run

This enables a workflow where the agent stages a risky change, pauses, calls council_run on the staged diff, and waits for the verdicts before proceeding to commit. The human is still in the loop for the final decision, but the council has already surfaced the second opinion.

KNOWLEDGE CHECK

Your agent has staged a change that renames a foreign key column across five tables. When should you convene a council?

KNOWLEDGE CHECK

The council returns: Peer 1 — yes, Peer 2 — yes, Peer 3 — no (names a specific trigger you didn't know existed). What is the correct next step?

KNOWLEDGE CHECK

What does 'read-only' mean for the claude and codex peers, versus the agy peer?

Key takeaways

The blind-spot argument: a model reviewing its own work inherits its own reasoning path. Independent peers catch different failure modes.
When to convene: schema migrations, auth/security diffs, architecture decisions, irreversible operations, and confident-agent moments. Routine edits do not need a council.
Investment framing: a review run costs $0.02–$0.10 and two to three minutes. A production incident costs days. The council is insurance with a known premium.
Estimate before running: llm-council estimate --mode review --diff before every run. Make it a reflex.
Verdicts are structured: yes, no, or tradeoff — no hedging accepted. Weight dissent over agreement; investigate every no.
Read-only enforcement is not uniform: hard at the API level for claude/codex, prompt-level only for agy. State this honestly in runbooks.
Consensus mode (--mode consensus) runs a structured multi-round debate when you need to understand the shape of the risk, not just a go/no-go.
As an MCP server, the council is callable mid-session — your agent can pause, take the current diff to council, and wait for a verdict before committing.

Up next: Lesson 9 — Agent Runbooks — making these patterns repeatable across teams by encoding them as executable operational documents.
