The Council Pattern
Advanced · ~18 min
What you'll learn
- Recognize when a change deserves independent multi-model review
- Run a diff through a council and read structured verdicts
- Weigh council cost against the cost of being confidently wrong
You have a schema migration staged. The agent wrote it, reviewed its own work, and says it is safe. Before you run it against production, you pipe the diff to three independent models. Two say yes. One says no — this drops a column with data in it.
That is the council pattern in one sentence: ask models that were not involved in writing the change to vote on whether it should ship.
One peer caught a trigger reference that was not in the diff. The agent that wrote the migration did not catch it — not because it is a bad model, but because it shared the blind spot with itself.
Why independent review works
A single model reviewing its own output has a structural problem: it inherits its own reasoning path. If it decided early that a column was unused, it will tend to re-confirm that belief when asked to double-check. This is not hallucination — it is the same confirmation bias that makes self-review weaker than peer review in any engineering process.
Independent models bring different training distributions, different prompt paths, and different tendencies toward caution. The research signal on multi-model deliberation is consistent: council-style review reports approximately a 36% relative reduction in hallucinated or incorrect claims compared to single-model self-review. The practical sweet spot is three to seven reviewers over roughly two rounds. Beyond seven peers or three rounds, marginal accuracy gains flatten while cost rises linearly. Two peers is often enough for a quick sanity check; one is not a council.
This matches how consequential decisions work in every high-stakes domain: peer review, second opinions, red teams. The pattern is not new. What is new is that it is now automatable in a pre-commit hook.
When to convene — and when not to
Convene a council when the cost of being confidently wrong is high:
- Schema migrations — dropped columns, renamed foreign keys, changed constraints
- Auth and security diffs — anything that touches tokens, session handling, permissions, or secrets
- Architecture decisions — new service dependencies, breaking API changes, deployment topology shifts
- Irreversible operations — data backfills, purges, vendor lock-in adoption
- “The agent is confidently insisting” moments — when the model has argued itself into a corner and is pushing hard for a direction you cannot fully evaluate
Do not convene a council for:
- Routine edits — a council on every commit is ceremony, not engineering
- Style and formatting changes
- Documentation updates
- Green-path feature work where a simple test covers correctness
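The two lists above can be approximated as a rough first-pass filter on the staged file paths. The sketch below is an illustration under stated assumptions, not part of llm-council: the function name `should_convene` and every path pattern are hypothetical and would need tuning to a real repository layout.

```python
from fnmatch import fnmatch

# Hypothetical patterns for high-stakes changes; adjust to your repository.
HIGH_STAKES_PATTERNS = [
    "*migrations/*",   # schema migrations
    "*auth*",          # auth and session handling
    "*permissions*",   # permission checks
    "*secrets*",       # secret handling
]

# Hypothetical patterns for changes that rarely justify a council.
LOW_STAKES_PATTERNS = ["*.md", "docs/*", "*.css"]


def should_convene(changed_paths):
    """Return True if any changed path matches a high-stakes pattern."""
    for path in changed_paths:
        # Documentation and style files are skipped even if their name looks risky.
        if any(fnmatch(path, pat) for pat in LOW_STAKES_PATTERNS):
            continue
        if any(fnmatch(path, pat) for pat in HIGH_STAKES_PATTERNS):
            return True
    return False
```

A path filter cannot detect the "confidently insisting agent" case or an architecture decision; those still need a human to call for a council.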
The investment framing matters. A review mode council costs roughly $0.02–$0.10 in tokens and two to three minutes of wall time. A missed schema bug that reaches production can cost days of recovery, data repair, and incident retrospectives. The council is not overhead — it is insurance with a known premium and a calculable expected value.
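The insurance framing reduces to a short expected-value calculation. The sketch below uses the upper end of the cost range quoted above; the bug probability, catch probability, and incident cost are made-up illustrative inputs, not measurements.

```python
def council_expected_value(premium_usd, p_bug, p_catch, incident_cost_usd):
    """Expected net benefit of one council run, in USD.

    premium_usd:       token cost of the run (the known premium)
    p_bug:             assumed probability the change contains a serious bug
    p_catch:           assumed probability the council flags that bug
    incident_cost_usd: assumed cost of the bug reaching production
    """
    expected_loss_avoided = p_bug * p_catch * incident_cost_usd
    return expected_loss_avoided - premium_usd


def break_even_bug_probability(premium_usd, p_catch, incident_cost_usd):
    """Smallest bug probability at which the council pays for itself."""
    return premium_usd / (p_catch * incident_cost_usd)


# Illustrative inputs only: a $0.10 review run, a 2% chance of a serious bug,
# a 50% chance the council catches it, and a $5,000 incident.
net = council_expected_value(0.10, 0.02, 0.5, 5000)
print(f"expected net benefit: ${net:.2f}")
```

The break-even function is the more useful half in practice: with inputs like these, the council pays for itself at very small bug probabilities, which is why routine edits are excluded on ceremony grounds rather than cost grounds.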
Perplexity shipped Model Council as a mainstream user feature in February 2026. Council-style code review circulates as a first-class pattern in the Claude Code ecosystem. This course’s own production workflow uses multi-model deliberation before non-trivial changes merge to main. The pattern has left early-adopter territory.
Hands-on: running a diff review with llm-council
llm-council (MIT license, built and maintained by this course’s author, Intellimetrics) is one concrete implementation of the council pattern. It convenes your locally installed CLI tools — claude, codex, agy — and optionally OpenRouter or Ollama models as independent peers. The tool is the demo; the pattern is the lesson. Everything in this section is grounded in the README at github.com/Intellimetrics/llm-council.
Install
    # Recommended
    uv tool install --force git+https://github.com/Intellimetrics/llm-council.git
    # Alternative
    pipx install --force git+https://github.com/Intellimetrics/llm-council.git
After install, run setup and diagnostics:
    llm-council setup --yes --preset auto   # detect installed CLIs, configure peers
    llm-council doctor                      # verify all peers are reachable
The auto preset discovers which CLIs you have installed. Other presets (tri-cli, openrouter, local-private) are covered below.
The estimate-before-running habit
Before any council run, get a cost estimate:
    llm-council estimate --mode review --diff "Is this migration safe to run?"
This shows token count and estimated USD across all peers without sending the diff. Build this into your workflow: estimate, decide whether the change is worth the cost, then run. A consensus mode estimate on a large diff might come back at $0.40; that is the signal to either trim the diff or decide the change warrants the spend.
Review mode: the standard pre-commit gate
Stage your changes, then:
    llm-council run --mode review --diff "Is this migration safe to run?"
The --diff flag reads from git diff --staged automatically. Each peer independently reads the diff and returns one of three structured verdicts:
    RECOMMENDATION: yes        # safe to proceed
    RECOMMENDATION: no         # stop: major issues detected
    RECOMMENDATION: tradeoff   # plausible, but critical trade-offs noted
Vague answers are rejected at the protocol level. A peer that hedges without a verdict label is treated as a non-response. This forces peers to commit to a position, which is exactly what you want from a reviewer.
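To make the "no label, no response" rule concrete, here is a minimal sketch of a verdict parser. It assumes each peer's reply contains a line starting with `RECOMMENDATION:` followed by one of the three labels; the regex and the function name `parse_verdict` are hypothetical and do not reproduce llm-council's own parsing.

```python
import re

# Matches a line such as "RECOMMENDATION: yes"; the label set comes from the lesson text.
VERDICT_RE = re.compile(r"^RECOMMENDATION:\s*(yes|no|tradeoff)\b", re.MULTILINE)


def parse_verdict(peer_output):
    """Return 'yes', 'no', or 'tradeoff', or None if the peer gave no verdict label."""
    match = VERDICT_RE.search(peer_output)
    return match.group(1) if match else None
```

A reply that hedges in prose, or that writes something like `RECOMMENDATION: not sure`, yields `None` under this sketch, which is the non-response case described above.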
After the run, the full transcript is available:
    llm-council last
Reading a split verdict
A split result, like the scenario that opened this lesson, is not a failure. It is the council doing its job. When you get 1 yes / 1 tradeoff / 1 no, the correct response is not to average the votes. It is to read the no peer’s reasoning first. A single dissenting peer that names a specific artifact you did not consider has more informational value than two agreeing peers who did not catch it.
Weight dissent over agreement. Investigate every no and every tradeoff before proceeding. If the no peer is wrong, you will know after thirty seconds of reading — and you will have documented why you shipped anyway.
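The "weight dissent over agreement" rule can be sketched as a reading order rather than a vote count. The function below is a hypothetical helper, not part of llm-council; treating a non-response as a reason to investigate is this sketch's own assumption.

```python
# Reading order: dissent first, agreement last.
READING_ORDER = {"no": 0, "tradeoff": 1, "yes": 2}


def triage(verdicts):
    """Order peers for reading and decide whether investigation is needed.

    verdicts maps a peer name to 'yes', 'no', 'tradeoff', or None (non-response).
    Returns (peers_in_reading_order, needs_investigation).
    """
    responded = {peer: v for peer, v in verdicts.items() if v is not None}
    ordered = sorted(responded, key=lambda peer: READING_ORDER[responded[peer]])
    # Anything short of a yes from every peer warrants a closer look.
    needs_investigation = (
        len(responded) < len(verdicts)
        or any(v != "yes" for v in responded.values())
    )
    return ordered, needs_investigation
```

Note that nothing here averages or counts votes: a 2-to-1 majority for yes still returns `needs_investigation` as true, with the dissenting peer first in the list.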
Trust properties
Read-only enforcement — the honest version
When peers run, write tools are disabled at invocation for claude and codex — hard-enforced at the API level. Those peers can read your diff and reason about it but cannot modify files, run commands, or make external requests.
For the agy (Antigravity) peer, the read-only constraint is enforced at the prompt level, not the API level. The practical risk is low for a focused diff review, but it is not the same guarantee. The README states this explicitly, and so does this lesson. If your codebase requires a hard write-isolation guarantee for all peers, use a preset that excludes agy from the council.
Secret scanning
Before any diff leaves your machine, the tool scans for common secret patterns — API keys, tokens, private key blocks. If a match is found, the run is blocked and you are shown which lines triggered the scan. This is a pre-flight check, not a substitute for .gitignore hygiene and a secrets manager, but it catches the most common accidental-commit scenario.
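A pre-flight scan of this kind can be sketched in a few lines. The three patterns below are illustrative assumptions chosen for the example; they are not the tool's actual pattern list, and `scan_diff` is a hypothetical name.

```python
import re

# Illustrative patterns only; the tool's real pattern list is not reproduced here.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic API key assignment": re.compile(
        r"\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]",
        re.IGNORECASE,
    ),
}


def scan_diff(diff_text):
    """Return (line_number, pattern_name) for every added line that looks like a secret."""
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits
```

A caller would block the run whenever `scan_diff` returns a non-empty list and show the reported line numbers, mirroring the behavior described above.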
Cost caps
Add --max-cost-usd to any run to enforce a hard ceiling:
    llm-council run --mode consensus --diff --max-cost-usd 0.50 "Should we merge this auth rewrite?"
If the estimated cost exceeds the cap, the run is blocked. Useful in CI environments where you want council coverage but not unbounded spend.
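The gate itself is a single comparison. This minimal sketch assumes the cap is checked against the estimate before any peer is called, and that an estimate exactly equal to the cap is allowed; both are assumptions, and `check_cost_cap` is a hypothetical name rather than the tool's code.

```python
def check_cost_cap(estimated_usd, max_cost_usd=None):
    """Return True if the run may proceed, False if the cap blocks it."""
    if max_cost_usd is None:
        return True  # no cap configured
    # Blocked only when the estimate exceeds the cap.
    return estimated_usd <= max_cost_usd
```

With the $0.40 consensus estimate mentioned earlier and a cap of 0.50, this returns `True`; an estimate of $0.60 against the same cap returns `False`.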
If your code cannot leave your machine — classified systems, HIPAA-regulated data, proprietary algorithms — use the local-private preset, which routes all peers through Ollama models running locally. No tokens leave the machine. Quality is lower than frontier models, but the pattern still works: independent local models catch different failure modes than the single agent that wrote the change.
Setup: after installing Ollama and pulling at least two different model families, run:
    llm-council setup --yes --preset local-private
Consensus mode: when yes/no is not enough
Review mode gives you a verdict. Consensus mode gives you a debate.
    llm-council run --mode consensus --diff "Should we merge this auth rewrite?"
In consensus mode, peers are assigned opposing stances at the start: one argues for merging, one argues against, one plays skeptic. They exchange structured responses over approximately two rounds. The output is not a single verdict but a transcript of positions and rebuttals, surfacing the strongest arguments on each side. Use it when you need to understand the shape of the risk, not just whether to proceed: architecture decisions, security rewrites, vendor adoptions where the correct answer is genuinely uncertain.
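Stance assignment can be pictured as pairing peers with roles. The text does not say how llm-council chooses which peer gets which stance, so the sketch below is purely hypothetical: it assigns in list order and cycles through the stances when there are more than three peers.

```python
from itertools import cycle

# Stance names follow the lesson's description of consensus mode.
STANCES = ("for", "against", "skeptic")


def assign_stances(peers):
    """Pair each peer with a stance, cycling through STANCES for extra peers."""
    return dict(zip(peers, cycle(STANCES)))
```

For the three CLI peers named earlier, `assign_stances(["claude", "codex", "agy"])` gives one peer each of for, against, and skeptic.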
Consensus mode costs more — typically two to four times a review run. Estimate first.
As an MCP server
If you run llm-council as an MCP server, your main agent can call it mid-session without leaving the terminal:
- council_run: run a council synchronously
- council_estimate: estimate cost before running
- council_recommend: get a single aggregated recommendation
- council_doctor: verify peer connectivity
- council_last_transcript: retrieve the previous run
This enables a workflow where the agent completes a dangerous operation, pauses, calls council_run on the staged diff, and waits for a consensus before proceeding to commit. The human is still in the loop for the final decision, but the council has already surfaced the second opinion.
Your agent has staged a change that renames a foreign key column across five tables. When should you convene a council?
The council returns: Peer 1 — yes, Peer 2 — yes, Peer 3 — no (names a specific trigger you didn't know existed). What is the correct next step?
What does 'read-only' mean for the claude and codex peers, versus the agy peer?
Key takeaways
- The blind-spot argument: a model reviewing its own work inherits its own reasoning path. Independent peers catch different failure modes.
- When to convene: schema migrations, auth/security diffs, architecture decisions, irreversible operations, and confident-agent moments. Routine edits do not need a council.
- Investment framing: a review run costs $0.02–$0.10 and two to three minutes. A production incident costs days. The council is insurance with a known premium.
- Estimate before running: llm-council estimate --mode review --diff before every run. Make it a reflex.
- Verdicts are structured: yes, no, or tradeoff, with no hedging accepted. Weight dissent over agreement; investigate every no.
- Read-only enforcement is not uniform: hard at the API level for claude/codex, prompt-level only for agy. State this honestly in runbooks.
- Consensus mode (--mode consensus) runs a structured multi-round debate when you need to understand the shape of the risk, not just a go/no-go.
- As an MCP server, the council is callable mid-session: your agent can pause, take the current diff to council, and wait for a verdict before committing.
Up next: Lesson 9 — Agent Runbooks — making these patterns repeatable across teams by encoding them as executable operational documents.