Mastery Module 19 · Agentic Engineering

Spec-Driven Development

Advanced · ~18 min

What you'll learn

  • Decide when a feature deserves a spec instead of a conversation
  • Run the Spec Kit workflow from constitution to implementation
  • Use the spec as the agent-legible source of truth across sessions

Two developers. Same feature request: add CSV export to the reporting dashboard.

Developer A opens a chat and starts typing. Forty messages and two days later, the feature half-works, the schema differs from what the backend team agreed on, and nobody can remember which constraints were decided versus suggested. The agent implemented what the last message said, not what the project actually needs. Onboarding a second developer means scrolling a chat log.

Developer B runs /speckit.specify. In twenty minutes, she has a reviewed spec.md in version control — a document that answers “what are we building and why” well enough that any agent, any teammate, and future-her can re-read it and pick up exactly where the last session left off. She runs /speckit.plan, reviews the technical approach, runs /speckit.tasks, and the agent starts building against a stable target.

This is the difference between vibe coding at scale and spec-driven development.

Why specs beat conversations for big work

Module 10, Lesson 1 introduced the 60-second planning exercise — answer “what am I building, for whom, and what does done look like” before opening the terminal. Spec-driven development is that discipline systematized into committed artifacts.

Three properties make a spec better than a conversation thread for any feature that spans more than one session:

1. Context windows expire; files do not. A conversation is ephemeral. As Lesson 2 explains, quality degrades as the window fills, and a new session starts with zero memory of the old one. A spec file committed to the repo is re-read on every new session. The agent’s working memory becomes the file, not the chat scrollback.

2. Specs are team-reviewable. A chat thread is a single-player medium. A spec.md in a PR is a first-class artifact your team can comment on, approve, or push back on — before any code is written. Design disagreements surface at the cheapest possible moment.

3. Specs are agent-legible. Markdown with structured sections (goals, constraints, user stories, non-goals) is exactly what a language model reads well. Telling the agent @specs/042-csv-export/spec.md at the start of a session is more reliable than paraphrasing requirements in a new message and hoping the model reconstructs your intent correctly.
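Because a spec is only agent-legible when those structured sections are actually present, a small lint can check for them before a session starts. The sketch below is illustrative only: `missingSections` is a hypothetical helper, not part of Spec Kit, and it assumes the spec marks sections with Markdown `#` headings.

```typescript
// Hypothetical helper: reports which expected sections a spec lacks.
// Assumes sections are Markdown headings ("## Goals"); adjust to your layout.
function missingSections(markdown: string, required: string[]): string[] {
  const headings = new Set<string>();
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^#{1,6}\s+(.+?)\s*$/.exec(line);
    if (match) {
      headings.add((match[1] ?? "").toLowerCase());
    }
  }
  // Compare case-insensitively so "Non-Goals" and "Non-goals" both count.
  return required.filter((name) => !headings.has(name.toLowerCase()));
}

const sampleSpec = [
  "# CSV export",
  "## Goals",
  "- Users can export any filtered report view",
  "## Non-goals",
  "- No Excel (.xlsx) support in this iteration",
].join("\n");

// Logs the one section this sample lacks: "Acceptance criteria".
console.log(missingSections(sampleSpec, ["Goals", "Non-goals", "Acceptance criteria"]));
```

A check like this could run in CI or as a pre-session step; the list of required sections is yours to choose.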

The Spec Kit workflow

Spec Kit (hosted on GitHub; ~111k stars; supports 30+ agents, including Claude Code, Codex CLI, Antigravity CLI, Cursor, and Copilot) provides a set of slash commands that guide you through spec-driven development phase by phase. The commands below are verified against the project README as of June 2026.

Installation

Spec Kit's CLI (specify-cli) is installed as a uv tool:

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z

Replace vX.Y.Z with the latest release tag shown on the GitHub releases page. Once installed, the /speckit.* commands are available inside Claude Code, Codex CLI, Antigravity CLI, and the other supported agents.

The workflow runs in seven phases. Here is each one illustrated against the running example — adding CSV export to a reporting dashboard.


Phase 1 — Establish principles: /speckit.constitution

What it does: Creates or updates .specify/memory/constitution.md — a project-level document capturing governing principles: tech stack, architectural constraints, non-negotiables, team conventions.

Artifact: .specify/memory/constitution.md

You run this once when Spec Kit is introduced to a project, and update it when foundational decisions change. The constitution is read by every subsequent command; it ensures the plan the agent produces is consistent with the project’s actual constraints.

/speckit.constitution

The agent prompts you with questions about your stack, conventions, and hard constraints. For a Node/TypeScript project: “TypeScript strict mode, no any; REST API using { data, error, meta } envelope; all monetary values in cents as integers; never touch src/legacy/v1-compat.ts.”
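Constraints like these can also be mirrored in code, so a violation fails at compile time or at the boundary rather than in review. The following is a minimal sketch of the example rules above; every name in it (`Cents`, `toCents`, `ApiEnvelope`, `okEnvelope`, `errorEnvelope`) and the shape of `error` and `meta` are assumptions for illustration, not something Spec Kit generates.

```typescript
// Hypothetical types mirroring the example constitution rules.

/** Monetary amount in integer cents ("all monetary values in cents as integers"). */
type Cents = number & { readonly __brand: "Cents" };

/** Converts a number to Cents, rejecting anything that is not a safe integer. */
function toCents(value: number): Cents {
  if (!Number.isSafeInteger(value)) {
    throw new RangeError(`Expected integer cents, got ${value}`);
  }
  return value as Cents;
}

/** The { data, error, meta } envelope; the field shapes here are assumed. */
interface ApiEnvelope<T> {
  data: T | null;
  error: { code: string; message: string } | null;
  meta: Record<string, unknown>;
}

/** Wraps a successful payload in the envelope. */
function okEnvelope<T>(data: T, meta: Record<string, unknown> = {}): ApiEnvelope<T> {
  return { data, error: null, meta };
}

/** Wraps a failure in the envelope. */
function errorEnvelope(code: string, message: string): ApiEnvelope<never> {
  return { data: null, error: { code, message }, meta: {} };
}

const invoiceResponse = okEnvelope({ total: toCents(1999) });
console.log(JSON.stringify(invoiceResponse));
```

The constitution stays the source of truth; types like these are just one way to make its rules hard to break by accident.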


Phase 2 — Write the spec: /speckit.specify

What it does: Guides a structured conversation to produce a requirements document — the what and why of the feature, including user stories, acceptance criteria, and explicit non-goals.

Artifact: specs/{number}-{name}/spec.md

/speckit.specify
> Feature: CSV export for the reporting dashboard

Plausible output excerpt:

Creating spec: specs/042-csv-export/spec.md

Goals
─────
• Users can export any filtered report view to a UTF-8 CSV file
• Export includes all visible columns in display order
• Files are named report-{date}-{filter-hash}.csv

Non-goals
─────────
• No Excel (.xlsx) support in this iteration
• No scheduled / email delivery of exports

Acceptance criteria
───────────────────
• Export button visible on all report views with > 0 rows
• Large exports (> 10k rows) stream; do not buffer in memory
• Column headers match the display labels, not the DB column names

Commit this file. It is now the source of truth. If the requirements change, the spec changes — not the chat.
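To see how a single spec line turns into code, consider the naming rule report-{date}-{filter-hash}.csv. The spec excerpt does not say how the date is formatted or how the hash is computed, so the sketch below makes both up: it assumes a UTC YYYY-MM-DD date and the first 8 hex characters of a SHA-256 over the key-sorted filters. `exportFileName`, `filterHash`, and `Filters` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Assumed filter shape; the real dashboard's filter model may differ.
type Filters = Record<string, string | number | boolean>;

/** Short, order-independent hash of the active filters (assumed scheme). */
function filterHash(filters: Filters): string {
  // Sort keys so { a, b } and { b, a } produce the same hash.
  const canonical = JSON.stringify(
    Object.keys(filters)
      .sort()
      .map((key) => [key, filters[key]]),
  );
  return createHash("sha256").update(canonical).digest("hex").slice(0, 8);
}

/** Builds report-{date}-{filter-hash}.csv using a UTC calendar date (assumed). */
function exportFileName(date: Date, filters: Filters): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `report-${day}-${filterHash(filters)}.csv`;
}

console.log(exportFileName(new Date("2026-06-01T12:00:00Z"), { region: "EU", year: 2026 }));
```

Each assumption in this sketch is a question the spec leaves open, which is the kind of gap worth settling in the spec itself rather than in code.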


Phase 3 — Resolve ambiguities: /speckit.clarify

What it does: Reads the spec and surfaces underspecified areas as explicit questions. Answers are written back into the spec as a Clarifications section.

Artifact: Clarifications section appended to specs/{number}-{name}/spec.md

/speckit.clarify

For the CSV export, it might surface: “The spec says ‘all visible columns’ — does this include columns the user has hidden via column picker? Does it include computed columns not stored in the DB?” Running this before the technical plan prevents ambiguities from becoming defects.


Phase 4 — Technical plan: /speckit.plan

What it does: Reads the constitution and spec, then produces a technical implementation plan — data model changes, API shape, component breakdown, dependency decisions.

Artifact: specs/{number}-{name}/plan.md (and optionally data-model.md, api.md, etc.)

/speckit.plan

Plausible output excerpt:

Reading: .specify/memory/constitution.md
Reading: specs/042-csv-export/spec.md
Plan: specs/042-csv-export/plan.md

API layer
─────────
GET /api/reports/:id/export?format=csv
• Streams response using Node's Transform pipeline
• Honors existing filter/sort query params
• Sets Content-Disposition: attachment; filename="..."

Frontend
────────
• ExportButton component — single responsibility, no state
• Triggers fetch with current filter params from ReportContext
• Disables during in-flight request; re-enables on completion

Review this plan before proceeding. This is the last cheap moment to change the approach.
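To make the "streams response using Node's Transform pipeline" line concrete, here is one way the transform step might look. It is a sketch under assumptions, not the plan's actual implementation: `Row`, `Column`, `csvField`, `csvLine`, and `createCsvTransform` are hypothetical names, and quoting follows the common RFC 4180 convention.

```typescript
import { Transform } from "node:stream";

// Assumed shapes for illustration only.
type Row = Record<string, string | number | null>;
interface Column {
  key: string;   // property name on the row object
  label: string; // display label used as the CSV header
}

/** Quotes a field when it contains a comma, a double quote, or a line break. */
function csvField(value: string | number | null): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

/** Joins already-ordered values into one CRLF-terminated CSV line. */
function csvLine(values: Array<string | number | null>): string {
  return values.map(csvField).join(",") + "\r\n";
}

/** Row objects in, CSV text out; rows are converted one at a time, never buffered as a whole. */
function createCsvTransform(columns: Column[]): Transform {
  let headerWritten = false;
  return new Transform({
    writableObjectMode: true,
    transform(row: Row, _encoding, done) {
      if (!headerWritten) {
        // Header uses display labels, not DB column names.
        this.push(csvLine(columns.map((column) => column.label)));
        headerWritten = true;
      }
      this.push(csvLine(columns.map((column) => row[column.key] ?? null)));
      done();
    },
  });
}
```

In a route handler, a transform like this would typically sit between a row source and the HTTP response, connected with `pipeline` from `node:stream/promises` so that backpressure and errors propagate.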


Phase 5 — Task breakdown: /speckit.tasks

What it does: Decomposes the plan into an ordered, atomic task list — discrete units of work the agent (or a human) can execute and check off.

Artifact: specs/{number}-{name}/tasks.md

/speckit.tasks

Tasks are granular enough to be individually committed and independently verifiable. Example tasks: “Add GET /api/reports/:id/export route skeleton with auth middleware,” “Implement streaming CSV transform,” “Add ExportButton component,” “Write integration test: export 50k-row fixture, assert streaming.”
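If you want to script against the task list (for a status line or a CI check, say), finding the next open task takes only a few lines. This sketch assumes tasks.md tracks tasks as Markdown checkboxes (`- [ ]` and `- [x]`); confirm the format your Spec Kit version generates. `parseTasks`, `nextUncheckedTask`, and `TaskItem` are hypothetical names.

```typescript
interface TaskItem {
  line: number;  // 1-based line number in tasks.md
  done: boolean; // true when the checkbox is ticked
  text: string;  // task description without the checkbox
}

/** Extracts checkbox items from Markdown; every other line is ignored. */
function parseTasks(markdown: string): TaskItem[] {
  const tasks: TaskItem[] = [];
  markdown.split(/\r?\n/).forEach((raw, index) => {
    const match = /^\s*[-*]\s+\[([ xX])\]\s+(.*)$/.exec(raw);
    if (match) {
      const [, mark = " ", text = ""] = match;
      tasks.push({ line: index + 1, done: mark !== " ", text: text.trim() });
    }
  });
  return tasks;
}

/** Returns the first task that is not checked off, if any. */
function nextUncheckedTask(markdown: string): TaskItem | undefined {
  return parseTasks(markdown).find((task) => !task.done);
}

const sampleTasks = [
  "- [x] Add GET /api/reports/:id/export route skeleton with auth middleware",
  "- [x] Implement streaming CSV transform",
  "- [ ] Add ExportButton component",
  "- [ ] Write integration test: export 50k-row fixture, assert streaming",
].join("\n");

// Logs "Add ExportButton component".
console.log(nextUncheckedTask(sampleTasks)?.text);
```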


Phase 6 — Cross-check: /speckit.analyze

What it does: Reads the spec, plan, and tasks together and checks for consistency gaps — a task that has no corresponding spec requirement, a spec requirement with no covering task, or a plan decision that contradicts the constitution.

Artifact: Analysis output (inline; flagged items you address before implementing)

/speckit.analyze

Running this before implementation is inexpensive insurance. A five-minute analysis that surfaces “the spec requires streaming for > 10k rows but no task covers the streaming transform” is worth considerably more than discovering the gap in code review.
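The core of a coverage cross-check is simple enough to sketch. The code below is not how /speckit.analyze works internally; it only illustrates the two gap types described above, using hypothetical requirement and task IDs (AC-1, T-1) and an assumed `covers` field linking tasks to requirements.

```typescript
interface Requirement {
  id: string;
  text: string;
}
interface PlannedTask {
  id: string;
  text: string;
  covers: string[]; // requirement IDs this task claims to satisfy (assumed field)
}
interface CoverageReport {
  uncoveredRequirements: string[]; // requirements no task covers
  orphanTasks: string[];           // tasks tied to no known requirement
}

function checkCoverage(requirements: Requirement[], tasks: PlannedTask[]): CoverageReport {
  const knownIds = new Set(requirements.map((requirement) => requirement.id));
  const coveredIds = new Set<string>();
  for (const task of tasks) {
    for (const id of task.covers) {
      coveredIds.add(id);
    }
  }
  return {
    uncoveredRequirements: requirements
      .filter((requirement) => !coveredIds.has(requirement.id))
      .map((requirement) => requirement.id),
    orphanTasks: tasks
      .filter((task) => !task.covers.some((id) => knownIds.has(id)))
      .map((task) => task.id),
  };
}

const report = checkCoverage(
  [
    { id: "AC-1", text: "Export button visible on all report views with > 0 rows" },
    { id: "AC-2", text: "Large exports (> 10k rows) stream" },
    { id: "AC-3", text: "Column headers match the display labels" },
  ],
  [
    { id: "T-1", text: "Add export route skeleton", covers: ["AC-3"] },
    { id: "T-2", text: "Add ExportButton component", covers: ["AC-1"] },
    { id: "T-3", text: "Add Excel export option", covers: [] },
  ],
);

// AC-2 (streaming) has no covering task; T-3 maps to no requirement.
console.log(report);
```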


Phase 7 — Execute: /speckit.implement

What it does: Works through tasks.md in order, implementing each task against the plan. Because the agent reads the spec and plan at the start of each task, it has a stable target — it is not reconstructing requirements from conversation memory.

Artifact: Working implementation

/speckit.implement

If you are running a multi-session project, this is where the investment pays back: open a new session, run /speckit.implement, point the agent at @specs/042-csv-export/, and it re-reads the spec and picks up from the next unchecked task. No context ramp-up. No re-explaining.

💡 Two more verified commands

The README also confirms two additional commands not in the core flow above:

  • /speckit.checklist — generates a custom quality checklist for the feature (useful for pre-PR review gates)
  • /speckit.taskstoissues — converts tasks.md into GitHub Issues, one issue per task

Use them when your workflow benefits. Neither is required for the core spec-driven loop.

Living with specs

The spec is not a planning document you throw away after kick-off. It is the durable source of truth for the life of the feature.

Commit it early. The spec, plan, and tasks belong in version control alongside the code. A PR that includes specs/042-csv-export/ before implementation starts lets reviewers push back on requirements and approach while both are still cheap to change, rather than on behavior that is already built.

The agent re-reads the spec each session. At the start of any new session working on this feature, reference the spec explicitly:

@specs/042-csv-export/spec.md @specs/042-csv-export/plan.md
Continue from the next unchecked task in tasks.md.

The agent now has the same baseline it had at the start of the previous session. This is what Lesson 2 calls persistent, agent-legible working memory — a file the agent re-reads every session beats re-explaining in every prompt.

When reality diverges, update the spec first. If a mid-implementation discovery changes the approach (the streaming library you planned on ships a breaking API change, or the backend team changes the filter contract), update spec.md and plan.md before changing the code. The spec is the source of truth, not the chat scrollback. A spec that drifts from the code is a spec that stops being useful.

Use spec drift as a code review signal. If a PR changes behavior that the spec does not mention, that is either a spec that needs updating or a change that needs justification. Either way, the spec makes the gap visible.
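A coarse version of this signal can be automated. The sketch below flags a change set that touches code but not the feature's spec directory. It is a heuristic only, since it cannot tell whether behavior actually changed. `checkSpecDrift` is a hypothetical helper, the `src/` prefix is an assumption about the repo layout, and the path list would come from something like `git diff --name-only`.

```typescript
interface DriftCheck {
  codeChanged: boolean;
  specChanged: boolean;
  needsAttention: boolean; // code moved, spec did not
}

/**
 * Flags change sets that modify code without touching the feature's spec files.
 * Heuristic only: a refactor with no behavior change will also be flagged.
 */
function checkSpecDrift(
  changedPaths: string[],
  specDir: string,
  codeDir: string = "src/", // assumed source root
): DriftCheck {
  const codeChanged = changedPaths.some((path) => path.startsWith(codeDir));
  const specChanged = changedPaths.some((path) => path.startsWith(specDir));
  return { codeChanged, specChanged, needsAttention: codeChanged && !specChanged };
}

const drift = checkSpecDrift(
  ["src/api/reports/export.ts", "README.md"],
  "specs/042-csv-export/",
);

// needsAttention is true here: code changed, the spec directory did not.
console.log(drift);
```

Treat a flag as a prompt for the reviewer to ask "spec update or justification?", not as a failed build.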

When NOT to spec

Spec-driven development is an investment. The ceremony earns its cost when the feature is large enough that the discipline compounds across sessions and team members. It does not pay for itself on:

  • Single-session fixes. “Change the button color to #3b82f6” — run it, ship it, done. No spec needed.
  • Throwaway prototypes. If the goal is to discover whether something is worth building, a spec is premature. Prototype first, spec the keeper.
  • Isolated, well-understood changes. “Add a createdAt field to the User model” — if the whole change fits in one commit with no ambiguity, the spec costs more than it returns.

The decision heuristic: if “what are we building” will still be a live question in 48 hours, write a spec.

The spec-kit-llm-council extension

Built by this course's author — optional extension

spec-kit-llm-council (MIT; v0.3.x) is an extension to Spec Kit built by the author of this course. It is early-stage and optional — core Spec Kit works without it.

What it adds: Three lifecycle gates that pause the workflow and convene a multi-model council — a panel of LLMs with different training data, different priors, and different failure modes — to review your spec, plan, and tasks before coding starts. The premise: one model misses things that a panel catches.

The three gates run after /speckit.specify, before /speckit.tasks, and before /speckit.implement. Verdicts are advisory only — nothing blocks the next Spec Kit command. The results are written to .specify/council/{feature}/{gate}-review.md as a durable audit trail.

Install:

uv tool install llm-council
specify extension add llm-council

Dry-run before spending tokens:

speckit.llm-council.dry-run

Example output:

[council] dry-run for 042-csv-export — gate: plan (mode: plan)
[council] participants: claude, codex, gemini, deepseek_v4_pro
[council] estimated cost: $0.0017 (cap: $0.50) ✓ under cap

Recall the last verdict:

speckit.llm-council.last
[council] last verdict for 042-csv-export — yes (4/4)
[council] Recorded: 2026-05-04T05:04:34Z

The council pattern — routing a decision through multiple models and synthesizing their verdicts — is covered in depth in the next lesson: Lesson 8: The Council Pattern.

🔧 When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom: The agent ignores the spec and implements something different
Evidence: Running /speckit.implement but the output diverges from spec.md requirements
What to ask the AI: "Stop. Read specs/042-csv-export/spec.md in full. List every acceptance criterion. Then explain which of these your last change satisfies and which it violates before writing any more code."

Symptom: The spec and the plan have drifted after mid-implementation changes
Evidence: The implementation no longer matches the streaming approach described in plan.md
What to ask the AI: "Read specs/042-csv-export/plan.md and compare it to the current implementation in src/api/reports/export.ts. List every discrepancy. Then tell me: should the plan be updated to match what we built, or should the code be updated to match the plan? Do not change either until I decide."

Symptom: /speckit.analyze finds uncovered requirements
Evidence: Analysis output shows 'spec requires streaming for >10k rows — no covering task found'
What to ask the AI: "Add a task to tasks.md: 'Implement streaming CSV transform for exports exceeding 10,000 rows using Node Transform pipeline.' Insert it between the route skeleton task and the ExportButton task. Then re-run /speckit.analyze to confirm coverage."

KNOWLEDGE CHECK

A teammate asks you to add a small search filter to an existing page — it is a two-hour task you have done variations of before. Should you write a spec?

KNOWLEDGE CHECK

Which Spec Kit phase is specifically designed to surface underspecified requirements before any technical planning begins?

KNOWLEDGE CHECK

A new session opens on day three of implementing a multi-session feature. The agent has no memory of the previous two days. Where does the authoritative definition of what you are building live?

Key takeaways

  • Specs beat conversations for multi-session work. Context windows expire; committed files do not. The spec is the agent-legible source of truth that outlives any single session.
  • The Spec Kit workflow: constitution (project principles) → specify (what/why) → clarify (resolve ambiguities) → plan (technical approach) → tasks (atomic work units) → analyze (cross-check coverage) → implement (execute against stable target).
  • Commit the artifacts. spec.md, plan.md, and tasks.md belong in version control. A PR that includes the spec before the code gives reviewers the cheapest possible opportunity to push back.
  • When reality diverges, update the spec first. The spec is the source of truth, not the chat. Drift between spec and code is a signal, not a feature.
  • Skip the ceremony for small work. If the task fits in one session with no ambiguity, prompt directly. Reserve spec-driven development for features where “what are we building” will still be a live question in 48 hours.

Next up: Lesson 8: The Council Pattern — routing decisions through a panel of models and synthesizing verdicts before committing to a path.
