Mastery Module 19 · Agentic Engineering

Runbooks: Docs Agents Execute

Advanced · ~25 min

What you'll learn

  • Structure a runbook an agent can execute without asking questions
  • Execute an existing runbook with a single prompt
  • Capstone: author your own runbook and verify it by execution

Read gitflow-runbook.md and execute it.

That is the entire prompt. A well-built runbook takes it from there: the agent reads the file, discovers your environment, branches the configuration for your OS and shell, executes the actual setup work, runs health checks against expected outputs, and self-heals from predictable failures — all without asking a single follow-up question.

By the end of this lesson you will have written one of those files and proven it works by handing it to a cold agent session and watching it finish clean.

What a runbook is

A runbook is documentation written for an agent to execute, not a human to read.

That distinction matters more than it sounds. Human docs are written to inform — they explain intent, add context, hedge edge cases in prose, and trust the reader to adapt. Agent docs are written to drive — they are structured so a model can parse the current phase, execute each step, branch on detected conditions, and verify its own progress without ever needing to ask you what to do next.

The Giving Good Instructions lesson taught you to write prompts that increase the odds of a strong first draft. A runbook is the ceiling of that skill: instead of a prompt you type once, it is a reusable artifact any agent can execute any number of times.

This pattern is formalizing across the industry. AWS open-sourced Strands Agent SOPs — markdown workflow documents that use RFC 2119 keywords (MUST, SHOULD, MAY) with parameterized templates, covering code review, incident response, and documentation workflows. Amazon reports thousands of these documents in internal use. The signal is clear: well-structured agent instructions are becoming a discipline, not a trick.

Runbooks vs. skills

A skill (see Skills, Hooks & Custom Commands) extends your tool — it teaches Claude Code or Codex CLI a new slash command. A runbook is portable: it is a plain markdown file any agent can read and execute. No installation, no tool-specific configuration. You can hand a runbook to Claude Code, Codex, Copilot CLI, or Antigravity and get the same result.

The five-phase structure

Every reliable runbook follows the same five phases. The phases are ordered by information dependency — each phase uses what the previous one learned.

Phase 1: Environment discovery

The agent runs diagnostic commands to learn what it is working with before it touches anything. This phase produces facts that every subsequent phase depends on.

## Phase 1: Environment Discovery
The agent MUST run each command below and store the result
before proceeding.
- `node --version` # Expect v20 or higher
- `git --version` # Expect 2.x
- `psql --version` # If not found, set HAS_POSTGRES=false
- `echo $SHELL` # Determines shell config file in Phase 2
- `uname -s` # Darwin | Linux | starts with MINGW (Windows/Git Bash)
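
One way to picture what the agent does with that snippet is the shell sketch below. It is a minimal illustration, not part of any runbook: the helper names `probe` and `major_of` and the variable names are assumptions made for this example. The point is that discovery only reads and records; it changes nothing.

```bash
# Sketch of Phase 1: collect facts once, store them, change nothing.
# probe, major_of, and the variable names are hypothetical.

# Print a tool's first version line, or "missing" if it is not on PATH.
probe() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version 2>/dev/null | head -n 1
  else
    echo "missing"
  fi
}

# Extract the major version from strings shaped like "v20.11.1".
major_of() {
  local v="${1#v}"   # drop a leading "v" if present
  echo "${v%%.*}"    # keep everything before the first dot
}

NODE_VERSION="$(probe node)"
GIT_VERSION="$(probe git)"
PSQL_VERSION="$(probe psql)"
OS_NAME="$(uname -s)"
USER_SHELL="${SHELL:-unknown}"

# psql is optional in the snippet: record its absence instead of failing.
if [ "$PSQL_VERSION" = "missing" ]; then
  HAS_POSTGRES=false
else
  HAS_POSTGRES=true
fi

printf 'node=%s\ngit=%s\nHAS_POSTGRES=%s\nos=%s\nshell=%s\n' \
  "$NODE_VERSION" "$GIT_VERSION" "$HAS_POSTGRES" "$OS_NAME" "$USER_SHELL"
```

Later phases would read these stored values rather than re-running the commands, which is what "ordered by information dependency" means in practice.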

Phase 2: Platform-specific configuration

Now the agent branches. It uses the facts from Phase 1 to choose the right config path for the detected environment.

## Phase 2: Configuration
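**If** uname is Darwin and SHELL contains "zsh":
- Shell config: `~/.zshrc`
- Package manager: Homebrew preferred, fallback to npm
**If** uname is Linux or starts with MINGW:
- Shell config: `~/.bashrc`
- Package manager: apt (Debian/Ubuntu) or npm on Linux; npm on MINGW
**Otherwise** (for example, bash on macOS): stop and report the Phase 1 values instead of guessing.
The agent MUST append to the detected shell config, not replace it.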
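
The branch can be sketched as a small shell function. This is an illustration under assumptions: `pick_shell_config` and `append_once` are hypothetical helper names, any other OS and shell combination stops with an error rather than guessing, and skipping a line that is already present is an addition of this sketch, not something the snippet requires.

```bash
# Sketch of the Phase 2 branch: map (uname, SHELL) to a shell config file.
# pick_shell_config and append_once are hypothetical helpers.
pick_shell_config() {
  local os="$1" shell_path="$2"
  case "$os" in
    Darwin)
      case "$shell_path" in
        *zsh*) echo "$HOME/.zshrc"; return 0 ;;
      esac
      ;;
    Linux|MINGW*)
      echo "$HOME/.bashrc"; return 0 ;;
  esac
  # Any other combination: stop rather than guess.
  echo "unsupported: os=$os shell=$shell_path" >&2
  return 1
}

# Append, never overwrite: ">>" adds to the end of the file.
# The grep guard skips the write when the exact line is already there.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

A call such as `append_once 'export PATH="$HOME/bin:$PATH"' "$(pick_shell_config "$(uname -s)" "$SHELL")"` would then honor the "append, not replace" rule, and running it twice leaves the file unchanged the second time.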

Phase 3: Automation

The actual work, with exact commands. No ambiguity — the agent should not have to infer intent.

## Phase 3: Database Setup
Run in order. Stop and report if any command exits non-zero.
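```bash
# Create the database.
createdb dev_app
# ON_ERROR_STOP=1 makes psql exit non-zero on the first SQL error;
# without it, `psql -f` exits 0 even when statements in the file fail.
psql dev_app -v ON_ERROR_STOP=1 -c "CREATE EXTENSION IF NOT EXISTS pgcrypto;"
psql dev_app -v ON_ERROR_STOP=1 -f ./scripts/schema.sql
psql dev_app -v ON_ERROR_STOP=1 -f ./scripts/seed-dev.sql
```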
**Expected row counts after seed:**
- users: 10 rows
- products: 50 rows
- orders: 200 rows
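
The instruction "run in order, stop and report if any command exits non-zero" can be sketched as a loop. `run_in_order` is a hypothetical helper written for this illustration; an agent follows the same logic without needing the function.

```bash
# Sketch of "run in order, stop and report on non-zero".
# run_in_order is a hypothetical helper; each argument is one command line.
run_in_order() {
  local step=0 cmd status
  for cmd in "$@"; do
    step=$((step + 1))
    status=0
    bash -c "$cmd" || status=$?
    if [ "$status" -ne 0 ]; then
      # Report which step failed and with what exit code, then stop.
      echo "step $step failed (exit $status): $cmd" >&2
      return "$status"
    fi
  done
  echo "all $step steps succeeded"
}
```

For example, `run_in_order 'createdb dev_app' 'psql dev_app -v ON_ERROR_STOP=1 -f ./scripts/schema.sql'` stops at the first failing step. The check is only as good as the exit codes: `psql -f` returns 0 on SQL errors unless `ON_ERROR_STOP` is set.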

Phase 4: Health verification

The agent runs verification commands and checks actual output against expected output. This is what lets the runbook self-certify: the agent is not done until the checks pass.

## Phase 4: Verification
Run each check. Compare actual output to expected output.
Report any mismatch before proceeding.
| Check | Command | Expected Output |
|-------|---------|-----------------|
| DB reachable | `psql dev_app -c "\conninfo"` | Output includes `dev_app` (exact wording varies by psql version) |
| Row counts | `psql dev_app -Atc "SELECT COUNT(*) FROM users;"` | `10` |
| Extension | `psql dev_app -Atc "SELECT extversion FROM pg_extension WHERE extname = 'pgcrypto';"` | Any version string |
| App connects | `npm run db:ping` | `Connected to dev_app` |
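
Comparing actual output to expected output can be sketched as one helper per table row. `check_contains` and the `FAILED` flag are hypothetical names for this illustration, and the substring match is a simplifying assumption: an expectation of `10` would also pass on `100`, so a real check on a bare number should compare the whole output.

```bash
# Sketch of a Phase 4 check: run a command, compare output to an expectation.
# check_contains is a hypothetical helper; it passes when the expected text
# appears anywhere in the command's combined stdout and stderr.
FAILED=0
check_contains() {
  local name="$1" expected="$2"
  shift 2
  local actual
  actual="$("$@" 2>&1)" || true   # a failing command still yields output to compare
  case "$actual" in
    *"$expected"*)
      echo "PASS  $name" ;;
    *)
      echo "FAIL  $name: expected '$expected', got '$actual'"
      FAILED=1 ;;
  esac
}
```

A row from the table would become `check_contains "App connects" "Connected to dev_app" npm run db:ping`. The run is complete only if `FAILED` is still 0 after every row has been checked.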

Phase 5: Troubleshooting table

This phase is what separates a runbook from a script. A script fails silently or throws a raw error. A runbook maps known failure modes to their causes and fixes — so a blocked agent can self-diagnose and continue rather than stalling and asking you what to do.

## Phase 5: Troubleshooting
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `createdb` fails with a connection error (for example, `Connection refused` or a missing socket file) | PostgreSQL not running | Run `brew services start postgresql@16` (macOS) or `sudo systemctl start postgresql` (Linux), then retry Phase 3 |
| Output contains `role "username" does not exist` | DB user not provisioned | Run `createuser -s $(whoami)`, then retry the failed command |
| Seed row counts are wrong | Partial seed from a previous failed run | Run `psql dev_app -f ./scripts/drop-dev.sql`, then re-run Phase 3 starting from the `schema.sql` step (skip `createdb`, since the database already exists) |
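
A troubleshooting table is in effect a lookup from error text to a fix. The sketch below shows that idea for the two error-message rows; `suggest_fix` is a hypothetical helper, the matching is deliberately loose (case-insensitive substrings), and the fixes are copied from the table rather than invented.

```bash
# Sketch of the Phase 5 table as a lookup: error text in, suggested fix out.
# suggest_fix is a hypothetical helper; the second argument overrides uname.
suggest_fix() {
  local err os="${2:-$(uname -s)}"
  err="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$err" in
    *"connection refused"*)
      if [ "$os" = "Darwin" ]; then
        echo "brew services start postgresql@16"
      else
        echo "sudo systemctl start postgresql"
      fi ;;
    *role*"does not exist"*)
      echo 'createuser -s $(whoami)' ;;
    *)
      # Unknown symptom: report it instead of improvising a fix.
      echo "no known fix"
      return 1 ;;
  esac
}
```

The final branch matters most: a symptom the table does not cover should be reported and then added to the table, not papered over.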

Try one now

The agent-runbooks repository (maintained by this course’s author, MIT licensed) is a collection of runbooks following this five-phase structure. It is a new worked-example library, not a community standard, but it is actively maintained and its runbooks follow the same structure you just learned.

As of this writing it contains 11 runbooks, including:

  • gitflow-runbook.md: feature branch and PR management using the gh CLI, from discovery (checking git and gh versions) through automating branch conventions to verifying clean PR creation
  • chrome-devtools-mcp: configuring Chrome or Edge for remote debugging via the Chrome DevTools Protocol (CDP), with health checks that verify the WebSocket endpoint is live before handing control back

To run any of them, clone the repo and hand the file to your agent:

git clone https://github.com/Intellimetrics/agent-runbooks.git
# Then in your AI CLI session:
Read agent-runbooks/gitflow-runbook.md and execute it.

That is the entire invocation. Zero install, no configuration before the session starts — the runbook handles its own setup.

What to watch as it runs:

  • Phase 1 should produce a quick burst of diagnostic commands — git --version, gh --version, uname -s. If the agent skips this and goes straight to configuration, the runbook has a gap in Phase 1.
  • Phase 2 should branch silently based on what Phase 1 found. You should not be asked “are you on macOS or Linux?”
  • Phase 4 is where you learn whether the runbook is actually correct — watch that the agent checks outputs, not just runs commands.
  • If the agent asks you a question mid-execution, that question is a gap in the runbook. Note it.

Vet a runbook before you hand it over

Runbooks instruct agents to install packages, write config files, and run database commands. Apply the same vetting instinct you use for MCP servers (see MCP Servers): read a runbook before handing it to an agent with write access. A runbook from a trusted source (your own repo, your team’s repo) is low-risk. A runbook from an unfamiliar source deserves the same scrutiny as a shell script someone sent you in an email.
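
To make that read-through faster, a short script can pull out everything in a runbook that looks like a command. This is a rough sketch: `list_commands` is a hypothetical helper, and it only finds text inside fenced code blocks and inline code spans, so instructions written as plain prose still need a human read.

```bash
# Sketch: list what a runbook could ask an agent to run, for a human to read.
# list_commands is a hypothetical helper; it prints lines inside fenced code
# blocks and every inline code span, prefixed with the source line number.
list_commands() {
  awk '
    /^```/ { in_fence = !in_fence; next }     # toggle on fence lines
    in_fence { print NR ": " $0; next }       # body of a fenced block
    {
      line = $0
      while (match(line, /`[^`]+`/)) {        # inline code spans
        print NR ": " substr(line, RSTART + 1, RLENGTH - 2)
        line = substr(line, RSTART + RLENGTH)
      }
    }
  ' "$1"
}
```

Running `list_commands agent-runbooks/gitflow-runbook.md` before the first execution gives a numbered list to skim for installs, `sudo`, and anything that writes outside the project.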

Capstone exercise: author your own runbook and grade it by execution

This is the point of the lesson. Everything above was setup.

Pick a task you do repeatedly. Good candidates:

  • Bootstrapping a dev environment on a new machine (nvm, Node, package install, .env setup)
  • Seeding test data before a demo or review session
  • A deploy preflight checklist (lint, test, env var audit, version bump)
  • Onboarding a new team member’s local setup

Write the runbook in a file called my-runbook.md. Follow all five phases. For each phase, ask yourself: if the agent only had this file and nothing else, would it know what to do here? If you would have to explain anything verbally, write it down instead.
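
Before handing the file to an agent, a quick structural check can confirm that all five phases are present. The sketch assumes headings of the form `## Phase N`, as in the snippets earlier in this lesson; `missing_phases` is a hypothetical helper name.

```bash
# Sketch: report which of the five phase headings a runbook is missing.
# Assumes headings shaped like "## Phase N: Title"; missing_phases is a
# hypothetical helper. Empty output means all five were found.
missing_phases() {
  local file="$1" n missing=""
  for n in 1 2 3 4 5; do
    # Match "Phase N" followed by a non-digit or end of line.
    grep -Eq "^#+ +Phase $n([^0-9]|\$)" "$file" || missing="$missing $n"
  done
  echo "${missing# }"
}
```

`missing_phases my-runbook.md` prints the numbers of any absent phases. It checks structure only; whether each phase is complete is still decided by execution.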

Grade it by execution. Open a fresh agent session — or better yet, use a second tool. If you wrote the runbook while using Claude Code, hand it to Codex CLI or Copilot CLI:

# In a fresh session with no prior context:
Read my-runbook.md and execute it.

Then count the questions.

Every question the agent asks is a bug in your documentation. Not a bug in the agent — the agent is doing its job. A question means the runbook reached a decision point it was not equipped to resolve. Go back, add the missing branch or clarification, and run it again with a fresh session.

The runbook is done when a cold agent completes it without asking anything.

🔍 How this capstone composes the module

Notice what you just built and why it works:

  • The runbook is context engineering (Lesson 2) made permanent — instead of loading context at the start of each session with a CLAUDE.md or memory file, you encoded it as an executable artifact.
  • It may invoke MCP servers (Lesson 3) — a postgres-mcp runbook or chrome-devtools-mcp runbook calls out to external tools the agent wouldn’t have by default.
  • It extends what skills (Lesson 4) do, but at the task level rather than the tool level — a skill teaches your tool a new command; a runbook teaches any agent a new workflow.
  • It can be handed to a delegated agent (Lesson 5) — an orchestrator agent that bootstraps environments can call subagents using runbooks as their instructions.

A runbook is where Giving Good Instructions stops being a conversational skill and becomes infrastructure.

🔧 When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom: Agent stalls mid-runbook and asks a question
Evidence: The agent pauses and asks something like 'Should I use your local PostgreSQL or a Docker container?'
What to ask the AI: "The runbook has a gap. Add a branch to Phase 2 that covers both cases: 'If Docker is available (check: docker --version), use the container path. Otherwise use the local PostgreSQL path.' Then re-run from a fresh session. The agent should not reach a decision point the runbook didn't anticipate."

Symptom: The runbook worked last month but fails now
Evidence: Phase 3 commands error out; tool versions or paths have changed
What to ask the AI: "Runbook drift: the environment changed but the runbook didn't. Update Phase 1 to capture the current versions, then trace forward to find where Phase 3 assumptions no longer hold. Version-pin the commands where possible (e.g., specify the exact psql or npm command path) and add the new failure mode to Phase 5's troubleshooting table."

Symptom: Agent ran all commands but skipped Phase 4 verification
Evidence: The agent reports 'done' without running the health check queries
What to ask the AI: "Phase 4 is being treated as optional. Rewrite the Phase 4 header with an explicit directive: 'The agent MUST run every verification command in this table before reporting completion. Do not skip this phase even if Phase 3 appeared to succeed.' RFC 2119 keywords (MUST, SHALL) signal non-negotiable steps to the model."

Key takeaways

  • A runbook is the ceiling of giving good instructions. It transforms a prompt into a durable, portable, reusable artifact — documentation that executes.
  • The five-phase structure works because the phases are ordered by dependency. Discovery before configuration, configuration before automation, automation before verification, verification before troubleshooting.
  • Verification by execution is the genius of the format. The runbook is correct when a cold agent finishes it without asking questions. Every question is a gap to fix.
  • Runbooks are portable across tools. Write once, hand to any agent — Claude Code, Codex, Copilot CLI, Antigravity. The format is plain markdown.
  • Vet before you hand over. A runbook has the same permissions footprint as a shell script. Read it before you trust it.

Where to go from here: the habit that makes all of this compound over time is staying current with how the tools evolve, including new runbook patterns, new keyword conventions for agent instructions, and new MCP servers worth a runbook of their own. See Staying Current for the habit loop.
