Context Engineering

It is Tuesday afternoon. You open a new session and ask the agent to extend the feature you built Monday. It generates something that contradicts the architecture decision you made two days ago — the one you spent 40 minutes deliberating over in last Friday’s planning session.

The agent is not stupid. It never saw that decision. Your context window expired.

That is context rot. And fixing it is not about writing better prompts — it is about engineering the information that reaches the model.

What context engineering actually is

Anthropic defines context engineering as curating the smallest possible set of high-signal tokens for each agent call. Not the most complete record of everything that ever happened. Not a wall of documentation preloaded at session start. The minimum set needed to do the next unit of work correctly.

This matters because every frontier model degrades as the window fills. Chroma’s research confirmed it across models: quality drops measurably as context grows, well before you hit the token ceiling. More tokens do not mean better answers — past a certain point they mean worse ones.

The practical corollary: a focused 30K-token session will outperform a bloated 200K one on the same task.

The Vibe Coding Reference already covers the 60-70% effective-context rule and the /clear vs /compact decision table — bookmark that cheat-sheet and come back to it. This lesson explains why degradation happens and goes deeper on the four levers that prevent it.

The four levers

1. Compaction

When a session accumulates dead weight — failed approaches, long error traces, exploratory tangents — compaction distills the thread into a summary the model can reason from efficiently.

/compact

Run it at natural boundaries: after a feature lands and is committed, before shifting to a different area of the codebase, when you notice the model starting to contradict itself. The goal is to compact after work is safe in git — compaction is lossy by design, and the commits are your real record.

Claude Code’s auto-compact triggers near the window limit. Getting ahead of it manually keeps you in control of what gets preserved.

2. Just-in-time retrieval

Preloading every file that might be relevant at session start is a tax you pay in tokens up front, on things the agent may never use.

Instead, reference files only when the agent actually needs them:

@src/services/billing.ts -- add a refund method following the existing patterns in this file

The @-reference pulls the file into context precisely when the agent needs it, then it can be compacted or cleared on the next boundary. This is especially valuable in large repos: let the agent navigate to what it needs rather than flooding it with your entire src/ tree.

3. Structured note-taking

For work that spans multiple sessions, the agent’s in-memory state is useless the moment you close the terminal. What persists is what you wrote down.

Before you compact or end a session, tell the agent to record what it learned:

Before we compact: summarize the three decisions we made today about
the data model, and append them to CLAUDE.md under ## Session Notes.

This gives you a persistent, human-readable audit trail that the agent re-reads on the next open. Plan files work the same way — a spec the agent references at the start of each session beats re-explaining your architecture from scratch every morning. (More on spec-driven workflows in Lesson 7: Spec-Driven Development.)

Claude Code also maintains a per-project auto-memory (a MEMORY.md index plus topic files) that persists across sessions. Run /memory to inspect and edit it.

4. Isolation via subagents

Sometimes the best way to protect your main context window is to not use it at all for a subtask.

Delegate exploratory research, dependency audits, or large-file analysis to a subagent. Its dead-ends, verbose output, and half-formed conclusions stay in its own window — your main session only sees the distilled result. This is covered in depth in Lesson 5: Subagents and Orchestration.

Memory files that actually work

The four levers above assume the agent has a baseline to work from. That baseline is your memory file. Here is what separates one that constrains behavior from one that evaporates.

The good: constraints the model can act on

# Project: payment-service

## Build & verify
npm run build         # must succeed, zero errors
npm run test:unit     # run after every schema change

## Conventions
- All monetary values in cents (integer), never floats
- API responses: { data, error, meta } envelope — no exceptions
- New endpoints: add to openapi.yaml before implementation
- TypeScript strict mode; no `any`

## Never touch
- src/legacy/v1-compat.ts — frozen, customer dependency
- db/migrations/ — append only, never edit existing migrations

## Verification step
After any auth change: npm run test:auth && check /health endpoint

Fifteen lines. Every line is actionable. The model can check itself against each one.

The bad: aspirational prose

# Guidelines

This project values clean, readable, maintainable code that follows
best practices and industry standards. We aim for high test coverage
and prefer well-documented solutions. The team uses a collaborative
approach and code quality is important to us. We follow the principle
of least surprise and try to write code that will be easy for future
developers to understand and modify...

The model reads this, finds nothing to constrain a specific decision, and falls back on its priors. Wall-of-prose memory files do not fail loudly — they just silently do nothing.

The test: For each line in your memory file, ask “could the agent make a concrete decision based only on this line?” If the answer is no, cut or rewrite it.

The memory file hierarchy

Tool	Memory file	Scope	How to create/edit
Claude Code	`~/.claude/CLAUDE.md`	User-level — all projects	`/memory`
Claude Code	`./CLAUDE.md`	Project — committed, team-shared	`/init` or manually
Codex CLI	`./AGENTS.md`	Project	Manually
Antigravity CLI	`./AGENTS.md`	Project (takes precedence over `GEMINI.md`)	Manually
GitHub Copilot	`.github/copilot-instructions.md`	Repo-wide	Manually

AGENTS.md is the cross-tool standard. If you work across Claude Code, Codex CLI, and Antigravity, a single AGENTS.md at the repo root travels with the codebase regardless of which tool your teammates use. Layer tool-specific overrides (like Claude’s ./CLAUDE.md) on top for anything that only applies to one tool.

ℹUser-level vs. project-level

~/.claude/CLAUDE.md holds preferences that apply everywhere — your preferred commit style, general conventions, how you like errors explained. The project ./CLAUDE.md holds project-specific constraints — build commands, frozen files, naming rules. Keep them distinct. Mixing project constraints into your user file means they apply everywhere; mixing personal preferences into the project file means they get committed and applied to everyone on the team.

Budgeting a session

Use /context to get a current breakdown of window usage — how many tokens are consumed, what is contributing, and how much headroom remains.

A practical session budget:

Start clean. /clear between unrelated tasks. Yesterday’s e-commerce session is dead weight when you open a new billing task today.
Watch the fill %. When you cross ~50-60%, you are approaching the zone where quality starts to degrade. Do not wait for auto-compact.
Compact at natural boundaries. Finish the feature, commit the work, then /compact. The summary that compaction produces is higher-signal than the raw conversation it replaced.
Clear when you switch domains. Starting a completely different area of the codebase? /clear and let the agent re-read what it needs via @-references and memory files.

The Vibe Coding Reference decision table has the full clear-vs-compact-vs-keep-going matrix — use it as your quick reference.

💡The /btw escape hatch

Claude Code’s /btw command lets you ask a throwaway question that sees current context but does not grow the window. Good for quick clarifications mid-task without polluting the thread.

Notes and artifacts as durable memory

The mental shift: stop thinking of the conversation as your project memory. The conversation is ephemeral. These are durable:

CLAUDE.md / AGENTS.md — behavioral constraints, always present
Plan files (~/.claude/plans/) — the spec the agent re-reads each session
Session notes — decisions appended to memory files before compaction
Git history — the permanent record of what was built and why (commit messages matter)

A practical end-of-session prompt before you compact:

We're about to compact. Before we do:
1. What three decisions did we make today that the next session needs to know?
2. Are there any constraints we discovered that should be added to CLAUDE.md?
Write both to CLAUDE.md under ## Session Notes with today's date, then summarize so I can verify.

This 30-second habit is the difference between a week-long project that stays coherent and one that re-litigates the same architecture decisions every other day.

🔧

When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom

The agent contradicts a decision made in a previous session

Evidence

Agent generates code that ignores an architectural constraint you established two days ago

What to ask the AI

"Before continuing: read CLAUDE.md and list every constraint listed there. Then describe how the approach you just generated does or does not comply with the 'API responses must use the { data, error, meta } envelope' rule."

Symptom

Context window is filling fast because of large files

Evidence

/context shows 70%+ used after just a few exchanges, mostly from file reads

What to ask the AI

"Stop. Do not read any more files. Tell me: what is the minimum you need to complete this task? List the specific functions or sections, not the whole files."

Symptom

CLAUDE.md is not affecting agent behavior

Evidence

Agent keeps using float values for money despite the 'always use cents as integers' rule

What to ask the AI

"Read CLAUDE.md aloud. Then explain how the code you just wrote violates the monetary values rule, and correct it."

KNOWLEDGE CHECK

You are 90 minutes into a session building a new auth module. The agent starts generating code that forgets the JWT expiry rules you established at the start. What is the most likely cause?

KNOWLEDGE CHECK

You finish implementing a complex caching layer. The session has been running for two hours. You want to start work on a completely separate rate-limiting module. What is the right sequence?

KNOWLEDGE CHECK

Which of the following belongs in a project CLAUDE.md?

Key takeaways

Context rot is real and structural. Quality degrades as the window fills. The fix is engineering, not better prompts.
Four levers: compaction (distill after committing), just-in-time retrieval (pull files when needed, not all upfront), note-taking (write decisions to memory files before compacting), and subagent isolation (keep exploratory work out of your main window).
Memory files constrain; prose files evaporate. Every line should pass the “can the agent make a specific decision from this alone?” test.
AGENTS.md travels. It is honored by Claude Code, Codex CLI, and Antigravity CLI — one file, three tools.
Session budget: watch /context fill %, compact at natural boundaries after committing, clear when switching tasks.

Next up: Lesson 3: MCP Servers — how the Model Context Protocol extends what the agent can see and do beyond the conversation window.

What you'll learn

What context engineering actually is

The four levers

1. Compaction

2. Just-in-time retrieval

3. Structured note-taking

4. Isolation via subagents

Memory files that actually work

The good: constraints the model can act on

The bad: aspirational prose

The memory file hierarchy

Budgeting a session

Notes and artifacts as durable memory

When Things Go Wrong

Key takeaways