Prompt Hardening for Bioinformatics Tools
What you'll learn
~15 min

- Recognize the "almost right" failure pattern that bit Day 1 prompts
- Append 9 hardening clauses to any applied-lesson prompt to close the gap
- Spot which clause fixes which observed failure mode
Append the entire fenced block below to the END of any applied-lesson prompt in this module. That’s the whole patch. Skip the rest of this lesson if you’re in a hurry — you can come back later for the explanation.
```text
ADDITIONAL REQUIREMENTS — apply all of these, no exceptions:

1. CLARIFY BEFORE BUILDING. Before writing any code, list the 3 assumptions you're making about input format and expected behavior. Ask me to confirm or correct each one. Wait for my reply before generating the tool.

2. EDGE CASES ARE FIRST-CLASS. Explicitly handle these and show me where in the code each is handled (use comments): empty input, malformed input, input larger than 10 MB, ambiguous IUPAC bases (N, Y, R, K, M, S, W, B, D, H, V), mixed-case sequences, Windows vs Unix line endings, multi-line FASTA headers, trailing whitespace, BOM characters at file start.

3. SELF-TEST WITH SEEDED DATA. Generate one small test input that exercises every code path. Walk through it mentally and show me the expected output BEFORE you write the final tool. If your walkthrough doesn't match what the code would do, fix the code, not the walkthrough.

4. ACCEPTANCE CRITERIA AS A CHECKLIST. End your response with a checklist of 5–8 acceptance criteria I can verify in 30 seconds by clicking around the running tool. Each criterion must be observable in the UI, not just claimed in code.

5. BOUND THE WORK. Refuse to add features I didn't ask for. If you think a feature would help, list it as a "Next steps" item at the bottom — do not implement it. No bonus charts, no "I also added…", no scope creep.

6. CALCULATION TRANSPARENCY. For every calculated value (GC content, scores, p-values, ratios, percentages), show the formula in a code comment AND show one worked example with real numbers in your response text.

7. FAILURE MESSAGING. Every error path must produce a user-visible message that names the specific input that caused the error and suggests one specific fix. No silent failures. No generic "something went wrong". No console.error only.

8. UX DEFAULTS. Use these defaults unless I override them: dark theme (background #0f172a, cards #1e293b, text #e2e8f0, accent #38bdf8), monospace font for sequences, color-blind-safe palette (Okabe-Ito), keyboard-navigable focus states, mobile-readable at 375px width, no horizontal scroll on mobile.

9. NO INVENTED FORMULAS, THRESHOLDS, OR POLICIES. Do not invent scientific formulas, decision thresholds, business rules, or institutional policies. If a constant, threshold, formula, or policy is not explicitly provided in this prompt or in real source material I gave you, either (a) ask me for it before generating code, or (b) hard-code it as a clearly named PLACEHOLDER constant at the top of the file with a comment "REPLACE BEFORE USE — needs validation from [domain expert / SOP / paper]". Never silently pick "industry standard" defaults. Never make up p-value cutoffs, GC% targets, fold-change thresholds, primer Tm constants, codon tables, billing rates, or facility policies.
```

That’s it. Paste it after the existing prompt body, send it, and the tool that comes back will be noticeably less “almost right.”
What this lesson is
Day 1 produced a lot of working tools — and a lot of almost right tools. We saw the same pattern across facility groups: the AI built something that ran, opened in the browser, and looked correct at first glance, but missed an edge case, fudged a calculation, or buried an error in the console where nobody saw it.
This lesson is the fix. Not a one-off fix for each lesson — a single 9-clause patch you append to any prompt in this module to close the gap.
We treat this as a meta-lesson because the same 9 clauses apply to every applied lesson on the curriculum (Lessons 1, 2, 4, 7–22). Once you know the pattern, you don’t need to memorize per-lesson workarounds.
The Day 1 lesson prompts were detailed about what to build but quiet about how to build it carefully. The 9 clauses fill in the carefulness: ask before assuming, handle weird inputs, refuse to invent thresholds, show your math, fail loudly. Modern LLMs do all of these things if you ask — but they won’t volunteer them.
Clause 2 lists FASTA-specific edge cases (IUPAC bases, multi-line headers, etc.). When you append the patch to a non-bio lesson — billing dashboard, instrument scheduler, intake validator — replace the bio examples with the format-specific ones for that domain: empty CSV, currency formatting edge cases, time-zone DST gotchas, missing required fields, weird date formats, etc. The other 8 clauses transfer unchanged.
The “almost right” pattern, with examples from Day 1
Here’s what “almost right” looked like in the wild yesterday. Map each failure to the clause that prevents it.
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
The 9 clauses, expanded
Each clause exists because of a real failure mode we saw on Day 1. The exact wording matters less than the intent — but the wording in the copy-pasteable block above is field-tested.
1. Clarify before building
Most “almost right” tools start with the LLM making a quiet assumption — “I’ll assume FASTA is single-sequence”, “I’ll assume input is under 1 MB”, “I’ll assume the user wants the longest ORF only”. Forcing the LLM to list 3 assumptions and wait for confirmation surfaces these before they get baked into 600 lines of code.
2. Edge cases as first-class
Bioinformatics file formats are deceptively casual. FASTA, FASTQ, VCF, BED, GenBank — every one of them has whitespace quirks, line-ending quirks, encoding quirks, and ambiguity codes that the textbook version of the format doesn’t mention. Naming the edge cases in the prompt makes the LLM handle them — naming them generically (“handle edge cases”) doesn’t.
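To make the clause concrete, here is a minimal sketch of what a parser that takes clause 2 seriously tends to look like. This is an illustration, not the code any particular lesson generates — the function name and the "uppercase everything" choice are assumptions for the example.

```javascript
// Hedged sketch of clause-2-style input normalization for FASTA text.
function parseFasta(raw) {
  const text = raw
    .replace(/^\uFEFF/, "")    // strip a UTF-8 BOM at file start
    .replace(/\r\n?/g, "\n");  // normalize Windows/old-Mac line endings
  const records = [];
  let current = null;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();   // trailing whitespace
    if (trimmed === "") continue;  // blank lines between records
    if (trimmed.startsWith(">")) {
      current = { header: trimmed.slice(1), seq: "" };
      records.push(current);       // multi-sequence files split correctly
    } else if (current) {
      current.seq += trimmed.toUpperCase(); // mixed-case sequences
    }
    // Sequence data before the first ">" is malformed input; a real tool
    // should surface it with a clause-7 error message rather than drop it.
  }
  return records;
}
```

Each comment answers the clause's "show me where each edge case is handled" demand — that traceability is the point, not this particular implementation.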
3. Self-test with seeded data
This is the highest-leverage clause. When the LLM walks through its own code with a small test input before finalizing the code, it catches its own logic errors. The walkthrough has to be in the response, not just in the LLM’s head.
4. Acceptance criteria as a checklist
Without a checklist, “done” is whenever the LLM stops typing. With a checklist, “done” is observable. The 30-second budget per criterion is deliberate — if a check takes longer than that, it’s not actually verifiable in a clinic.
5. Bound the work
LLMs love to be helpful by adding extras. In a clinic context, extras eat the model’s attention budget. A focused tool with the requested features beats a sprawling tool with broken features every time.
6. Calculation transparency
For any tool that produces a number, the formula goes in a comment and a worked example goes in the response. This catches: missing units, wrong denominators, off-by-one in window functions, mis-translated codon tables, and any other “the math is almost right” failure.
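A sketch of what clause 6 produces in practice, using GC content. The decision to exclude IUPAC ambiguity codes from the denominator is an assumption for this example — your facility may count them differently, which is exactly why the formula belongs in a comment.

```javascript
// Formula: GC% = 100 * (count of G + count of C) / (count of A, C, G, T)
// Assumption: IUPAC ambiguity codes (N, Y, R, ...) are excluded from
// both numerator and denominator. Confirm with a domain expert.
function gcContent(seq) {
  const s = seq.toUpperCase();
  const gc = (s.match(/[GC]/g) || []).length;
  const total = (s.match(/[ACGT]/g) || []).length;
  return total === 0 ? NaN : (100 * gc) / total;
}

// Worked example, as clause 6 requires in the response text:
// "ATGCATGC" → G+C = 4, counted bases = 8, GC% = 50.00
```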
7. Failure messaging
Silent failures are the worst class of bug because nobody notices them until a real lab decision depends on them. Every error path produces: (a) a user-visible message, (b) the specific input that caused it, (c) one specific suggested fix.
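One way the three requirements combine into a single message — the wording and function name here are illustrative, not a template any lesson mandates:

```javascript
// Clause-7-style error: names the offending input, says what was wrong,
// and suggests exactly one concrete fix.
function describeParseError(line, lineNumber) {
  return (
    `Line ${lineNumber} ("${line.slice(0, 40)}") is not valid FASTA: ` +
    `sequence data appeared before any ">" header. ` +
    `Fix: add a header line such as ">my_sequence" above it.`
  );
}
```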
8. UX defaults
The Day 1 prompts specified dark theme inconsistently. This clause bakes in a single set of defaults so every tool in the curriculum looks and behaves the same way. Override only when a facility has a different requirement.
9. No invented formulas, thresholds, or policies
This is the highest-stakes clause for bioinformatics work — and the easiest one to forget you needed until something goes wrong. LLMs are confident about defaults: they will pick a “standard” minimum ORF length, a “standard” p-value cutoff, a “standard” primer Tm formula, or a “standard” billing rate, and bake it into the code without ever flagging that the choice was theirs. Those defaults are sometimes wrong for your facility. This clause forces the LLM to either ask before picking, or to mark the value as a placeholder that a domain expert must validate. If you only add one clause to the kit, this is the one with the highest leverage on scientific correctness.
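What option (b) of the clause looks like at the top of a generated file. The values shown are deliberately arbitrary examples — the whole point is that they must be replaced before use:

```javascript
// Clause 9 in practice: constants the model would otherwise invent are
// hoisted to the top of the file as loudly named placeholders.

// REPLACE BEFORE USE — needs validation from [domain expert / SOP / paper]
const PLACEHOLDER_MIN_ORF_CODONS = 100;

// REPLACE BEFORE USE — needs validation from [domain expert / SOP / paper]
const PLACEHOLDER_GC_WARNING_THRESHOLD = 0.65;
```

A reviewer grepping for `PLACEHOLDER_` can find every unvalidated value in seconds, which is the property this convention buys you.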
How to apply during a clinic
- Open the lesson page for your clinic’s tool (e.g., Lesson 1: Sequence Analysis Dashboard).
- Copy the main prompt from the lesson.
- Paste the 9-clause block from the top of this page at the end of that prompt.
- Send the combined prompt to your runtime of choice (claude.ai, Claude Code, Copilot CLI, Gemini CLI — see Lesson 00b).
- Important: the LLM will respond with its 3 assumptions and ask for confirmation. Read them, correct any wrong ones, then say “go”. Don’t skip this step — it’s where the patch earns its keep.
- The LLM produces the tool, the worked example, the acceptance checklist, and the next-steps list.
- Walk the acceptance checklist. If anything fails, paste the failure back and ask for a fix. The clarifying-questions clause stays active across iterations.
That’s the entire workflow. Five minutes the first time, two minutes once you’re used to it.
Worked example: hardening Lesson 1 (Sequence Analysis Dashboard)
Here’s what changes when you apply the patch to a real lesson.
Before the patch — the original prompt
The Lesson 1 prompt asks for a sequence-analysis dashboard with FASTA parsing, GC content, ORF finding, base-composition charts, and a dark theme. ~50 lines, very specific about features. Quiet about care.
Day 1 outcome (typical)
- Tool opened. Looked right.
- Paste a real GenBank FASTA with lowercase introns → GC% reads 0.00%.
- Paste a multi-sequence FASTA → only the first sequence shows.
- Paste IUPAC codes (Y, R, S, W) → NaN in calculations.
- No error messages. Trainee thinks the tool is broken; nobody knows why.
These are exactly the failure modes the existing TroubleshootingBlock in Lesson 1 documents. The Patch Kit prevents them upfront instead of teaching learners to fix them after the fact.
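The GC% = 0.00% failure is worth seeing in code. We did not inspect every generated tool, so treat this as a plausible reconstruction of the bug rather than a diagnosis of any specific one:

```javascript
// Likely Day 1 bug: an uppercase-only regex, defeated by lowercase introns.
const naiveGC = (seq) =>
  (100 * (seq.match(/[GC]/g) || []).length) / seq.length;
// naiveGC("atgc") → 0

// The clause-2 fix normalizes case before counting.
const hardenedGC = (seq) => {
  const s = seq.toUpperCase();
  return (100 * (s.match(/[GC]/g) || []).length) / s.length;
};
// hardenedGC("atgc") → 50
```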
After the patch — what the LLM does differently
- It pauses. First thing the LLM says: “Before I write any code, here are 3 assumptions I’m making — please confirm: (a) input is FASTA only, no FASTQ, (b) ORF threshold is 100 codons, (c) reverse complement frames count. Should I change any of these?”
- It writes a test plan. Once you confirm, the LLM writes 5 lines of test FASTA: one normal sequence, one lowercase, one with IUPAC codes, one multi-sequence, one with CRLF line endings. It walks through what the parser should produce for each.
- It writes the code. Now the parser handles `.toUpperCase()`, BOM stripping, CRLF, multi-line headers, IUPAC counting as "Other", and multi-sequence split correctly.
- It shows the math. The GC% formula appears in a comment and a worked example: "For ATGCATGC, G+C = 4, total counted bases = 8, GC% = 50.00%".
- It writes failure messages. Empty input → "No FASTA records detected. Make sure your input contains at least one line starting with `>` followed by sequence data." Bad input → similarly named.
- It ends with a checklist. 6 acceptance criteria you can click through in under 30 seconds each.
- It does NOT add restriction enzyme analysis even though that’s a natural extension. It lists it under “Next steps” instead.
That’s the difference. Same prompt body. Nine extra clauses. Visibly different tool.
When to NOT use the patch kit
This is opinionated — read carefully:
- Brand new learners on their very first prompt. The patch makes the LLM ask clarifying questions before generating any code. For someone who has never seen a prompt response before, this can be confusing. Run the unpatched Lesson 1 prompt first, see the magic, then learn the patch.
- Quick throwaway exploration. If you’re just poking at a sequence to see what an LLM will do, skip the patch. It’s overhead.
- Prompts where the lesson explicitly tests a specific failure mode. Some lessons (rare) want learners to experience the failure as a learning moment. Don’t patch over the lesson’s pedagogy.
For everything else — every facility clinic prompt, every customization prompt, every “I’m building this for a colleague” prompt — use the patch.
How to recognize “almost right” in your own work
After you build a tool, run this 60-second check:
- Paste weird input. Empty string. A 50 MB file. A FASTA with no sequence lines. CRLF line endings. Lowercase. IUPAC codes. Does it fail loudly or silently?
- Hand-check one number. Pick the first calculated value and compute it yourself. Does the tool match?
- Open it on your phone. Does the layout work at 375px width?
- Tab through the UI. Can you reach every interactive element with the keyboard? Do focus rings show?
- Read the next-steps list. Did the LLM volunteer features you didn’t ask for? If so, the bound-the-work clause didn’t take. Re-paste it.
If any of these fails, you have an “almost right” tool. Apply the relevant clause and re-prompt.
Customize: extending the patch kit
The 9 clauses cover the failures we saw on Day 1. Your facility may have failure modes we haven’t seen. Add your own clauses to the block, but follow the pattern:
- One clause = one observable failure mode. Don’t write meta-clauses like “be careful.” Write specific clauses like “handle multi-line FASTA headers.”
- Name the thing. Generic instructions (“handle edge cases”) don’t move the LLM. Specific instructions (“handle empty input, malformed input, input larger than 10 MB, IUPAC ambiguity codes”) do.
- Tell the LLM where to put the proof. “Show me where in the code each is handled” gives you something to verify.
If you add a clause that helps your facility, share it in the post-clinic share-out. We’ll roll the best ones into the next version of this lesson.
Key takeaways
- “Almost right” is a prompt-quality problem, not an LLM limitation. Modern models will handle every failure mode in the patch kit if you tell them to.
- One patch fixes 17 lessons. The 9 clauses are universal across the biotech curriculum. You don’t need a per-lesson cheat sheet.
- The clarifying-question clause is the most important. It prevents wrong assumptions from getting baked into 600 lines of code. Don’t skip it because you’re in a hurry.
- Apply the patch by appending, never by rewriting. Lesson prompts stay clean; the patch lives in one place.
- Walk the acceptance checklist before declaring done. Concrete pass/fail beats “looks right to me.”
Portfolio suggestion
Save the unpatched-vs-patched comparison from a tool you built today. A short doc — “Here’s the prompt I sent, here’s what came back without the patch, here’s what came back with the patch” — is a strong artifact for a lab meeting or a methods section. It demonstrates that you understand prompt engineering as an engineering discipline, not a vibe.
Knowledge check

A trainee runs the unpatched Lesson 1 prompt, gets a tool that loads, and pastes a FASTA file containing lowercase intron annotations from SnapGene. The dashboard shows the sequence length correctly but GC content reads 0.00%. Which Patch Kit clause prevents this failure most directly?
What’s next
- Pair this with Lesson 00b (Pick Your Runtime). Together they’re the two pages every facility clinic should reference before pasting any prompt.
- Apply it to your facility’s clinic lessons. When you sit down for your group’s session, paste the patch onto the first lesson prompt and notice the difference.
- Report back at share-out. If you found a failure mode the patch didn’t catch, tell us at the post-clinic share-out. The patch kit is meant to evolve with each cohort.