Prompt Hardening for Bioinformatics Tools
What you'll learn
~15 min

- Recognize the "almost right" failure pattern that bit Day 1 prompts
- Append 9 hardening clauses to any applied-lesson prompt to close the gap
- Spot which clause fixes which observed failure mode
Append the entire fenced block below to the END of any applied-lesson prompt in this module. That’s the whole patch. Skip the rest of this lesson if you’re in a hurry — you can come back later for the explanation.
```text
ADDITIONAL REQUIREMENTS — apply all of these, no exceptions:

1. CLARIFY BEFORE BUILDING. Before writing any code, list the 3 assumptions you're making about input format and expected behavior. Ask me to confirm or correct each one. Wait for my reply before generating the tool.

2. EDGE CASES ARE FIRST-CLASS. Explicitly handle these and show me where in the code each is handled (use comments): empty input, malformed input, input larger than 10 MB, ambiguous IUPAC bases (N, Y, R, K, M, S, W, B, D, H, V), mixed-case sequences, Windows vs Unix line endings, multi-line FASTA headers, trailing whitespace, BOM characters at file start.

3. SELF-TEST WITH SEEDED DATA. Generate one small test input that exercises every code path. Walk through it mentally and show me the expected output BEFORE you write the final tool. If your walkthrough doesn't match what the code would do, fix the code, not the walkthrough.

4. ACCEPTANCE CRITERIA AS A CHECKLIST. End your response with a checklist of 5–8 acceptance criteria I can verify in 30 seconds by clicking around the running tool. Each criterion must be observable in the UI, not just claimed in code.

5. BOUND THE WORK. Refuse to add features I didn't ask for. If you think a feature would help, list it as a "Next steps" item at the bottom — do not implement it. No bonus charts, no "I also added…", no scope creep.

6. CALCULATION TRANSPARENCY. For every calculated value (GC content, scores, p-values, ratios, percentages), show the formula in a code comment AND show one worked example with real numbers in your response text.

7. FAILURE MESSAGING. Every error path must produce a user-visible message that names the specific input that caused the error and suggests one specific fix. No silent failures. No generic "something went wrong". No console.error only.

8. UX DEFAULTS. Use these defaults unless I override them: dark theme (background #0f172a, cards #1e293b, text #e2e8f0, accent #38bdf8), monospace font for sequences, color-blind-safe palette (Okabe-Ito), keyboard-navigable focus states, mobile-readable at 375px width, no horizontal scroll on mobile.

9. NO INVENTED FORMULAS, THRESHOLDS, OR POLICIES. Do not invent scientific formulas, decision thresholds, business rules, or institutional policies. If a constant, threshold, formula, or policy is not explicitly provided in this prompt or in real source material I gave you, either (a) ask me for it before generating code, or (b) hard-code it as a clearly named PLACEHOLDER constant at the top of the file with a comment "REPLACE BEFORE USE — needs validation from [domain expert / SOP / paper]". Never silently pick "industry standard" defaults. Never make up p-value cutoffs, GC% targets, fold-change thresholds, primer Tm constants, codon tables, billing rates, or facility policies.
```

That’s it. Paste it after the existing prompt body, send it, and the tool that comes back will be noticeably less “almost right.”
What this lesson is
Day 1 produced a lot of working tools — and a lot of almost right tools. We saw the same pattern across facility groups: the AI built something that ran, opened in the browser, and looked correct at first glance, but missed an edge case, fudged a calculation, or buried an error in the console where nobody saw it.
This lesson is the fix. Not a one-off fix for each lesson — a single 9-clause patch you append to any prompt in this module to close the gap.
We treat this as a meta-lesson because the same 9 clauses apply to every applied lesson on the curriculum (Lessons 1, 2, 4, 7–22). Once you know the pattern, you don’t need to memorize per-lesson workarounds.
The Day 1 lesson prompts were detailed about what to build but quiet about how to build it carefully. The 9 clauses fill in the carefulness: ask before assuming, handle weird inputs, refuse to invent thresholds, show your math, fail loudly. Modern LLMs do all of these things if you ask — but they won’t volunteer them.
Clause 2 lists FASTA-specific edge cases (IUPAC bases, multi-line headers, etc.). When you append the patch to a non-bio lesson — billing dashboard, instrument scheduler, intake validator — replace the bio examples with the format-specific ones for that domain: empty CSV, currency formatting edge cases, time-zone DST gotchas, missing required fields, weird date formats, etc. The other 8 clauses transfer unchanged.
The “almost right” pattern, with examples from Day 1
Here’s what “almost right” looked like in the wild yesterday. Map each failure to the clause that prevents it.
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
The 9 clauses, expanded
Each clause exists because of a real failure mode we saw on Day 1. The exact wording matters less than the intent — but the wording in the copy-pasteable block above is field-tested.
1. Clarify before building
Most “almost right” tools start with the LLM making a quiet assumption — “I’ll assume FASTA is single-sequence”, “I’ll assume input is under 1 MB”, “I’ll assume the user wants the longest ORF only”. Forcing the LLM to list 3 assumptions and wait for confirmation surfaces these before they get baked into 600 lines of code.
2. Edge cases as first-class
Bioinformatics file formats are deceptively casual. FASTA, FASTQ, VCF, BED, GenBank — every one of them has whitespace quirks, line-ending quirks, encoding quirks, and ambiguity codes that the textbook version of the format doesn’t mention. Naming the edge cases in the prompt makes the LLM handle them — naming them generically (“handle edge cases”) doesn’t.
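To make the clause concrete, here is a minimal sketch of what a parser that takes clause 2 seriously tends to look like. This is an illustration, not the code any particular lesson generates — the function name and the "uppercase everything" choice are assumptions for the example.

```javascript
// Hedged sketch of clause-2-style input normalization for FASTA text.
function parseFasta(raw) {
  const text = raw
    .replace(/^\uFEFF/, "")    // strip a UTF-8 BOM at file start
    .replace(/\r\n?/g, "\n");  // normalize Windows/old-Mac line endings
  const records = [];
  let current = null;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();   // trailing whitespace
    if (trimmed === "") continue;  // blank lines between records
    if (trimmed.startsWith(">")) {
      current = { header: trimmed.slice(1), seq: "" };
      records.push(current);       // multi-sequence files split correctly
    } else if (current) {
      current.seq += trimmed.toUpperCase(); // mixed-case sequences
    }
    // Sequence data before the first ">" is malformed input; a real tool
    // should surface it with a clause-7 error message rather than drop it.
  }
  return records;
}
```

Each comment answers the clause's "show me where each edge case is handled" demand — that traceability is the point, not this particular implementation.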
3. Self-test with seeded data
This is the highest-leverage clause. When the LLM walks through its own code with a small test input before finalizing the code, it catches its own logic errors. The walkthrough has to be in the response, not just in the LLM’s head.
4. Acceptance criteria as a checklist
Without a checklist, “done” is whenever the LLM stops typing. With a checklist, “done” is observable. The 30-second budget per criterion is deliberate — if a check takes longer than that, it’s not actually verifiable in a clinic.
5. Bound the work
LLMs love to be helpful by adding extras. In a clinic context, extras eat the model’s attention budget. A focused tool with the requested features beats a sprawling tool with broken features every time.
6. Calculation transparency
For any tool that produces a number, the formula goes in a comment and a worked example goes in the response. This catches: missing units, wrong denominators, off-by-one in window functions, mis-translated codon tables, and any other “the math is almost right” failure.
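A sketch of what clause 6 produces in practice, using GC content. The decision to exclude IUPAC ambiguity codes from the denominator is an assumption for this example — your facility may count them differently, which is exactly why the formula belongs in a comment.

```javascript
// Formula: GC% = 100 * (count of G + count of C) / (count of A, C, G, T)
// Assumption: IUPAC ambiguity codes (N, Y, R, ...) are excluded from
// both numerator and denominator. Confirm with a domain expert.
function gcContent(seq) {
  const s = seq.toUpperCase();
  const gc = (s.match(/[GC]/g) || []).length;
  const total = (s.match(/[ACGT]/g) || []).length;
  return total === 0 ? NaN : (100 * gc) / total;
}

// Worked example, as clause 6 requires in the response text:
// "ATGCATGC" → G+C = 4, counted bases = 8, GC% = 50.00
```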
7. Failure messaging
Silent failures are the worst class of bug because nobody notices them until a real lab decision depends on them. Every error path produces: (a) a user-visible message, (b) the specific input that caused it, (c) one specific suggested fix.
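One way the three requirements combine into a single message — the wording and function name here are illustrative, not a template any lesson mandates:

```javascript
// Clause-7-style error: names the offending input, says what was wrong,
// and suggests exactly one concrete fix.
function describeParseError(line, lineNumber) {
  return (
    `Line ${lineNumber} ("${line.slice(0, 40)}") is not valid FASTA: ` +
    `sequence data appeared before any ">" header. ` +
    `Fix: add a header line such as ">my_sequence" above it.`
  );
}
```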
8. UX defaults
The Day 1 prompts specified dark theme inconsistently. This clause bakes in a single set of defaults so every tool in the curriculum looks and behaves the same way. Override only when a facility has a different requirement.
9. No invented formulas, thresholds, or policies
This is the highest-stakes clause for bioinformatics work — and the easiest one to forget you needed until something goes wrong. LLMs are confident about defaults: they will pick a “standard” minimum ORF length, a “standard” p-value cutoff, a “standard” primer Tm formula, or a “standard” billing rate, and bake it into the code without ever flagging that the choice was theirs. Those defaults are sometimes wrong for your facility. This clause forces the LLM to either ask before picking, or to mark the value as a placeholder that a domain expert must validate. If you only add one clause to the kit, this is the one with the highest leverage on scientific correctness.
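What option (b) of the clause looks like at the top of a generated file. The values shown are deliberately arbitrary examples — the whole point is that they must be replaced before use:

```javascript
// Clause 9 in practice: constants the model would otherwise invent are
// hoisted to the top of the file as loudly named placeholders.

// REPLACE BEFORE USE — needs validation from [domain expert / SOP / paper]
const PLACEHOLDER_MIN_ORF_CODONS = 100;

// REPLACE BEFORE USE — needs validation from [domain expert / SOP / paper]
const PLACEHOLDER_GC_WARNING_THRESHOLD = 0.65;
```

A reviewer grepping for `PLACEHOLDER_` can find every unvalidated value in seconds, which is the property this convention buys you.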
How to apply during a clinic
- Open the lesson page for your clinic’s tool (e.g., Lesson 1: Sequence Analysis Dashboard).
- Copy the main prompt from the lesson.
- Paste the 9-clause block from the top of this page at the end of that prompt.
- Send the combined prompt to your runtime of choice (claude.ai, Claude Code, Copilot CLI, Gemini CLI — see Lesson 00b).
- Important: the LLM will respond with its 3 assumptions and ask for confirmation. Read them, correct any wrong ones, then say “go”. Don’t skip this step — it’s where the patch earns its keep.
- The LLM produces the tool, the worked example, the acceptance checklist, and the next-steps list.
- Walk the acceptance checklist. If anything fails, paste the failure back and ask for a fix. The clarifying-questions clause stays active across iterations.
That’s the entire workflow. Five minutes the first time, two minutes once you’re used to it.
Worked example: hardening Lesson 1 (Sequence Analysis Dashboard)
Here’s what changes when you apply the patch to a real lesson.
Before the patch — the original prompt
The Lesson 1 prompt asks for a sequence-analysis dashboard with FASTA parsing, GC content, ORF finding, base-composition charts, and a dark theme. ~50 lines, very specific about features. Quiet about care.
Day 1 outcome (typical)
- Tool opened. Looked right.
- Paste a real GenBank FASTA with lowercase introns → GC% reads 0.00%.
- Paste a multi-sequence FASTA → only the first sequence shows.
- Paste IUPAC codes (Y, R, S, W) → NaN in calculations.
- No error messages. Trainee thinks the tool is broken; nobody knows why.
These are exactly the failure modes the existing TroubleshootingBlock in Lesson 1 documents. The Patch Kit prevents them upfront instead of teaching learners to fix them after the fact.
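The GC% = 0.00% failure is worth seeing in code. We did not inspect every generated tool, so treat this as a plausible reconstruction of the bug rather than a diagnosis of any specific one:

```javascript
// Likely Day 1 bug: an uppercase-only regex, defeated by lowercase introns.
const naiveGC = (seq) =>
  (100 * (seq.match(/[GC]/g) || []).length) / seq.length;
// naiveGC("atgc") → 0

// The clause-2 fix normalizes case before counting.
const hardenedGC = (seq) => {
  const s = seq.toUpperCase();
  return (100 * (s.match(/[GC]/g) || []).length) / s.length;
};
// hardenedGC("atgc") → 50
```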
After the patch — what the LLM does differently
- It pauses. First thing the LLM says: “Before I write any code, here are 3 assumptions I’m making — please confirm: (a) input is FASTA only, no FASTQ, (b) ORF threshold is 100 codons, (c) reverse complement frames count. Should I change any of these?”
- It writes a test plan. Once you confirm, the LLM writes 5 lines of test FASTA: one normal sequence, one lowercase, one with IUPAC codes, one multi-sequence, one with CRLF line endings. It walks through what the parser should produce for each.
- It writes the code. Now the parser handles `.toUpperCase()`, BOM stripping, CRLF, multi-line headers, IUPAC counting as "Other", and multi-sequence split correctly.
- It shows the math. The GC% formula appears in a comment and a worked example: "For ATGCATGC, G+C = 4, total counted bases = 8, GC% = 50.00%".
- It writes failure messages. Empty input → "No FASTA records detected. Make sure your input contains at least one line starting with `>` followed by sequence data." Bad input → similarly named.
- It ends with a checklist. 6 acceptance criteria you can click through in under 30 seconds each.
- It does NOT add restriction enzyme analysis even though that’s a natural extension. It lists it under “Next steps” instead.
That’s the difference. Same prompt body. Nine extra clauses. Visibly different tool.
When to NOT use the patch kit
This is opinionated — read carefully:
- Brand new learners on their very first prompt. The patch makes the LLM ask clarifying questions before generating any code. For someone who has never seen a prompt response before, this can be confusing. Run the unpatched Lesson 1 prompt first, see the magic, then learn the patch.
- Quick throwaway exploration. If you’re just poking at a sequence to see what an LLM will do, skip the patch. It’s overhead.
- Prompts where the lesson explicitly tests a specific failure mode. Some lessons (rare) want learners to experience the failure as a learning moment. Don’t patch over the lesson’s pedagogy.
For everything else — every facility clinic prompt, every customization prompt, every “I’m building this for a colleague” prompt — use the patch.
How to recognize “almost right” in your own work
After you build a tool, run this 60-second check:
- Paste weird input. Empty string. A 50 MB file. A FASTA with no sequence lines. CRLF line endings. Lowercase. IUPAC codes. Does it fail loudly or silently?
- Hand-check one number. Pick the first calculated value and compute it yourself. Does the tool match?
- Open it on your phone. Does the layout work at 375px width?
- Tab through the UI. Can you reach every interactive element with the keyboard? Do focus rings show?
- Read the next-steps list. Did the LLM volunteer features you didn’t ask for? If so, the bound-the-work clause didn’t take. Re-paste it.
If any of these fails, you have an “almost right” tool. Apply the relevant clause and re-prompt.
Customize: extending the patch kit
The 9 clauses cover the failures we saw on Day 1. Your facility may have failure modes we haven’t seen. Add your own clauses to the block, but follow the pattern:
- One clause = one observable failure mode. Don’t write meta-clauses like “be careful.” Write specific clauses like “handle multi-line FASTA headers.”
- Name the thing. Generic instructions (“handle edge cases”) don’t move the LLM. Specific instructions (“handle empty input, malformed input, input larger than 10 MB, IUPAC ambiguity codes”) do.
- Tell the LLM where to put the proof. “Show me where in the code each is handled” gives you something to verify.
If you add a clause that helps your facility, share it in the post-clinic share-out. We’ll roll the best ones into the next version of this lesson.
Key takeaways
- “Almost right” is a prompt-quality problem, not an LLM limitation. Modern models will handle every failure mode in the patch kit if you tell them to.
- One patch fixes 17 lessons. The 9 clauses are universal across the biotech curriculum. You don’t need a per-lesson cheat sheet.
- The clarifying-question clause is the most important. It prevents wrong assumptions from getting baked into 600 lines of code. Don’t skip it because you’re in a hurry.
- Apply the patch by appending, never by rewriting. Lesson prompts stay clean; the patch lives in one place.
- Walk the acceptance checklist before declaring done. Concrete pass/fail beats “looks right to me.”
Portfolio suggestion
Save the unpatched-vs-patched comparison from a tool you built today. A short doc — “Here’s the prompt I sent, here’s what came back without the patch, here’s what came back with the patch” — is a strong artifact for a lab meeting or a methods section. It demonstrates that you understand prompt engineering as an engineering discipline, not a vibe.
Knowledge check

A trainee runs the unpatched Lesson 1 prompt, gets a tool that loads, and pastes a FASTA file containing lowercase intron annotations from SnapGene. The dashboard shows the sequence length correctly but GC content reads 0.00%. Which Patch Kit clause prevents this failure most directly?
What’s next
- Pair this with Lesson 00b (Pick Your Runtime). Together they’re the two pages every facility clinic should reference before pasting any prompt.
- Apply it to your facility’s clinic lessons. When you sit down for your group’s session, paste the patch onto the first lesson prompt and notice the difference.
- Report back at share-out. If you found a failure mode the patch didn’t catch, tell us at the post-clinic share-out. The patch kit is meant to evolve with each cohort.