eDNA Contamination QC Checker
What you'll learn
~25 min
- Build a standalone eDNA contamination QC tool with a single AI prompt
- Compare OTU/ASV tables from negative controls against field samples to flag potential contamination
- Troubleshoot common issues with CSV parsing and contamination threshold logic
- Customize the checker with adjustable thresholds, batch processing, and report export
What you’re building
Imagine dropping a CSV of eDNA metabarcoding results into a browser window and instantly seeing which taxa also appear in your negative controls — flagged, scored, and removable with a single click. No R, no Python, no QIIME dependency chain. Just one HTML file you can open on any lab computer.
That is what you will build in the next 25 minutes.
Every eDNA study lives or dies on its negative controls. Reviewers will ask about them. PIs will ask about them. If you cannot demonstrate that your detections are not contamination artifacts, your data is unpublishable. This tool gives you an instant, visual answer to “are my negatives clean?” — and a defensible record of what you flagged and why.
By the end of this lesson you will have a standalone eDNA contamination QC checker that runs entirely in the browser. It parses an OTU/ASV table CSV, auto-detects negative control columns, visualizes overlap between negatives and field samples, scores contamination severity per sample, and lets you toggle flagged taxa on and off to produce a cleaned results table.
This pattern works for any quality control scenario where you compare test results against known baselines. Lab blanks in chemistry, negative controls in PCR, calibration standards in mass spec — same logic, different domain terms.
🔍 Domain Primer: Key eDNA terms you'll see in this lesson
New to eDNA metabarcoding? Here are the key terms:
- eDNA (environmental DNA) — DNA shed by organisms into their environment (water, soil, air). You collect a water sample, extract DNA from it, and sequence it to find out what species are present — without ever seeing or catching the organisms.
- OTU (Operational Taxonomic Unit) — A cluster of similar DNA sequences grouped at a similarity threshold (usually 97%). Think of it as a rough proxy for “species” when you cannot assign an exact name.
- ASV (Amplicon Sequence Variant) — A single unique DNA sequence resolved from the data, more precise than OTUs. Modern pipelines (DADA2, Deblur) produce ASVs instead of OTUs.
- Negative control — A sample that should contain zero target DNA. Used to detect contamination introduced during sampling or lab work.
- Field blank — A container of sterile or purified water that is processed through the entire field collection protocol (filtered through the same apparatus, preserved identically) at the sampling site. Detects contamination from field equipment, collection containers, and the sampling environment.
- Extraction blank — A tube of reagents processed through the entire DNA extraction protocol but with no sample added. Detects contamination from extraction chemicals or equipment.
- NTC (No-Template Control) — A PCR reaction with water instead of DNA template. Detects contamination in your PCR reagents or from cross-well splashing.
- Metabarcoding — Sequencing a short, standardized DNA region (like 12S for fish, 16S for bacteria) from an environmental sample to identify all species present at once.
- False positive — A detection of a species that is not actually present at the site. Contamination is the primary source of false positives in eDNA studies.
- Read count — The number of DNA sequence reads assigned to a particular taxon in a particular sample. Higher counts generally indicate more confidence in the detection, but even high counts can be contamination.
You do not need to memorize these — the tool handles the logic. You just need to know that “taxa in negatives = potential contamination.”
Who this is for
- eDNA field researchers who need a fast QC check before submitting data for publication.
- Core facility staff processing eDNA samples for multiple PIs who need a standardized contamination screening step.
- Graduate students learning eDNA methods who want to understand what negative control comparison actually looks like in practice.
eDNA processing facilities (like the UW-Madison Aquatic eDNA Lab) handle dozens of projects simultaneously. Cross-project contamination is a real risk. A browser-based QC checker that any lab member can run — without installing R packages or configuring a bioinformatics pipeline — means contamination checks actually get done instead of being deferred to “later.”
The showcase
Here is what the finished tool looks like once you open the HTML file in a browser:
- Header with a file upload area for your OTU/ASV table CSV (or a textarea for pasting).
- Auto-detection panel showing which columns were identified as negative controls, with checkboxes to override.
- Overlap visualization showing a bar chart of taxa shared between negatives and field samples (Chart.js).
- Flag list — a table of potentially contaminating taxa with read counts in each negative, read counts in each field sample, and a contamination severity score.
- Per-sample contamination score — a summary showing how “contaminated” each field sample appears, based on the proportion of its reads attributable to flagged taxa.
- Cleaned results table — the original OTU/ASV table with toggle switches to include or exclude each flagged taxon, updating totals in real time.
Everything runs client-side. Your eDNA data never leaves your browser.
The prompt
Open your terminal (the app where you type commands — Mac: Cmd+Space, type "Terminal"; Windows: open WSL/Ubuntu from the Start menu), navigate to a project folder (create one with `mkdir my-project && cd my-project`), start your AI CLI tool (Claude Code, Gemini CLI, or Codex CLI — e.g., by typing `claude`), and paste this prompt:
```
Build a single self-contained HTML file called edna-contamination-qc.html that
serves as an eDNA contamination QC checker. Requirements:

1. DATA INPUT
- File upload button accepting .csv files plus a textarea for pasting CSV data
- CSV format: first column is taxon name, remaining columns are sample names with read counts as cell values
- Auto-detect negative control columns by matching column names against these patterns (case-insensitive): "blank", "NTC", "negative", "control", "extraction_blank", "field_blank"
- Show detected negatives as a checklist so the user can override selections
- Include a "Load Example" button with this embedded dataset:

Taxon,Site_1,Site_2,Site_3,Site_4,Site_5,Site_6,Extraction_Blank_1,Extraction_Blank_2,Field_Blank,NTC
Salvelinus_fontinalis,1842,0,2105,967,0,1533,0,0,0,0
Micropterus_salmoides,0,3201,1876,0,2987,0,0,0,0,0
Oncorhynchus_mykiss,2456,1102,0,3044,0,1897,0,0,0,0
Lithobates_catesbeianus,0,0,1543,0,2211,876,0,0,0,0
Chelydra_serpentina,654,0,0,1201,0,0,0,0,0,0
Esox_lucius,0,1876,0,0,1432,2301,0,0,0,0
Homo_sapiens,12,34,8,21,15,27,187,203,45,0
Salmo_trutta,1654,0,2876,0,1098,0,0,0,0,0
Ambloplites_rupestris,0,987,0,0,543,1201,0,0,0,0
Cyprinus_carpio,0,0,1432,876,0,0,0,0,0,0
Notemigonus_crysoleucas,765,0,0,0,1234,0,0,0,0,0
Gallus_gallus,0,0,3,0,2,0,42,38,0,12
Bos_taurus,0,5,0,7,0,3,0,0,0,8
Ictalurus_punctatus,0,0,0,1543,0,876,0,0,0,0
Notropis_hudsonius,1234,0,876,0,0,543,0,0,0,0
Catostomus_commersonii,0,1543,0,0,987,0,0,0,0,0
Perca_flavescens,876,0,1234,654,0,0,0,0,0,0
Sus_scrofa,0,0,0,0,0,0,5,0,3,0
Ameiurus_natalis,0,654,0,0,1098,0,0,0,0,0
Lepomis_macrochirus,1432,0,876,0,0,1654,0,0,0,0

2. CONTAMINATION ANALYSIS
- Compare each taxon: if it has reads > 0 in ANY negative control column, flag it as a potential contaminant
- For each flagged taxon show: taxon name, total reads in negatives, total reads in field samples, max single-negative read count, ratio of negative reads to total reads (as a percentage)
- Severity score per taxon: "Critical" if negative reads > 50% of total, "Warning" if 5-50%, "Low" if < 5%
- Per-sample contamination score: for each field sample, calculate what percentage of its total reads come from flagged taxa

3. VISUALIZATIONS (Chart.js from CDN)
- Horizontal bar chart showing each flagged taxon with two bars: total reads in negatives (red) vs total reads in field samples (blue)
- Bar chart showing per-sample contamination percentage for each field sample
- Summary stats at top: total taxa count, flagged taxa count, clean taxa count

4. CLEANED RESULTS TABLE
- Show the full OTU table with toggle switches next to each flagged taxon
- Toggling a taxon OFF removes its row and recalculates all sample totals
- Add an "Export Clean CSV" button that downloads the table with deselected taxa removed
- Add a "Select All Flagged" / "Deselect All Flagged" button pair

5. DESIGN
- Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #38bdf8
- Critical severity rows highlighted in red (#7f1d1d), Warning in amber (#78350f), Low in gray (#374151)
- Clean sans-serif font (Inter from Google Fonts CDN)
- Responsive single-column layout
- Include Clear button to reset everything

6. TECHNICAL
- Pure HTML/CSS/JS in one file, no build step
- Chart.js loaded from CDN (https://cdn.jsdelivr.net/npm/chart.js)
- CSV parsing handles quoted fields and commas within quotes
- All processing client-side, no data uploaded anywhere
```

That entire block is the prompt. Paste it as-is. The embedded sample data is deliberately constructed: Homo sapiens, Gallus gallus, and Bos taurus appear in both negatives and field samples (simulating common lab contamination sources), while Sus scrofa appears only in negatives. The 16 remaining taxa are clean freshwater species.
What you get
After the LLM finishes (typically 60-90 seconds), you will have a single file: edna-contamination-qc.html. Open it in any browser.
Expected output structure
edna-contamination-qc.html (~500-700 lines)

Click Load Example and you should see:
- Four columns auto-detected as negatives: Extraction_Blank_1, Extraction_Blank_2, Field_Blank, NTC.
- Four taxa flagged: Homo sapiens (Critical — most reads are in negatives), Gallus gallus (Critical), Sus scrofa (Critical — only appears in negatives), Bos taurus (Warning — low field reads, moderate negative reads).
- The overlap bar chart showing red bars (negative reads) dominating for Homo sapiens and Gallus gallus, confirming these are likely contamination.
- Per-sample contamination scores below 2% for most sites — because the flagged taxa have low read counts in field samples relative to the real species.
- The cleaned results table with toggle switches. Turning off all four flagged taxa should leave 16 clean species rows.
If something is off
LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:
| Problem | Follow-up prompt |
|---|---|
| Negative columns not auto-detected | The auto-detect is not finding my negative control columns. The column names are "Extraction_Blank_1" and "NTC". Make the pattern matching case-insensitive and check for partial matches using includes() instead of exact matches. |
| Severity score always shows “Low” | The severity calculation is wrong. It looks like you're comparing read counts instead of percentages. Divide negative reads by total reads (negative + field) and use that ratio for the thresholds. |
| CSV export missing headers | The exported CSV file has data rows but no header row. Add the column headers as the first row of the CSV output. |
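For the CSV issues in particular, it helps to know what correct parsing looks like. Here is a minimal sketch of a quote-aware CSV line parser, illustrative only, since the generated file will contain its own version:

```javascript
// Minimal CSV line parser that respects double-quoted fields.
// Sketch of the logic the generated tool needs, not its exact code.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') { current += '"'; i++; } // escaped quote
      else inQuotes = !inQuotes; // entering or leaving a quoted field
    } else if (ch === "," && !inQuotes) {
      fields.push(current); // field boundary
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field has no trailing comma
  return fields;
}
```

A naive `line.split(",")` would shred a taxon name like `"Salmo trutta, brown"` into two fields; the state machine above keeps it intact.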
When things go wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works (the 2-minute explanation)
You do not need to understand every line of the generated code, but here is the mental model:
- CSV parsing splits each line by commas (respecting quoted fields), uses the first row as headers, and treats the first column as taxon names. Every other column is a sample.
- Negative detection checks each column header against a list of keywords (blank, NTC, negative, control). Matching columns are classified as negatives; everything else is a field sample.
- Flagging is simple: if a taxon has any reads greater than zero in any negative column, it gets flagged. The severity score is the ratio of total negative reads to total reads across all samples.
- The cleaned table uses JavaScript toggle switches. When you turn off a taxon, its row is excluded from the data and all column sums are recalculated. The export function only includes rows that are toggled on.
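The detection, flagging, and severity steps above are small enough to sketch in a few lines of JavaScript (illustrative names and shapes; the generated file will differ, but the logic should match the prompt's spec):

```javascript
// Column-name patterns and severity thresholds follow the prompt's spec.
const NEGATIVE_PATTERNS = ["blank", "ntc", "negative", "control"];

function isNegativeColumn(name) {
  const lower = name.toLowerCase();
  return NEGATIVE_PATTERNS.some((p) => lower.includes(p));
}

// row: { taxon, counts: { sampleName: readCount, ... } }
function assessTaxon(row, negativeCols) {
  let negReads = 0, fieldReads = 0;
  for (const [col, reads] of Object.entries(row.counts)) {
    if (negativeCols.includes(col)) negReads += reads;
    else fieldReads += reads;
  }
  const total = negReads + fieldReads;
  const ratio = total > 0 ? negReads / total : 0;
  return {
    taxon: row.taxon,
    flagged: negReads > 0, // any reads in any negative = flag
    // Thresholds from the prompt: >50% Critical, 5-50% Warning, <5% Low.
    severity: !negReads ? null
      : ratio > 0.5 ? "Critical"
      : ratio >= 0.05 ? "Warning"
      : "Low",
  };
}
```

With the embedded example, Homo_sapiens comes out Critical: 435 reads in negatives against 117 in field samples is roughly a 79% negative ratio.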
This tool flags potential contaminants but does not automatically remove them. That is deliberate. A taxon like Homo sapiens in a freshwater fish eDNA study is almost certainly contamination — human DNA from skin cells during sampling. But a taxon that appears with 2 reads in a negative and 5,000 reads in a field sample might be a real detection with a tiny amount of cross-contamination. The tool gives you the data; you make the call. Many eDNA papers report both “raw” and “decontaminated” results, with the criteria for removal described in the methods section.
The established statistical tool for contamination assessment in eDNA and microbiome studies is the decontam R package, which uses frequency-based and prevalence-based methods to identify contaminants. This HTML tool is a rapid visual screening complement to statistical decontamination, not a replacement for packages like decontam. Use this checker for a quick first look at your negatives — then run decontam (or similar) for the statistical analysis you will report in your methods section.
🔍Index hopping (tag-jumping) on Illumina sequencers
On Illumina sequencers, index hopping (also called tag-jumping) can cause 0.1-1% of reads to bleed between samples within a run. This means low-level reads of any taxon may appear in negative controls due to sequencer artifacts, not true contamination. Consider this when evaluating “Low” severity flags — a taxon with 3 reads in a negative and 5,000 in field samples may be an index-hopping artifact rather than genuine contamination. Dual-indexing and post-sequencing index-hop filtering reduce but do not eliminate this issue.
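The base prompt does not account for this. If you want the tool to, one hypothetical tweak is a helper that treats trace reads in a negative as probable index hopping when they fall below a small fraction of that taxon's total reads; the 0.1% default below is an assumption you should tune for your platform and indexing strategy:

```javascript
// Hypothetical tag-jump-aware check (not part of the base prompt):
// reads in a negative at or below `hopRate` (a fraction, e.g. 0.001
// for 0.1%) of the taxon's total reads across the run are treated as
// likely index-hopping artifacts rather than true contamination.
function likelyIndexHop(negReads, taxonTotalReads, hopRate = 0.001) {
  if (taxonTotalReads === 0) return false;
  return negReads > 0 && negReads / taxonTotalReads <= hopRate;
}
```

Under this rule, 3 reads in a blank against 5,000 reads run-wide falls within the hop rate, while the Homo sapiens pattern in the example data (most reads in negatives) does not.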
Customize it
The base tool is useful as-is, but here are extensions that make it more powerful:
Add read-count threshold filtering
```
Add a slider that sets a minimum read count threshold for detections. Any taxon
with fewer reads than the threshold in a field sample gets set to zero for that
sample. Common thresholds are 10, 50, or 100 reads. Show how many detections
are removed at the current threshold. This is separate from the contamination
flagging -- it handles low-confidence detections.
```

Add occupancy-based filtering

```
Add an occupancy filter: a taxon must be detected in at least N out of M
PCR replicates for a site to count as a true detection. Add inputs for N
and M. For each site, gray out detections that fail the occupancy threshold.
This is the standard approach in eDNA studies with replicate PCR.
```

Add a printable QC report

```
Add a "Generate QC Report" button that opens a new print-friendly window
with: a summary of which negatives were checked, a list of flagged taxa
with severity scores, the per-sample contamination percentages, and the
analyst's decision (included/excluded) for each flagged taxon. Format it
for A4 paper so it can be included as a supplementary figure in a
publication.
```

Same pattern as every lesson in this module: start with a working tool, then add features one prompt at a time. The contamination checker becomes more publication-ready with each iteration — and you can stop whenever it meets your needs.
Try it yourself
- Open your CLI tool in an empty folder.
- Paste the main prompt from above.
- Open the generated edna-contamination-qc.html in your browser.
- Click Load Example and verify the four flagged taxa.
- Try toggling off Homo sapiens and Gallus gallus, then export the cleaned CSV.
- If you have real eDNA data, paste your own OTU table CSV and see what gets flagged.
Key takeaways
- Negative controls are non-negotiable in eDNA studies — this tool makes the comparison fast, visual, and reproducible.
- Auto-detection of negative control columns by name pattern saves time and reduces manual error, but always verify the detection with the override checkboxes.
- Contamination severity scoring (Critical/Warning/Low) gives you a data-driven basis for deciding which taxa to remove — not just intuition.
- The cleaned results table with toggles lets you make removal decisions interactively and export the result immediately, with a clear record of what was removed and why.
- Single-file HTML tools are ideal for QC steps because they can be attached to a lab notebook entry, shared with collaborators, or archived alongside the raw data.
You run the contamination QC checker and see that Homo sapiens has 420 reads across your negative controls and 120 reads across your six field samples. What severity level should this receive, and why?
A flagged taxon has 3 reads in one extraction blank and 4,500 reads across four field samples. Should you remove it from your results?
What’s next
In the next lesson, you will build a Species Detection Heatmap — an interactive presence/absence visualization that shows which species were detected at which sampling sites. It takes the cleaned output from this contamination checker and turns it into the kind of figure you would include in a publication or present at a lab meeting.