eDNA Contamination QC Checker
What you'll learn
~25 min
- Build a standalone eDNA contamination QC tool with a single AI prompt
- Compare OTU/ASV tables from negative controls against field samples to flag potential contamination
- Troubleshoot common issues with CSV parsing and contamination threshold logic
- Customize the checker with adjustable thresholds, batch processing, and report export
What you’re building
Imagine dropping a CSV of eDNA metabarcoding results into a browser window and instantly seeing which taxa also appear in your negative controls — flagged, scored, and removable with a single click. No R, no Python, no QIIME dependency chain. Just one HTML file you can open on any lab computer.
That is what you will build in the next 25 minutes.
Every eDNA study lives or dies on its negative controls. Reviewers will ask about them. PIs will ask about them. If you cannot demonstrate that your detections are not contamination artifacts, your data is unpublishable. This tool gives you an instant, visual answer to “are my negatives clean?” — and a defensible record of what you flagged and why.
By the end of this lesson you will have a standalone eDNA contamination QC checker that runs entirely in the browser. It parses an OTU/ASV table CSV, auto-detects negative control columns, visualizes overlap between negatives and field samples, scores contamination severity per sample, and lets you toggle flagged taxa on and off to produce a cleaned results table.
This pattern works for any quality control scenario where you compare test results against known baselines. Lab blanks in chemistry, negative controls in PCR, calibration standards in mass spec — same logic, different domain terms.
🔍 Domain Primer: Key eDNA terms you'll see in this lesson
New to eDNA metabarcoding? Here are the key terms:
- eDNA (environmental DNA) — DNA shed by organisms into their environment (water, soil, air). You collect a water sample, extract DNA from it, and sequence it to find out what species are present — without ever seeing or catching the organisms.
- OTU (Operational Taxonomic Unit) — A cluster of similar DNA sequences grouped at a similarity threshold (usually 97%). Think of it as a rough proxy for “species” when you cannot assign an exact name.
- ASV (Amplicon Sequence Variant) — A single unique DNA sequence resolved from the data, more precise than OTUs. Modern pipelines (DADA2, Deblur) produce ASVs instead of OTUs.
- Negative control — A sample that should contain zero target DNA. Used to detect contamination introduced during sampling or lab work.
- Field blank — A container of sterile or purified water that is processed through the entire field collection protocol (filtered through the same apparatus, preserved identically) at the sampling site. Detects contamination from field equipment, collection containers, and the sampling environment.
- Extraction blank — A tube of reagents processed through the entire DNA extraction protocol but with no sample added. Detects contamination from extraction chemicals or equipment.
- NTC (No-Template Control) — A PCR reaction with water instead of DNA template. Detects contamination in your PCR reagents or from cross-well splashing.
- Metabarcoding — Sequencing a short, standardized DNA region (like 12S for fish, 16S for bacteria) from an environmental sample to identify all species present at once.
- False positive — A detection of a species that is not actually present at the site. Contamination is the primary source of false positives in eDNA studies.
- Read count — The number of DNA sequence reads assigned to a particular taxon in a particular sample. Higher counts generally indicate more confidence in the detection, but even high counts can be contamination.
You do not need to memorize these — the tool handles the logic. You just need to know that “taxa in negatives = potential contamination.”
Who this is for
- eDNA field researchers who need a fast QC check before submitting data for publication.
- Core facility staff processing eDNA samples for multiple PIs who need a standardized contamination screening step.
- Graduate students learning eDNA methods who want to understand what negative control comparison actually looks like in practice.
eDNA processing facilities (like the UW-Madison Aquatic eDNA Lab) handle dozens of projects simultaneously. Cross-project contamination is a real risk. A browser-based QC checker that any lab member can run — without installing R packages or configuring a bioinformatics pipeline — means contamination checks actually get done instead of being deferred to “later.”
The showcase
Here is what the finished tool looks like once you open the HTML file in a browser:
- Header with a file upload area for your OTU/ASV table CSV (or a textarea for pasting).
- Auto-detection panel showing which columns were identified as negative controls, with checkboxes to override.
- Overlap visualization showing a bar chart of taxa shared between negatives and field samples (Chart.js).
- Flag list — a table of potentially contaminating taxa with read counts in each negative, read counts in each field sample, and a contamination severity score.
- Per-sample contamination score — a summary showing how “contaminated” each field sample appears, based on the proportion of its reads attributable to flagged taxa.
- Cleaned results table — the original OTU/ASV table with toggle switches to include or exclude each flagged taxon, updating totals in real time.
Everything runs client-side. Your eDNA data never leaves your browser.
The prompt
Open your terminal (the app where you type commands — Mac: Cmd+Space, type "Terminal"; Windows: open WSL/Ubuntu from the Start menu), navigate to a project folder (create one with `mkdir my-project && cd my-project`), start your AI CLI tool (Claude Code, Gemini CLI, or Codex CLI — e.g., by typing `claude`), and paste this prompt:
```
Build a single self-contained HTML file called edna-contamination-qc.html that
serves as an eDNA contamination QC checker. Requirements:

1. DATA INPUT
- File upload button accepting .csv files plus a textarea for pasting CSV data
- CSV format: first column is taxon name, remaining columns are sample names with read counts as cell values
- Auto-detect negative control columns by matching column names against these patterns (case-insensitive): "blank", "NTC", "negative", "control", "extraction_blank", "field_blank"
- Show detected negatives as a checklist so the user can override selections
- Include a "Load Example" button with this embedded dataset:

Taxon,Site_1,Site_2,Site_3,Site_4,Site_5,Site_6,Extraction_Blank_1,Extraction_Blank_2,Field_Blank,NTC
Salvelinus_fontinalis,1842,0,2105,967,0,1533,0,0,0,0
Micropterus_salmoides,0,3201,1876,0,2987,0,0,0,0,0
Oncorhynchus_mykiss,2456,1102,0,3044,0,1897,0,0,0,0
Lithobates_catesbeianus,0,0,1543,0,2211,876,0,0,0,0
Chelydra_serpentina,654,0,0,1201,0,0,0,0,0,0
Esox_lucius,0,1876,0,0,1432,2301,0,0,0,0
Homo_sapiens,12,34,8,21,15,27,187,203,45,0
Salmo_trutta,1654,0,2876,0,1098,0,0,0,0,0
Ambloplites_rupestris,0,987,0,0,543,1201,0,0,0,0
Cyprinus_carpio,0,0,1432,876,0,0,0,0,0,0
Notemigonus_crysoleucas,765,0,0,0,1234,0,0,0,0,0
Gallus_gallus,0,0,3,0,2,0,42,38,0,12
Bos_taurus,0,5,0,7,0,3,0,0,0,8
Ictalurus_punctatus,0,0,0,1543,0,876,0,0,0,0
Notropis_hudsonius,1234,0,876,0,0,543,0,0,0,0
Catostomus_commersonii,0,1543,0,0,987,0,0,0,0,0
Perca_flavescens,876,0,1234,654,0,0,0,0,0,0
Sus_scrofa,0,0,0,0,0,0,5,0,3,0
Ameiurus_natalis,0,654,0,0,1098,0,0,0,0,0
Lepomis_macrochirus,1432,0,876,0,0,1654,0,0,0,0

2. CONTAMINATION ANALYSIS
- Compare each taxon: if it has reads > 0 in ANY negative control column, flag it as a potential contaminant
- For each flagged taxon show: taxon name, total reads in negatives, total reads in field samples, max single-negative read count, ratio of negative reads to total reads (as a percentage)
- Severity score per taxon: "Critical" if negative reads > 50% of total, "Warning" if 5-50%, "Low" if < 5%
- Per-sample contamination score: for each field sample, calculate what percentage of its total reads come from flagged taxa

3. VISUALIZATIONS (Chart.js from CDN)
- Horizontal bar chart showing each flagged taxon with two bars: total reads in negatives (red) vs total reads in field samples (blue)
- Bar chart showing per-sample contamination percentage for each field sample
- Summary stats at top: total taxa count, flagged taxa count, clean taxa count

4. CLEANED RESULTS TABLE
- Show the full OTU table with toggle switches next to each flagged taxon
- Toggling a taxon OFF removes its row and recalculates all sample totals
- Add an "Export Clean CSV" button that downloads the table with deselected taxa removed
- Add a "Select All Flagged" / "Deselect All Flagged" button pair

5. DESIGN
- Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #38bdf8
- Critical severity rows highlighted in red (#7f1d1d), Warning in amber (#78350f), Low in gray (#374151)
- Clean sans-serif font (Inter from Google Fonts CDN)
- Responsive single-column layout
- Include Clear button to reset everything

6. TECHNICAL
- Pure HTML/CSS/JS in one file, no build step
- Chart.js loaded from CDN (https://cdn.jsdelivr.net/npm/chart.js)
- CSV parsing handles quoted fields and commas within quotes
- All processing client-side, no data uploaded anywhere
```

That entire block is the prompt. Paste it as-is. The embedded sample data is deliberately constructed: Homo sapiens, Gallus gallus, and Bos taurus appear in both negatives and field samples (simulating common lab contamination sources), while Sus scrofa appears only in negatives. The 16 remaining taxa are clean freshwater species.
What you get
After the LLM finishes (typically 60-90 seconds), you will have a single file: edna-contamination-qc.html. Open it in any browser.
Expected output structure
edna-contamination-qc.html (~500-700 lines)

Click Load Example and you should see:
- Four columns auto-detected as negatives: Extraction_Blank_1, Extraction_Blank_2, Field_Blank, NTC.
- Four taxa flagged: Homo sapiens (Critical — most reads are in negatives), Gallus gallus (Critical), Sus scrofa (Critical — only appears in negatives), Bos taurus (Warning — low field reads, moderate negative reads).
- The overlap bar chart showing red bars (negative reads) dominating for Homo sapiens and Gallus gallus, confirming these are likely contamination.
- Per-sample contamination scores below 2% for most sites — because the flagged taxa have low read counts in field samples relative to the real species.
- The cleaned results table with toggle switches. Turning off all four flagged taxa should leave 16 clean species rows.
If something is off
LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:
| Problem | Follow-up prompt |
|---|---|
| Negative columns not auto-detected | The auto-detect is not finding my negative control columns. The column names are "Extraction_Blank_1" and "NTC". Make the pattern matching case-insensitive and check for partial matches using includes() instead of exact matches. |
| Severity score always shows “Low” | The severity calculation is wrong. It looks like you're comparing read counts instead of percentages. Divide negative reads by total reads (negative + field) and use that ratio for the thresholds. |
| CSV export missing headers | The exported CSV file has data rows but no header row. Add the column headers as the first row of the CSV output. |
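For the CSV issues in particular, it helps to know what correct parsing looks like. Here is a minimal sketch of a quote-aware CSV line parser, illustrative only, since the generated file will contain its own version:

```javascript
// Minimal CSV line parser that respects double-quoted fields.
// Sketch of the logic the generated tool needs, not its exact code.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') { current += '"'; i++; } // escaped quote
      else inQuotes = !inQuotes; // entering or leaving a quoted field
    } else if (ch === "," && !inQuotes) {
      fields.push(current); // field boundary
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field has no trailing comma
  return fields;
}
```

A naive `line.split(",")` would shred a taxon name like `"Salmo trutta, brown"` into two fields; the state machine above keeps it intact.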
When things go wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works (the 2-minute explanation)
You do not need to understand every line of the generated code, but here is the mental model:
- CSV parsing splits each line by commas (respecting quoted fields), uses the first row as headers, and treats the first column as taxon names. Every other column is a sample.
- Negative detection checks each column header against a list of keywords (blank, NTC, negative, control). Matching columns are classified as negatives; everything else is a field sample.
- Flagging is simple: if a taxon has any reads greater than zero in any negative column, it gets flagged. The severity score is the ratio of total negative reads to total reads across all samples.
- The cleaned table uses JavaScript toggle switches. When you turn off a taxon, its row is excluded from the data and all column sums are recalculated. The export function only includes rows that are toggled on.
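The detection, flagging, and severity steps above are small enough to sketch in a few lines of JavaScript (illustrative names and shapes; the generated file will differ, but the logic should match the prompt's spec):

```javascript
// Column-name patterns and severity thresholds follow the prompt's spec.
const NEGATIVE_PATTERNS = ["blank", "ntc", "negative", "control"];

function isNegativeColumn(name) {
  const lower = name.toLowerCase();
  return NEGATIVE_PATTERNS.some((p) => lower.includes(p));
}

// row: { taxon, counts: { sampleName: readCount, ... } }
function assessTaxon(row, negativeCols) {
  let negReads = 0, fieldReads = 0;
  for (const [col, reads] of Object.entries(row.counts)) {
    if (negativeCols.includes(col)) negReads += reads;
    else fieldReads += reads;
  }
  const total = negReads + fieldReads;
  const ratio = total > 0 ? negReads / total : 0;
  return {
    taxon: row.taxon,
    flagged: negReads > 0, // any reads in any negative = flag
    // Thresholds from the prompt: >50% Critical, 5-50% Warning, <5% Low.
    severity: !negReads ? null
      : ratio > 0.5 ? "Critical"
      : ratio >= 0.05 ? "Warning"
      : "Low",
  };
}
```

With the embedded example, Homo_sapiens comes out Critical: 435 reads in negatives against 117 in field samples is roughly a 79% negative ratio.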
This tool flags potential contaminants but does not automatically remove them. That is deliberate. A taxon like Homo sapiens in a freshwater fish eDNA study is almost certainly contamination — human DNA from skin cells during sampling. But a taxon that appears with 2 reads in a negative and 5,000 reads in a field sample might be a real detection with a tiny amount of cross-contamination. The tool gives you the data; you make the call. Many eDNA papers report both “raw” and “decontaminated” results, with the criteria for removal described in the methods section.
The established statistical tool for contamination assessment in eDNA and microbiome studies is the decontam R package, which uses frequency-based and prevalence-based methods to identify contaminants. This HTML tool is a rapid visual screening complement to statistical decontamination, not a replacement for packages like decontam. Use this checker for a quick first look at your negatives — then run decontam (or similar) for the statistical analysis you will report in your methods section.
🔍Index hopping (tag-jumping) on Illumina sequencers
On Illumina sequencers, index hopping (also called tag-jumping) can cause 0.1-1% of reads to bleed between samples within a run. This means low-level reads of any taxon may appear in negative controls due to sequencer artifacts, not true contamination. Consider this when evaluating “Low” severity flags — a taxon with 3 reads in a negative and 5,000 in field samples may be an index-hopping artifact rather than genuine contamination. Dual-indexing and post-sequencing index-hop filtering reduce but do not eliminate this issue.
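The base prompt does not account for this. If you want the tool to, one hypothetical tweak is a helper that treats trace reads in a negative as probable index hopping when they fall below a small fraction of that taxon's total reads; the 0.1% default below is an assumption you should tune for your platform and indexing strategy:

```javascript
// Hypothetical tag-jump-aware check (not part of the base prompt):
// reads in a negative at or below `hopRate` (a fraction, e.g. 0.001
// for 0.1%) of the taxon's total reads across the run are treated as
// likely index-hopping artifacts rather than true contamination.
function likelyIndexHop(negReads, taxonTotalReads, hopRate = 0.001) {
  if (taxonTotalReads === 0) return false;
  return negReads > 0 && negReads / taxonTotalReads <= hopRate;
}
```

Under this rule, 3 reads in a blank against 5,000 reads run-wide falls within the hop rate, while the Homo sapiens pattern in the example data (most reads in negatives) does not.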
Customize it
The base tool is useful as-is, but here are extensions that make it more powerful:
Add read-count threshold filtering
```
Add a slider that sets a minimum read count threshold for detections. Any taxon
with fewer reads than the threshold in a field sample gets set to zero for that
sample. Common thresholds are 10, 50, or 100 reads. Show how many detections
are removed at the current threshold. This is separate from the contamination
flagging -- it handles low-confidence detections.
```

Add occupancy-based filtering

```
Add an occupancy filter: a taxon must be detected in at least N out of M
PCR replicates for a site to count as a true detection. Add inputs for N
and M. For each site, gray out detections that fail the occupancy threshold.
This is the standard approach in eDNA studies with replicate PCR.
```

Add a printable QC report

```
Add a "Generate QC Report" button that opens a new print-friendly window
with: a summary of which negatives were checked, a list of flagged taxa
with severity scores, the per-sample contamination percentages, and the
analyst's decision (included/excluded) for each flagged taxon. Format it
for A4 paper so it can be included as a supplementary figure in a
publication.
```

Same pattern as every lesson in this module: start with a working tool, then add features one prompt at a time. The contamination checker becomes more publication-ready with each iteration — and you can stop whenever it meets your needs.
Try it yourself
- Open your CLI tool in an empty folder.
- Paste the main prompt from above.
- Open the generated edna-contamination-qc.html in your browser.
- Click Load Example and verify the four flagged taxa.
- Try toggling off Homo sapiens and Gallus gallus, then export the cleaned CSV.
- If you have real eDNA data, paste your own OTU table CSV and see what gets flagged.
Key takeaways
- Negative controls are non-negotiable in eDNA studies — this tool makes the comparison fast, visual, and reproducible.
- Auto-detection of negative control columns by name pattern saves time and reduces manual error, but always verify the detection with the override checkboxes.
- Contamination severity scoring (Critical/Warning/Low) gives you a data-driven basis for deciding which taxa to remove — not just intuition.
- The cleaned results table with toggles lets you make removal decisions interactively and export the result immediately, with a clear record of what was removed and why.
- Single-file HTML tools are ideal for QC steps because they can be attached to a lab notebook entry, shared with collaborators, or archived alongside the raw data.
You run the contamination QC checker and see that Homo sapiens has 420 reads across your negative controls and 120 reads across your six field samples. What severity level should this receive, and why?
A flagged taxon has 3 reads in one extraction blank and 4,500 reads across four field samples. Should you remove it from your results?
What’s next
In the next lesson, you will build a Species Detection Heatmap — an interactive presence/absence visualization that shows which species were detected at which sampling sites. It takes the cleaned output from this contamination checker and turns it into the kind of figure you would include in a publication or present at a lab meeting.