Sequencing Run QC Triage
What you'll learn
~35 min

- Build a sequencing run QC triage dashboard with a single AI prompt
- Parse Illumina run metrics CSV and flag lanes that fail quality thresholds using Chart.js
- Troubleshoot common issues with CSV parsing, Chart.js rendering, and threshold logic
- Customize the dashboard with facility-specific thresholds, multi-run comparison, or exportable reports
What you’re building
A sequencing run finishes at 2 AM. By the time the facility opens at 8, someone needs to know whether the data is usable or whether lanes need to be re-run. Right now that means opening the Illumina run folder, pulling up InterOp files or BaseSpace, scrolling through metrics tables, and mentally checking each lane against thresholds. It takes 15-20 minutes per run, and it is easy to miss a borderline lane when you are triaging three runs from the weekend.
You are going to build a tool that does this triage in under one second. Upload the run stats CSV, and the dashboard instantly flags every lane that fails QC thresholds — with color-coded pass/fail badges, Chart.js bar charts for visual comparison, and a downloadable summary you can email to the PI before they even ask.
Core facility staff who manage Illumina sequencers spend the first hour of every Monday triaging weekend runs. A dashboard that reads the CSV and flags failures immediately means you walk in, upload, and know in seconds which PIs need a heads-up and which runs are ready for downstream analysis. That hour becomes five minutes.
The finished tool is a standalone QC triage dashboard that runs entirely in the browser. Drop a run stats CSV onto the page, and it parses every lane, compares metrics against configurable thresholds, flags failures, and renders Chart.js bar charts so you can visually spot outliers. No server, no LIMS integration, no installation — one HTML file on the shared drive.
Upload → parse → compare against thresholds → flag outliers → chart results. This is the same pattern used in environmental monitoring dashboards, manufacturing QC systems, and clinical lab result flagging. The techniques transfer anywhere you need to check numbers against pass/fail criteria.
🔍 Domain Primer: Key terms you'll see in this lesson
New to sequencing QC? Here are the terms you’ll encounter:
- Reads PF (Passing Filter) — The number of sequencing reads that pass Illumina’s internal quality filter. Measured in millions (M). Low numbers mean the run underperformed and you may not have enough data for analysis.
- %Q30 — The percentage of bases with a quality score of 30 or higher, meaning less than a 1-in-1000 chance of error. Industry standard is above 80% for most applications. Below 80% means noisy data.
- Mean Quality Score — The average Phred quality score across all bases in a lane. Should be above 30 for good runs. Scores below 30 indicate degraded chemistry or loading issues.
- Cluster Density (K/mm²) — How many clusters formed per square millimeter on the flowcell. Too low means under-loading (wasted capacity). Too high means over-loading (clusters overlap, quality drops).
- Error Rate (%) — The percentage of bases that differ from the PhiX control spike-in. Healthy runs show under 1%, and anything above 2% suggests a chemistry or imaging problem.
- Phasing / Prephasing — The percentage of molecules in a cluster that fall behind (phasing) or jump ahead (prephasing) during sequencing cycles. High values indicate chemistry degradation. Typical values are under 0.25%.
- Flowcell — The glass chip where sequencing happens. Different types (e.g., S4, SP, Nano) have different lane counts and expected output ranges.
You don’t need to memorize these — the dashboard handles the threshold logic. Just know that each metric has a “good” range, and values outside that range mean something went wrong.
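Those quality scores all come from the same formula. If you want the intuition behind "Q30 means 1-in-1000," here is the Phred conversion as a two-line sketch (illustrative only; the dashboard itself does not need this):

```javascript
// Phred scale: Q = -10 * log10(p), so error probability p = 10^(-Q/10).
// A base at Q30 therefore has a 10^-3 = 0.001 chance of being wrong.
function phredToErrorProb(q) {
  return Math.pow(10, -q / 10);
}

phredToErrorProb(30); // 0.001 -> 1-in-1000, the "good data" benchmark
phredToErrorProb(20); // 0.01  -> 1-in-100, ten times noisier
```

The same transform explains why a Mean Quality of 28 versus 30 is a bigger deal than it looks: the scale is logarithmic, so two points is roughly a 60% increase in error probability.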
Who this is for
- Sequencing facility staff who triage completed runs every morning and need to quickly identify lanes that need re-sequencing or troubleshooting.
- Genomics core directors who want a visual overview of run quality trends across instruments and need a report to send to PIs with failing lanes.
- Bioinformaticians who receive data handoffs and want to verify upstream QC before investing compute time in alignment and variant calling.
UW-Madison’s Biotechnology Center DNA Sequencing Facility, the Genome Editing and Carrier Screening labs, and the Bioinformatics Resource Center all generate Illumina run data that needs daily QC review. Each facility may have slightly different thresholds, but the triage workflow is identical: check the numbers, flag the failures, notify the PI.
The showcase
Here is what the finished dashboard looks like once you open the HTML file in a browser:
- Drag-and-drop zone at the top where you drop a run stats CSV file (or click to browse). Visual feedback on dragover.
- Run summary header showing instrument name, flowcell type, run date, and overall pass/fail status.
- Metrics table with every lane as a row, each metric in a column, and color-coded pass/fail badges on cells that exceed thresholds.
- Chart.js bar charts for Reads PF, %Q30, Error Rate, and Cluster Density — with threshold lines drawn as horizontal rules so you can visually spot lanes that dip below or spike above the cutoff.
- Detailed flag list below the charts showing every failure and warning with lane number, metric name, observed value, threshold, and a plain-English explanation.
- Export button that generates a printable summary report with charts, flags, and a timestamp.
Everything runs client-side. The CSV data never leaves the browser. You can use this on an air-gapped instrument workstation.
The prompt
Open your terminal (Mac: Cmd+Space and type "Terminal"; Windows: open WSL (Ubuntu) from the Start menu), navigate to a project folder (create one with `mkdir my-project && cd my-project`), start your AI CLI tool (Claude Code, Gemini CLI, or Codex CLI, e.g., by typing `claude`), and paste this prompt:
Build a single self-contained HTML file called sequencing-qc-triage.html that triages Illumina sequencing run quality. Requirements:
1. FILE INPUT
- A drag-and-drop zone (dashed border, changes color on dragover) for CSV files
- Also a click-to-browse fallback button
- Parse the CSV client-side (handle quoted fields, commas inside quotes)
- Show the filename and row count after upload
2. SAMPLE DATA (embed as a "Load Example" button)
Include this sample CSV data with deliberate QC failures for testing:
Run_ID,Date,Instrument,Flowcell_Type,Lane,Reads_PF_M,Percent_Q30,Mean_Quality,Cluster_Density_K_mm2,Percent_Aligned,Error_Rate,Phasing,Prephasing
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,1,312.5,92.1,35.4,215,95.2,0.68,0.12,0.08
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,2,298.7,88.5,33.2,198,93.8,0.85,0.14,0.09
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,3,145.2,74.3,28.1,87,88.1,2.45,0.31,0.22
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,4,305.1,91.7,34.8,225,94.5,0.72,0.13,0.07
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,5,,89.2,32.5,201,92.3,0.91,0.15,0.10
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,6,287.3,79.1,29.8,188,91.0,1.95,0.28,0.19
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,7,320.8,93.5,36.1,245,96.1,0.55,0.10,0.06
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,8,15.4,65.2,24.3,412,78.5,3.80,0.45,0.38
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,1,48.2,91.8,35.0,850,94.0,0.70,0.11,0.07
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,2,52.1,85.3,31.5,920,90.2,1.10,0.18,0.12
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,3,18.7,72.6,27.4,380,85.3,2.82,0.35,0.25
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,4,55.3,90.2,34.1,890,93.7,0.78,0.12,0.08
RUN_20260319_C,2026-03-19,NovaSeq,SP,1,410.5,94.8,36.5,280,97.2,0.42,0.09,0.05
RUN_20260319_C,2026-03-19,NovaSeq,SP,2,385.2,60.1,22.8,380,70.5,4.15,0.52,0.41
RUN_20260319_C,2026-03-19,NovaSeq,SP,3,395.0,93.1,35.8,265,96.0,0.50,0.10,0.06
3. QC THRESHOLDS (apply all of these, with instrument-aware logic)
- Reads_PF_M: FAIL if < 200M for NovaSeq, FAIL if < 30M for NextSeq
- Percent_Q30: FAIL if < 80%
- Mean_Quality: FAIL if < 30
- Error_Rate: FAIL if > 2.0%
- Cluster_Density_K_mm2: instrument-aware — NovaSeq/SP/S4: FAIL if < 100 or > 350; NextSeq: FAIL if < 600 or > 1200
- Missing values: flag as "DATA MISSING" in red
- Phasing: WARN if > 0.25%
- Prephasing: WARN if > 0.20%
- Show which specific threshold was violated for each flagged cell
4. DASHBOARD OUTPUT
- Run summary header: group rows by Run_ID, show instrument, flowcell, date, and overall status (PASS if all lanes pass, FAIL if any lane fails)
- Metrics table: one row per lane, all columns from the CSV, with color-coded badges on failing cells (red = FAIL, yellow = WARN, green = PASS)
- Below the table, a detailed flag list: "Lane 3: Percent_Q30 = 74.3% (threshold: ≥80%) — FAIL"
- Clicking a flag in the list scrolls to and briefly highlights that row in the table
5. CHARTS (use Chart.js from CDN: https://cdn.jsdelivr.net/npm/chart.js)
- Bar chart: Reads_PF_M per lane, with a horizontal threshold line at 200M (or 30M for NextSeq)
- Bar chart: Percent_Q30 per lane, with a horizontal threshold line at 80%
- Bar chart: Error_Rate per lane, with a horizontal threshold line at 2.0%
- Bar chart: Cluster_Density_K_mm2 per lane, with instrument-aware threshold bands (NovaSeq: 100-350, NextSeq: 600-1200)
- Color bars green if passing, red if failing
- Group charts by run when multiple Run_IDs are present
- Each chart should be sized to fit comfortably in a card layout
6. EXPORT
- "Export Report" button that opens a new window with a print-friendly version of the QC report (white background, charts included, flags listed, includes filename and timestamp)
7. DESIGN
- Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #10b981
- Clean sans-serif font (Inter from Google Fonts CDN)
- Responsive layout, cards for each chart
- Drag zone should be prominent with a file icon and "Drop Run Stats CSV here" text
- Green/red/yellow color coding consistent throughout
- PASS badges in green, FAIL badges in red, WARN badges in amber/yellow
8. TECHNICAL
- Pure HTML/CSS/JS in one file, no build step
- Only external dependencies: Google Fonts (Inter) and Chart.js from CDN
- CSV parser must handle quoted fields correctly
- Charts must render after data loads (use Chart.js destroy/recreate pattern)

That entire block is the prompt. Paste it as-is. The embedded sample data has deliberate QC failures in lanes 3, 5, 6, and 8 of Run A, lane 3 of Run B, and lane 2 of Run C — so you can immediately verify the dashboard is catching them all.
What you get
After the LLM finishes (typically 60-90 seconds), you will have a single file: sequencing-qc-triage.html. Open it in any browser.
Expected output structure
sequencing-qc-triage.html (~600-900 lines)

Click Load Example and you should see:
- Three run groups: RUN_20260315_A (8 lanes, NovaSeq 6000 S4), RUN_20260318_B (4 lanes, NextSeq 550), and RUN_20260319_C (3 lanes, NovaSeq SP).
- Run A, Lane 3 flagged FAIL: Reads_PF_M = 145.2 (below 200M NovaSeq threshold), Percent_Q30 = 74.3% (below 80%), Mean_Quality = 28.1 (below 30), Error_Rate = 2.45% (above 2.0%), Cluster_Density = 87 (below 100).
- Run A, Lane 5 flagged: Reads_PF_M is missing (DATA MISSING).
- Run A, Lane 6 flagged FAIL: Percent_Q30 = 79.1% (below 80%), Mean_Quality = 29.8 (below 30). WARN: Phasing = 0.28% (above 0.25%). Error_Rate = 1.95% passes but sits just under the 2.0% cutoff.
- Run A, Lane 8 flagged FAIL: Reads_PF_M = 15.4 (far below 200M), Percent_Q30 = 65.2%, Mean_Quality = 24.3, Cluster_Density = 412 (above 350 — over-clustered), Error_Rate = 3.80%.
- Run B, Lane 3 flagged FAIL: Reads_PF_M = 18.7 (below 30M NextSeq threshold), Percent_Q30 = 72.6% (below 80%), Cluster_Density = 380 (below NextSeq range 600-1200).
- Run C, Lane 2 flagged FAIL: Percent_Q30 = 60.1%, Mean_Quality = 22.8, Cluster_Density = 380 (above 350), Error_Rate = 4.15%.
- Bar charts showing visual outliers — Lane 8 of Run A should be immediately obvious as a near-total failure across every metric.
- Passing lanes (Run A lanes 1, 2, 4, 7; Run B lanes 1, 2, 4; Run C lanes 1, 3) shown with green badges and green chart bars.
Illumina instruments generate binary InterOp files, not CSVs. Most facilities export run metrics to CSV from BaseSpace, SAV (Sequencing Analysis Viewer), or a LIMS. If your facility exports differently, just adjust the column names in the prompt to match your export format. The threshold logic stays the same.
If something is off
LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:
| Problem | Follow-up prompt |
|---|---|
| Charts don’t render | The Chart.js charts are not appearing. Make sure Chart.js is loaded from the CDN before the script runs. Use window.onload or DOMContentLoaded to delay chart creation until the library is available. |
| Threshold lines missing from charts | The horizontal threshold lines are not showing on the bar charts. Use the Chart.js annotation plugin or draw horizontal lines using a second dataset with type 'line' overlaid on the bar chart. |
| All lanes show the same threshold | NextSeq lanes are using the 200M NovaSeq threshold for Reads_PF_M instead of 30M. Make sure the threshold logic checks the Instrument column and applies the correct cutoff per instrument type. |
| Missing values not flagged | Lane 5 has a blank Reads_PF_M but it's not flagged. Add a check for empty strings, null, undefined, and NaN before comparing against thresholds. |
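For the missing-value row in particular, the whole fix usually reduces to one guard that runs before any numeric comparison. A sketch of what the follow-up prompt should produce (the function name is an assumption; the generated code may inline this):

```javascript
// Returns true when a CSV cell holds no usable number: null/undefined,
// empty or whitespace-only string, or a value that parses to NaN.
// The trim check must come BEFORE Number(), because Number('') === 0
// and would silently pass an empty cell through the threshold checks.
function isMissing(value) {
  if (value === null || value === undefined) return true;
  if (String(value).trim() === '') return true;
  return Number.isNaN(Number(value));
}

isMissing('');      // true  -> flag the cell as DATA MISSING
isMissing('N/A');   // true  -> non-numeric placeholder
isMissing('145.2'); // false -> safe to compare against the threshold
```

Run A's Lane 5 (blank Reads_PF_M) is exactly the case this catches: the guard fires, the cell gets the red DATA MISSING badge, and the numeric comparison is skipped.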
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works (the 2-minute explanation)
You do not need to read every line of the generated code, but here is the mental model:
- CSV parsing splits each line by commas, respecting quoted fields. The first row becomes the header, and every subsequent row becomes an object with properties like `Reads_PF_M`, `Percent_Q30`, etc. Rows are grouped by `Run_ID` to support multi-run uploads.
- Threshold engine iterates over every lane and checks each metric against its pass/fail cutoff. It is instrument-aware: when evaluating `Reads_PF_M`, it reads the `Instrument` column to pick the right threshold (200M for NovaSeq, 30M for NextSeq). Cluster density thresholds also vary by instrument (NovaSeq patterned flowcells: 100-350 K/mm², NextSeq random clustering: 600-1200 K/mm²). Missing values are caught before numeric comparison.
- Chart rendering creates four Chart.js bar charts. Each bar is colored green or red based on whether that lane’s metric passes or fails. Threshold lines are drawn as horizontal annotations so you can visually see where the cutoff sits relative to the data.
- Export converts each Chart.js canvas to a PNG data URL, then assembles a new HTML document with the summary table, flag list, and chart images. The data never goes to a server — it stays in your browser.
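The threshold engine is the piece most worth understanding, since it is where instrument-aware bugs hide. A minimal sketch of the logic described above (field names match the CSV header; the function shape and flag format are assumptions about how the generated code might organize it):

```javascript
// Evaluate one lane against its cutoffs. Instrument-dependent metrics
// (reads passing filter, cluster density) pick their threshold from the
// Instrument column; the rest use fixed cutoffs.
function checkLane(lane) {
  const flags = [];
  const isNextSeq = /nextseq/i.test(lane.Instrument);

  // Reads passing filter: 200M for NovaSeq, 30M for NextSeq
  const minReads = isNextSeq ? 30 : 200;
  if (lane.Reads_PF_M == null || String(lane.Reads_PF_M).trim() === '') {
    flags.push({ metric: 'Reads_PF_M', level: 'DATA MISSING' });
  } else if (Number(lane.Reads_PF_M) < minReads) {
    flags.push({ metric: 'Reads_PF_M', level: 'FAIL', threshold: `>=${minReads}M` });
  }

  // Cluster density band differs by platform
  const [lo, hi] = isNextSeq ? [600, 1200] : [100, 350];
  const density = Number(lane.Cluster_Density_K_mm2);
  if (density < lo || density > hi) {
    flags.push({ metric: 'Cluster_Density_K_mm2', level: 'FAIL', threshold: `${lo}-${hi}` });
  }

  // Fixed cutoffs (Mean_Quality, Prephasing, etc. follow the same pattern)
  if (Number(lane.Percent_Q30) < 80) flags.push({ metric: 'Percent_Q30', level: 'FAIL', threshold: '>=80%' });
  if (Number(lane.Error_Rate) > 2.0) flags.push({ metric: 'Error_Rate', level: 'FAIL', threshold: '<=2.0%' });
  if (Number(lane.Phasing) > 0.25) flags.push({ metric: 'Phasing', level: 'WARN', threshold: '<=0.25%' });

  return flags;
}
```

Feeding in Run B's Lane 3 from the sample data should produce four FAILs and one WARN; feeding in a passing NovaSeq lane should produce an empty array.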
This dashboard handles the first step — identifying which lanes failed. The next step is notifying PIs and scheduling re-runs. Consider pairing this with the email draft customization below: upload the CSV, review the flags, click “Generate Email,” and send the PI a summary within minutes of the run completing. If you keep the exported reports in a shared folder, you build a QC history archive for free — useful for instrument maintenance reviews and grant reporting.
Customize it
The base dashboard handles standard Illumina metrics, but every facility has unique workflows. Each of these is a single follow-up prompt:
Add facility-specific thresholds
Update the QC thresholds to match our facility's standards:
- Reads_PF_M: FAIL if < 250M for NovaSeq S4, < 150M for NovaSeq SP, < 30M for NextSeq Mid, < 65M for NextSeq High
- Percent_Q30: FAIL if < 85% for RNA-seq, < 75% for amplicon sequencing
- Add a dropdown at the top to select the application type (RNA-seq, WGS, Amplicon, ChIP-seq) and apply the appropriate thresholds for that application.

Add multi-run trend comparison

Add a "Run History" section below the charts. When I upload multiple CSV files (one per run), plot a line chart showing Percent_Q30 and Error_Rate over time, one point per run (averaged across lanes). Store previous uploads in localStorage so the trend builds over sessions. Add a "Clear History" button.

Add PI notification email draft

Add a "Generate Email" button that creates a pre-formatted email summary of the QC results for failed lanes. Include: run ID, date, instrument, which lanes failed and why, and a recommendation (re-run vs. proceed with caution). Format it so I can copy-paste it into Outlook as a notification to the PI. Keep the tone professional: "Lane 3 did not meet the Q30 threshold (74.3% vs. 80% minimum). We recommend re-sequencing this lane."

Add PhiX spike-in and index balance checks

Add two new metrics to the threshold checks:
- Percent_Aligned (PhiX spike-in): WARN if < 85%, FAIL if < 70%
- Index balance: if a run has multiple lanes with different Reads_PF_M values, calculate the coefficient of variation across lanes. WARN if CV > 20%, FAIL if CV > 40%. Display the CV in the run summary header.
Add a new chart showing Percent_Aligned per lane with the 85% threshold line.

Start with the working dashboard, then add your facility’s specific thresholds and workflows one prompt at a time. Each prompt builds on what exists. You never need to plan the entire tool upfront — iterate from a solid foundation.
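The index-balance customization hinges on one statistic. A quick sketch of the coefficient of variation it asks for, applied to Run B's Reads_PF_M values from the sample data (the helper name is illustrative):

```javascript
// CV (%) = (population standard deviation / mean) * 100.
// A high CV means reads are spread unevenly across lanes, which often
// points to index imbalance in the pooled library.
function coefficientOfVariation(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  return (Math.sqrt(variance) / mean) * 100;
}

// Run B lanes: 48.2, 52.1, 18.7, 55.3 -> the failed lane 3 drags CV to ~33%,
// tripping the >20% WARN threshold but staying under the >40% FAIL cutoff.
coefficientOfVariation([48.2, 52.1, 18.7, 55.3]);
```

Note this uses the population standard deviation (dividing by n); if your facility's convention is the sample standard deviation (n-1), say so in the prompt, since the two give different CVs for small lane counts.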
Try it yourself
- Open your CLI tool in an empty folder.
- Paste the main prompt from above.
- Open the generated `sequencing-qc-triage.html` in your browser.
- Click Load Example to see the QC triage on the embedded test data — verify that lanes 3, 5, 6, and 8 of Run A are flagged.
- Export a real run stats CSV from BaseSpace, SAV, or your LIMS and drop it on the dashboard to see your own data triaged.
- Pick one customization from the list above and add it.
If you manage a sequencing facility, put this HTML file on the instrument workstation desktop. Morning QC triage becomes a 30-second upload instead of a 15-minute spreadsheet review.
Key takeaways
- One prompt, one dashboard: a detailed prompt with embedded sample data and explicit thresholds produces a working QC triage tool in under 2 minutes.
- Instrument-aware thresholds are critical — a NextSeq lane at 18M reads is a failure, but a NovaSeq lane at 200M is fine. Cluster density ranges also differ by platform. Specifying this in the prompt prevents a common logic bug.
- Embedding test data with deliberate failures in the prompt guarantees you can verify every threshold check works immediately, without needing to wait for a real run to finish.
- Chart.js bar charts with threshold lines turn a wall of numbers into an instant visual scan — a red bar below the green line jumps out faster than a number in a table ever will.
- Client-side processing means sensitive run data never leaves the instrument workstation — important for facilities with data governance requirements.
Lane 8 of Run A shows a cluster density of 412 K/mm² and an error rate of 3.80%. What likely happened during library loading for this lane?
Run B uses a NextSeq 550 with a 30M reads threshold, while Run A uses a NovaSeq 6000 with a 200M threshold. Why is it important that the dashboard applies different thresholds per instrument?
What’s next
In the next lesson, you will build a CRISPR Editing Outcome Analyzer that parses amplicon sequencing results and classifies editing outcomes — insertions, deletions, wild-type — into a visual summary for experiment reporting.