Applied Module 12 · AI-Powered Bioinformatics Tools

Sequencing Run QC Triage

What you'll learn

~35 min
  • Build a sequencing run QC triage dashboard with a single AI prompt
  • Parse Illumina run metrics CSV and flag lanes that fail quality thresholds using Chart.js
  • Troubleshoot common issues with CSV parsing, Chart.js rendering, and threshold logic
  • Customize the dashboard with facility-specific thresholds, multi-run comparison, or exportable reports

What you’re building

A sequencing run finishes at 2 AM. By the time the facility opens at 8, someone needs to know whether the data is usable or whether lanes need to be re-run. Right now that means opening the Illumina run folder, pulling up InterOp files or BaseSpace, scrolling through metrics tables, and mentally checking each lane against thresholds. It takes 15-20 minutes per run, and it is easy to miss a borderline lane when you are triaging three runs from the weekend.

You are going to build a tool that does this triage in under one second. Upload the run stats CSV, and the dashboard instantly flags every lane that fails QC thresholds — with color-coded pass/fail badges, Chart.js bar charts for visual comparison, and a downloadable summary you can email to the PI before they even ask.

💬 This solves a real morning-after problem

Core facility staff who manage Illumina sequencers spend the first hour of every Monday triaging weekend runs. A dashboard that reads the CSV and flags failures immediately means you walk in, upload, and know in seconds which PIs need a heads-up and which runs are ready for downstream analysis. That hour becomes five minutes.

The finished tool is a standalone QC triage dashboard that runs entirely in the browser. Drop a run stats CSV onto the page, and it parses every lane, compares metrics against configurable thresholds, flags failures, and renders Chart.js bar charts so you can visually spot outliers. No server, no LIMS integration, no installation — one HTML file on the shared drive.

Software pattern: Upload, threshold-check, visualize

Upload → parse → compare against thresholds → flag outliers → chart results. This is the same pattern used in environmental monitoring dashboards, manufacturing QC systems, and clinical lab result flagging. The techniques transfer anywhere you need to check numbers against pass/fail criteria.
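If you want to see the skeleton of that pattern, here is a minimal sketch in plain JavaScript. The metric names and cutoffs mirror this lesson's thresholds; the generated dashboard will contain its own, longer version of this logic:

```javascript
// Minimal sketch of the parse → compare-against-thresholds → flag step.
// Cutoffs match this lesson's QC spec; instrument-aware rules are omitted here.
const thresholds = {
  Percent_Q30:  { min: 80 },   // FAIL below 80%
  Mean_Quality: { min: 30 },   // FAIL below an average of Q30
  Error_Rate:   { max: 2.0 },  // FAIL above 2.0%
};

function flagLane(lane) {
  const flags = [];
  for (const [metric, rule] of Object.entries(thresholds)) {
    const value = lane[metric];
    if (rule.min !== undefined && value < rule.min) {
      flags.push(`${metric} = ${value} (threshold: ≥${rule.min}) — FAIL`);
    }
    if (rule.max !== undefined && value > rule.max) {
      flags.push(`${metric} = ${value} (threshold: ≤${rule.max}) — FAIL`);
    }
  }
  return flags;
}

// Run A, Lane 3 from the sample data fails all three checks:
console.log(flagLane({ Percent_Q30: 74.3, Mean_Quality: 28.1, Error_Rate: 2.45 }));
```

Everything else in the dashboard (charts, badges, export) is presentation layered on top of a flag list like this one.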

🔍 Domain Primer: Key terms you'll see in this lesson

New to sequencing QC? Here are the terms you’ll encounter:

  • Reads PF (Passing Filter) — The number of sequencing reads that pass Illumina’s internal quality filter. Measured in millions (M). Low numbers mean the run underperformed and you may not have enough data for analysis.
  • %Q30 — The percentage of bases with a quality score of 30 or higher, meaning less than 1-in-1000 chance of error. Industry standard is above 80% for most applications. Below 80% means noisy data.
  • Mean Quality Score — The average Phred quality score across all bases in a lane. Should be above 30 for good runs. Scores below 30 indicate degraded chemistry or loading issues.
  • Cluster Density (K/mm²) — How many clusters formed per square millimeter on the flowcell. Too low means under-loading (wasted capacity). Too high means over-loading (clusters overlap, quality drops).
  • Error Rate (%) — The percentage of bases that differ from the PhiX control spike-in. Healthy runs show under 1%, and anything above 2% suggests a chemistry or imaging problem.
  • Phasing / Prephasing — The percentage of molecules in a cluster that fall behind (phasing) or jump ahead (prephasing) during sequencing cycles. High values indicate chemistry degradation. Typical values are under 0.25%.
  • Flowcell — The glass chip where sequencing happens. Different types (e.g., S4, SP, Nano) have different lane counts and expected output ranges.

You don’t need to memorize these — the dashboard handles the threshold logic. Just know that each metric has a “good” range, and values outside that range mean something went wrong.
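If you are curious about the math behind these scores: a Phred quality score Q corresponds to an error probability of 10^(-Q/10), which is where the "1-in-1000" for Q30 comes from. A two-line sketch:

```javascript
// Phred quality score Q → probability the base call is wrong: p = 10^(-Q/10)
function phredToErrorProb(q) {
  return Math.pow(10, -q / 10);
}

phredToErrorProb(30); // 0.001 — the 1-in-1000 behind Q30
phredToErrorProb(20); // 0.01  — 1-in-100
```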

Who this is for

  • Sequencing facility staff who triage completed runs every morning and need to quickly identify lanes that need re-sequencing or troubleshooting.
  • Genomics core directors who want a visual overview of run quality trends across instruments and need a report to send to PIs with failing lanes.
  • Bioinformaticians who receive data handoffs and want to verify upstream QC before investing compute time in alignment and variant calling.

Core Facility Context

UW-Madison’s Biotechnology Center DNA Sequencing Facility, the Genome Editing and Carrier Screening labs, and the Bioinformatics Resource Center all generate Illumina run data that needs daily QC review. Each facility may have slightly different thresholds, but the triage workflow is identical: check the numbers, flag the failures, notify the PI.


The showcase

Here is what the finished dashboard looks like once you open the HTML file in a browser:

  • Drag-and-drop zone at the top where you drop a run stats CSV file (or click to browse). Visual feedback on dragover.
  • Run summary header showing instrument name, flowcell type, run date, and overall pass/fail status.
  • Metrics table with every lane as a row, each metric in a column, and color-coded pass/fail badges on cells that exceed thresholds.
  • Chart.js bar charts for Reads PF, %Q30, Error Rate, and Cluster Density — with threshold lines drawn as horizontal rules so you can visually spot lanes that dip below or spike above the cutoff.
  • Detailed flag list below the charts showing every failure and warning with lane number, metric name, observed value, threshold, and a plain-English explanation.
  • Export button that generates a printable summary report with charts, flags, and a timestamp.

Everything runs client-side. The CSV data never leaves the browser. You can use this on an air-gapped instrument workstation.


The prompt

Open your terminal, navigate to a project folder, start your AI CLI tool (Claude Code, Gemini CLI, or Codex CLI; e.g., by typing claude), and paste this prompt:

Build a single self-contained HTML file called sequencing-qc-triage.html that
triages Illumina sequencing run quality. Requirements:
1. FILE INPUT
- A drag-and-drop zone (dashed border, changes color on dragover) for CSV files
- Also a click-to-browse fallback button
- Parse the CSV client-side (handle quoted fields, commas inside quotes)
- Show the filename and row count after upload
2. SAMPLE DATA (embed as a "Load Example" button)
Include this sample CSV data with deliberate QC failures for testing:
Run_ID,Date,Instrument,Flowcell_Type,Lane,Reads_PF_M,Percent_Q30,Mean_Quality,Cluster_Density_K_mm2,Percent_Aligned,Error_Rate,Phasing,Prephasing
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,1,312.5,92.1,35.4,215,95.2,0.68,0.12,0.08
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,2,298.7,88.5,33.2,198,93.8,0.85,0.14,0.09
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,3,145.2,74.3,28.1,87,88.1,2.45,0.31,0.22
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,4,305.1,91.7,34.8,225,94.5,0.72,0.13,0.07
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,5,,89.2,32.5,201,92.3,0.91,0.15,0.10
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,6,287.3,79.1,29.8,188,91.0,1.95,0.28,0.19
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,7,320.8,93.5,36.1,245,96.1,0.55,0.10,0.06
RUN_20260315_A,2026-03-15,NovaSeq6000,S4,8,15.4,65.2,24.3,412,78.5,3.80,0.45,0.38
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,1,48.2,91.8,35.0,850,94.0,0.70,0.11,0.07
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,2,52.1,85.3,31.5,920,90.2,1.10,0.18,0.12
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,3,18.7,72.6,27.4,380,85.3,2.82,0.35,0.25
RUN_20260318_B,2026-03-18,NextSeq550,Mid_Output,4,55.3,90.2,34.1,890,93.7,0.78,0.12,0.08
RUN_20260319_C,2026-03-19,NovaSeq,SP,1,410.5,94.8,36.5,280,97.2,0.42,0.09,0.05
RUN_20260319_C,2026-03-19,NovaSeq,SP,2,385.2,60.1,22.8,380,70.5,4.15,0.52,0.41
RUN_20260319_C,2026-03-19,NovaSeq,SP,3,395.0,93.1,35.8,265,96.0,0.50,0.10,0.06
3. QC THRESHOLDS (apply all of these, with instrument-aware logic)
- Reads_PF_M: FAIL if < 200M for NovaSeq, FAIL if < 30M for NextSeq
- Percent_Q30: FAIL if < 80%
- Mean_Quality: FAIL if < 30
- Error_Rate: FAIL if > 2.0%
- Cluster_Density_K_mm2: instrument-aware — NovaSeq/SP/S4: FAIL if < 100 or > 350; NextSeq: FAIL if < 600 or > 1200
- Missing values: flag as "DATA MISSING" in red
- Phasing: WARN if > 0.25%
- Prephasing: WARN if > 0.20%
- Show which specific threshold was violated for each flagged cell
4. DASHBOARD OUTPUT
- Run summary header: group rows by Run_ID, show instrument, flowcell, date,
and overall status (PASS if all lanes pass, FAIL if any lane fails)
- Metrics table: one row per lane, all columns from the CSV, with color-coded
badges on failing cells (red = FAIL, yellow = WARN, green = PASS)
- Below the table, a detailed flag list: "Lane 3: Percent_Q30 = 74.3% (threshold: ≥80%) — FAIL"
- Clicking a flag in the list scrolls to and briefly highlights that row in the table
5. CHARTS (use Chart.js from CDN: https://cdn.jsdelivr.net/npm/chart.js)
- Bar chart: Reads_PF_M per lane, with a horizontal threshold line at 200M (or 30M for NextSeq)
- Bar chart: Percent_Q30 per lane, with a horizontal threshold line at 80%
- Bar chart: Error_Rate per lane, with a horizontal threshold line at 2.0%
- Bar chart: Cluster_Density_K_mm2 per lane, with instrument-aware threshold bands (NovaSeq: 100-350, NextSeq: 600-1200)
- Color bars green if passing, red if failing
- Group charts by run when multiple Run_IDs are present
- Each chart should be sized to fit comfortably in a card layout
6. EXPORT
- "Export Report" button that opens a new window with a print-friendly version
of the QC report (white background, charts included, flags listed, includes
filename and timestamp)
7. DESIGN
- Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #10b981
- Clean sans-serif font (Inter from Google Fonts CDN)
- Responsive layout, cards for each chart
- Drag zone should be prominent with a file icon and "Drop Run Stats CSV here" text
- Green/red/yellow color coding consistent throughout
- PASS badges in green, FAIL badges in red, WARN badges in amber/yellow
8. TECHNICAL
- Pure HTML/CSS/JS in one file, no build step
- Only external dependencies: Google Fonts (Inter) and Chart.js from CDN
- CSV parser must handle quoted fields correctly
- Charts must render after data loads (use Chart.js destroy/recreate pattern)
💡 Copy-paste ready

That entire block is the prompt. Paste it as-is. The embedded sample data has deliberate QC failures in lanes 3, 5, 6, and 8 of Run A, lane 3 of Run B, and lane 2 of Run C — so you can immediately verify the dashboard is catching them all.


What you get

After the LLM finishes (typically 60-90 seconds), you will have a single file: sequencing-qc-triage.html. Open it in any browser.

Expected output structure

sequencing-qc-triage.html (~600-900 lines)

Click Load Example and you should see:

  1. Three run groups: RUN_20260315_A (8 lanes, NovaSeq 6000 S4), RUN_20260318_B (4 lanes, NextSeq 550), and RUN_20260319_C (3 lanes, NovaSeq SP).
  2. Run A, Lane 3 flagged FAIL: Reads_PF_M = 145.2 (below 200M NovaSeq threshold), Percent_Q30 = 74.3% (below 80%), Mean_Quality = 28.1 (below 30), Error_Rate = 2.45% (above 2.0%), Cluster_Density = 87 (below 100).
  3. Run A, Lane 5 flagged: Reads_PF_M is missing (DATA MISSING).
  4. Run A, Lane 6 flagged FAIL: Percent_Q30 = 79.1% (below 80%), Mean_Quality = 29.8 (below 30). WARN: Phasing = 0.28% (above 0.25%); Error_Rate = 1.95% is just under the 2.0% cutoff and passes.
  5. Run A, Lane 8 flagged FAIL: Reads_PF_M = 15.4 (far below 200M), Percent_Q30 = 65.2%, Mean_Quality = 24.3, Cluster_Density = 412 (above 350 — over-clustered), Error_Rate = 3.80%.
  6. Run B, Lane 3 flagged FAIL: Reads_PF_M = 18.7 (below 30M NextSeq threshold), Percent_Q30 = 72.6% (below 80%), Cluster_Density = 380 (below NextSeq range 600-1200).
  7. Run C, Lane 2 flagged FAIL: Percent_Q30 = 60.1%, Mean_Quality = 22.8, Cluster_Density = 380 (above 350), Error_Rate = 4.15%.
  8. Bar charts showing visual outliers — Lane 8 of Run A should be immediately obvious as a near-total failure across every metric.
  9. Passing lanes (Run A lanes 1, 2, 4, 7; Run B lanes 1, 2, 4; Run C lanes 1, 3) shown with green badges and green chart bars.

What about InterOp files?

Illumina instruments generate binary InterOp files, not CSVs. Most facilities export run metrics to CSV from BaseSpace, SAV (Sequencing Analysis Viewer), or a LIMS. If your facility exports differently, just adjust the column names in the prompt to match your export format. The threshold logic stays the same.

If something is off

LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:

Problem → Follow-up prompt

  • Charts don’t render → “The Chart.js charts are not appearing. Make sure Chart.js is loaded from the CDN before the script runs. Use window.onload or DOMContentLoaded to delay chart creation until the library is available.”
  • Threshold lines missing from charts → “The horizontal threshold lines are not showing on the bar charts. Use the Chart.js annotation plugin or draw horizontal lines using a second dataset with type 'line' overlaid on the bar chart.”
  • All lanes show the same threshold → “NextSeq lanes are using the 200M NovaSeq threshold for Reads_PF_M instead of 30M. Make sure the threshold logic checks the Instrument column and applies the correct cutoff per instrument type.”
  • Missing values not flagged → “Lane 5 has a blank Reads_PF_M but it's not flagged. Add a check for empty strings, null, undefined, and NaN before comparing against thresholds.”
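That last fix comes down to a single guard before any numeric comparison. A sketch of what the corrected check might look like (the function name is illustrative, not the generated code's):

```javascript
// Returns true when a CSV cell can't be compared numerically:
// null/undefined, empty or whitespace-only string, or non-numeric text.
// Note: Number('') is 0 in JavaScript, so the trim check must come first.
function isMissing(value) {
  if (value === null || value === undefined) return true;
  if (String(value).trim() === '') return true;
  return Number.isNaN(Number(value));
}

isMissing('');      // true — Lane 5's blank Reads_PF_M
isMissing('145.2'); // false
```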

🔧 When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom
Chart.js throws 'Canvas is already in use' error and charts are blank
Evidence
The browser console shows 'Canvas is already in use. Chart with ID X must be destroyed before the Chart for this canvas can be created.' when loading new data
What to ask the AI
"The charts are not being destroyed before re-creating them. Before each new Chart() call, check if a chart instance already exists for that canvas and call .destroy() on it first. Store chart instances in a global object so you can reference them on reload."
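The destroy-before-recreate pattern that request describes can be sketched as a small registry keyed by canvas ID. Here createChart is a stand-in for () => new Chart(ctx, config), so the pattern is visible without the Chart.js library itself:

```javascript
// One chart instance per canvas: destroy the old one before creating a new one.
// `createChart` is a placeholder for `() => new Chart(ctx, config)`.
const chartRegistry = {};

function renderChart(canvasId, createChart) {
  if (chartRegistry[canvasId]) {
    chartRegistry[canvasId].destroy(); // release the canvas first
  }
  chartRegistry[canvasId] = createChart();
  return chartRegistry[canvasId];
}
```

Storing instances in one global object also makes them easy to find later, e.g. when converting charts to images for the export report.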
Symptom
Cluster density threshold band does not render on the chart
Evidence
The cluster density chart shows bars but no shaded region between 100 and 350 K/mm² to indicate the acceptable range
What to ask the AI
"Add a shaded background band on the cluster density chart between y=100 and y=350 using a Chart.js box annotation or by drawing two horizontal line datasets at 100 and 350 with a fill between them. Use a semi-transparent green for the band so bars are still visible."
Symptom
Runs with different instruments show identical thresholds
Evidence
NextSeq lanes are flagged with the 200M reads threshold instead of 30M. The instrument column says NextSeq550 but the flag says 'below 200M'
What to ask the AI
"The threshold logic is not instrument-aware. When checking Reads_PF_M, read the Instrument column for that row. If Instrument contains 'NextSeq', use 30M as the threshold. For NovaSeq, use 200M. Also make Cluster_Density instrument-aware: NovaSeq 100-350, NextSeq 600-1200. Update both the flag messages and the chart threshold lines to reflect the correct per-instrument values."
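The instrument-aware lookup that fix asks for reduces to a few lines. A sketch with illustrative names (not the generated code):

```javascript
// Pick the Reads_PF_M floor (in millions) from the lane's Instrument column.
function readsPFThreshold(instrument) {
  if (/NextSeq/i.test(instrument)) return 30;   // NextSeq: FAIL below 30M
  if (/NovaSeq/i.test(instrument)) return 200;  // NovaSeq: FAIL below 200M
  return null; // unknown instrument: don't flag
}

readsPFThreshold('NextSeq550');  // 30
readsPFThreshold('NovaSeq6000'); // 200
```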
Symptom
Export report is missing the charts
Evidence
Clicking 'Export Report' opens a new window with the table and flags but no chart images
What to ask the AI
"Chart.js renders to canvas elements which don't transfer to a new window via innerHTML. Convert each canvas to a data URL using canvas.toDataURL('image/png') and insert them as <img> tags in the export HTML. Do this for all four charts."

How it works (the 2-minute explanation)

You do not need to read every line of the generated code, but here is the mental model:

  1. CSV parsing splits each line by commas, respecting quoted fields. The first row becomes the header, and every subsequent row becomes an object with properties like Reads_PF_M, Percent_Q30, etc. Rows are grouped by Run_ID to support multi-run uploads.
  2. Threshold engine iterates over every lane and checks each metric against its pass/fail cutoff. It is instrument-aware: when evaluating Reads_PF_M, it reads the Instrument column to pick the right threshold (200M for NovaSeq, 30M for NextSeq). Cluster density thresholds also vary by instrument (NovaSeq patterned flowcells: 100-350 K/mm², NextSeq random clustering: 600-1200 K/mm²). Missing values are caught before numeric comparison.
  3. Chart rendering creates four Chart.js bar charts. Each bar is colored green or red based on whether that lane’s metric passes or fails. Threshold lines are drawn as horizontal annotations so you can visually see where the cutoff sits relative to the data.
  4. Export converts each Chart.js canvas to a PNG data URL, then assembles a new HTML document with the summary table, flag list, and chart images. The data never goes to a server — it stays in your browser.
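Step 1 is the part a naive split(',') gets wrong: quoted fields can contain commas. A minimal character-by-character parser, sketched in the same spirit as the generated code (not identical to it):

```javascript
// Parse one CSV line, honoring quoted fields, commas inside quotes,
// and "" as an escaped quote within a quoted field.
function parseCSVLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') { current += '"'; i++; } // escaped quote
      else inQuotes = !inQuotes;
    } else if (ch === ',' && !inQuotes) {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

parseCSVLine('RUN_A,"NovaSeq 6000, S4",312.5');
// ["RUN_A", "NovaSeq 6000, S4", "312.5"]
```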
🔍 For Facility Managers: Turning QC triage into a workflow

This dashboard handles the first step — identifying which lanes failed. The next step is notifying PIs and scheduling re-runs. Consider pairing this with the email draft customization below: upload the CSV, review the flags, click “Generate Email,” and send the PI a summary within minutes of the run completing. If you keep the exported reports in a shared folder, you build a QC history archive for free — useful for instrument maintenance reviews and grant reporting.


Customize it

The base dashboard handles standard Illumina metrics, but every facility has unique workflows. Each of these is a single follow-up prompt:

Add facility-specific thresholds

Update the QC thresholds to match our facility's standards:
- Reads_PF_M: FAIL if < 250M for NovaSeq S4, < 150M for NovaSeq SP, < 30M for NextSeq Mid, < 65M for NextSeq High
- Percent_Q30: FAIL if < 85% for RNA-seq, < 75% for amplicon sequencing
- Add a dropdown at the top to select the application type (RNA-seq, WGS, Amplicon, ChIP-seq)
and apply the appropriate thresholds for that application.

Add multi-run trend comparison

Add a "Run History" section below the charts. When I upload multiple CSV files
(one per run), plot a line chart showing Percent_Q30 and Error_Rate over time,
one point per run (averaged across lanes). Store previous uploads in
localStorage so the trend builds over sessions. Add a "Clear History" button.

Add PI notification email draft

Add a "Generate Email" button that creates a pre-formatted email summary of the
QC results for failed lanes. Include: run ID, date, instrument, which lanes
failed and why, and a recommendation (re-run vs. proceed with caution).
Format it so I can copy-paste it into Outlook as a notification to the PI.
Keep the tone professional: "Lane 3 did not meet the Q30 threshold (74.3% vs.
80% minimum). We recommend re-sequencing this lane."

Add PhiX spike-in and index balance checks

Add two new metrics to the threshold checks:
- Percent_Aligned (PhiX spike-in): WARN if < 85%, FAIL if < 70%
- Index balance: if a run has multiple lanes with different Reads_PF_M values,
calculate the coefficient of variation across lanes. WARN if CV > 20%,
FAIL if CV > 40%. Display the CV in the run summary header.
Add a new chart showing Percent_Aligned per lane with the 85% threshold line.
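The coefficient of variation in that last customization is just the standard deviation divided by the mean, expressed as a percentage. A sketch (population standard deviation, which is a reasonable choice for a fixed set of lanes):

```javascript
// Coefficient of variation (%) across per-lane values, e.g. Reads_PF_M.
function coefficientOfVariation(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  return (Math.sqrt(variance) / mean) * 100;
}

// Run A's seven non-missing lanes are badly unbalanced because of Lane 8:
coefficientOfVariation([312.5, 298.7, 145.2, 305.1, 287.3, 320.8, 15.4]);
// ≈ 45% → FAIL under the CV > 40% rule above
```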

The customization loop

Start with the working dashboard, then add your facility’s specific thresholds and workflows one prompt at a time. Each prompt builds on what exists. You never need to plan the entire tool upfront — iterate from a solid foundation.


Try it yourself

  1. Open your CLI tool in an empty folder.
  2. Paste the main prompt from above.
  3. Open the generated sequencing-qc-triage.html in your browser.
  4. Click Load Example to see the QC triage on the embedded test data — verify that lanes 3, 5, 6, and 8 of Run A are flagged.
  5. Export a real run stats CSV from BaseSpace, SAV, or your LIMS and drop it on the dashboard to see your own data triaged.
  6. Pick one customization from the list above and add it.

If you manage a sequencing facility, put this HTML file on the instrument workstation desktop. Morning QC triage becomes a 30-second upload instead of a 15-minute spreadsheet review.


Key takeaways

  • One prompt, one dashboard: a detailed prompt with embedded sample data and explicit thresholds produces a working QC triage tool in under 2 minutes.
  • Instrument-aware thresholds are critical — a NextSeq lane at 18M reads is a failure, but a NovaSeq lane at 200M is fine. Cluster density ranges also differ by platform. Specifying this in the prompt prevents a common logic bug.
  • Embedding test data with deliberate failures in the prompt guarantees you can verify every threshold check works immediately, without needing to wait for a real run to finish.
  • Chart.js bar charts with threshold lines turn a wall of numbers into an instant visual scan — a red bar below the green line jumps out faster than a number in a table ever will.
  • Client-side processing means sensitive run data never leaves the instrument workstation — important for facilities with data governance requirements.

KNOWLEDGE CHECK

Lane 8 of Run A shows a cluster density of 412 K/mm² and an error rate of 3.80%. What likely happened during library loading for this lane?

KNOWLEDGE CHECK

Run B uses a NextSeq 550 with a 30M reads threshold, while Run A uses a NovaSeq 6000 with a 200M threshold. Why is it important that the dashboard applies different thresholds per instrument?


What’s next

In the next lesson, you will build a CRISPR Editing Outcome Analyzer that parses amplicon sequencing results and classifies editing outcomes — insertions, deletions, wild-type — into a visual summary for experiment reporting.