Applied Module 12 · AI-Powered Bioinformatics Tools

CRISPR Editing Outcome Analyzer

What you'll learn

~25 min
  • Build a CRISPR editing outcome analyzer with a single AI prompt
  • Parse amplicon sequencing CSV data and compute indel frequencies with Chart.js visualization
  • Troubleshoot common issues with CSV parsing, frequency calculations, and chart rendering
  • Customize the analyzer with allele classification, batch comparison, or exportable reports

What you’re building

After a CRISPR experiment, the critical question is always the same: did the edit work? You harvest cells or embryos, PCR-amplify across the target site, send the amplicons for next-generation sequencing, and get back a CSV from CRISPResso2 or a similar analysis pipeline. That CSV contains the numbers you need — total reads, modified reads, NHEJ counts, HDR counts, allele frequencies — but turning those numbers into a clear picture of editing efficiency still means opening Excel, building formulas, and formatting charts by hand. For a single guide, that is manageable. For a screen with a dozen guides across multiple samples, it is tedious and error-prone.

You are going to build a tool that does this analysis instantly.

💬This closes the genome editing loop

This lesson connects directly to what you have already built in this track. In L2 you designed guide RNAs, in L9 you planned a breeding colony to propagate edited alleles, and now you are assessing whether those edits actually took. Design, breed, assess — that is the full CRISPR workflow, and this tool handles the final step: turning raw sequencing numbers into a clear answer about editing efficiency.

By the end of this lesson you will have a standalone CRISPR editing outcome analyzer that runs entirely in the browser. Upload a CSV of amplicon sequencing results (the kind CRISPResso2 exports), and it instantly computes indel frequencies, flags problematic samples, and renders Chart.js bar charts showing editing efficiency across all your guides. No server, no database, no installation — just one HTML file you can open on any lab workstation.

Software pattern: Upload, compute, visualize

Upload CSV → parse and compute frequencies → render interactive charts. This is the same data-pipeline pattern used in dashboards across every field. The techniques here transfer to any workflow where you need to turn tabular data into visual summaries.

🔍Domain Primer: Key terms you'll see in this lesson

New to CRISPR editing analysis? Here are the terms you will encounter:

  • Indel (insertion/deletion) — A small insertion or deletion of nucleotides at the CRISPR cut site. Indels in a coding region usually cause frameshifts that knock out the gene. This is the most common editing outcome.
  • Knock-out efficiency — The percentage of alleles in a sample that carry a disruptive edit (typically an indel). Higher is better when your goal is a gene knockout.
  • Allele frequency — How often a specific sequence variant appears in the pool of sequenced reads. A sample with 80% wild-type allele and 20% indel allele has 20% modification.
  • NHEJ (Non-Homologous End Joining) — The cell’s error-prone DNA repair pathway. After Cas9 cuts, NHEJ usually introduces indels. This is the default repair outcome and what most knockout experiments rely on.
  • HDR (Homology-Directed Repair) — A precise repair pathway that uses a donor template to insert a specific sequence at the cut site. HDR rates are typically much lower than NHEJ (often under 10%).
  • CRISPResso2 — A widely used computational pipeline that analyzes amplicon sequencing data from CRISPR experiments. It aligns reads to the reference, classifies them as modified or unmodified, and outputs summary statistics as CSV files.
  • Amplicon sequencing — Targeted next-generation sequencing of a specific PCR-amplified region (the amplicon) around the CRISPR cut site. Produces thousands of reads per sample, giving statistical power to measure editing frequency.

You do not need to memorize these — the tool handles the calculations. You just need to know what the numbers represent.
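
As a worked example of the terms above, the sketch below recomputes the percentages for sample GEAM-001 from this lesson's example data. The helper name `percentOf` is made up for illustration; the generated tool may structure this arithmetic differently.

```javascript
// Illustrative helper (hypothetical name): share of reads as a percentage,
// rounded to one decimal place.
function percentOf(reads, totalReads) {
  return Math.round((reads / totalReads) * 1000) / 10;
}

// Read counts for GEAM-001, copied from the lesson's example CSV.
const sample = { total: 48210, modified: 42105, nhej: 38920, hdr: 2815 };

const percentModified = percentOf(sample.modified, sample.total); // 87.3
const percentNhej = percentOf(sample.nhej, sample.total);         // 80.7
const percentHdr = percentOf(sample.hdr, sample.total);           // 5.8

console.log({ percentModified, percentNhej, percentHdr });
```

These match the Percent_Modified, Percent_NHEJ, and Percent_HDR values reported for GEAM-001 in the example data.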

Who this is for

  • Genome editing lab techs (GEAM and similar facilities) who run CRISPResso2 on amplicon data and need a fast visual summary of editing outcomes across multiple guides and samples.
  • PI and postdoc researchers who want to quickly assess which guides worked, which samples have suspiciously low read counts, and whether HDR is outcompeting NHEJ.
  • Core facility staff who prepare editing efficiency reports for investigators and want a consistent, shareable format instead of ad hoc Excel charts.

GEAM Context

The UW-Madison Genome Editing and Animal Models (GEAM) facility generates these data routinely for investigators. A tool that turns CRISPResso2 CSV output into a visual report saves hours of post-processing per experiment and gives consistent formatting across all projects.


The showcase

Here is what the finished analyzer looks like once you open the HTML file in a browser:

  • Upload zone at the top where you drop a CSV file (or click to browse). Visual feedback on dragover.
  • Summary cards showing total samples analyzed, average modification rate, average NHEJ rate, average HDR rate, and number of flagged samples.
  • Stacked bar chart (Chart.js) showing NHEJ, HDR, mixed, and unmodified percentages, with one bar per sample.
  • Allele frequency chart showing top allele frequency vs. reference allele frequency per sample.
  • Data table with color-coded rows:
    • Green left border: modification rate above 50% (strong editing).
    • Yellow left border: modification rate between 10% and 50% (moderate editing).
    • Red left border: modification rate below 10%, suspicious read counts, or missing data.
  • Flag panel listing every quality concern: low read counts, low modification rates, missing values, suspiciously high HDR, and reads that do not sum correctly.
  • Export button that downloads a print-friendly HTML report with charts and table.

Everything runs client-side. The sequencing data never leaves the browser.
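
The row colors described above come down to a threshold check. Here is a minimal sketch, assuming a hypothetical helper named `borderColor` and assuming that missing values have already been parsed as null. It covers only the modification-rate thresholds, not the read-count condition.

```javascript
// Hypothetical helper mirroring the border colors in the showcase list.
// A missing modification rate (null or NaN) is treated as red.
function borderColor(percentModified) {
  if (percentModified === null || Number.isNaN(percentModified)) return "red";
  if (percentModified > 50) return "green";   // strong editing
  if (percentModified >= 10) return "yellow"; // moderate editing
  return "red";                               // below 10%
}

console.log(borderColor(87.3)); // GEAM-001
console.log(borderColor(28.4)); // GEAM-004
console.log(borderColor(7.0));  // GEAM-015
```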


The prompt

Open your terminal, navigate to a project folder, start your AI CLI tool (e.g., by typing claude), and paste this prompt:

Build a single self-contained HTML file called editing-analyzer.html that analyzes
CRISPR editing outcomes from amplicon sequencing data. Requirements:
1. FILE INPUT
- A drag-and-drop zone (dashed border, changes color on dragover) for CSV files
- Also a click-to-browse fallback button
- Parse the CSV client-side (handle quoted fields, commas inside quotes)
- Show the filename and row count after upload
2. SAMPLE DATA (embed as a "Load Example" button)
Include this CSV data representing CRISPResso2-style amplicon sequencing output:
Sample_ID,Guide_Name,Target_Gene,Total_Reads,Modified_Reads,Unmodified_Reads,NHEJ_Reads,HDR_Reads,Mixed_Reads,Percent_Modified,Percent_NHEJ,Percent_HDR,Top_Allele_Frequency,Reference_Allele_Frequency
GEAM-001,sg-Rosa26-A,Rosa26,48210,42105,6105,38920,2815,370,87.3,80.7,5.8,62.4,12.7
GEAM-002,sg-Rosa26-A,Rosa26,51030,44394,6636,41200,2750,444,87.0,80.7,5.4,59.8,13.0
GEAM-003,sg-Rosa26-B,Rosa26,44890,12130,32760,11450,580,100,27.0,25.5,1.3,68.2,73.0
GEAM-004,sg-Rosa26-B,Rosa26,2150,610,1540,590,12,8,28.4,27.4,0.6,71.0,71.6
GEAM-005,sg-Trp53-A,Trp53,53200,49980,3220,48100,1620,260,94.0,90.4,3.0,55.1,6.1
GEAM-006,sg-Trp53-A,Trp53,49870,46879,2991,45200,1400,279,94.0,90.6,2.8,53.8,6.0
GEAM-007,sg-Trp53-B,Trp53,380,95,285,88,4,3,25.0,23.2,1.1,74.5,75.0
GEAM-008,sg-Cd9-A,Cd9,46500,4185,42315,3950,180,55,9.0,8.5,0.4,88.3,91.0
GEAM-009,sg-Cd9-A,Cd9,47200,3776,43424,3590,140,46,8.0,7.6,0.3,89.5,92.0
GEAM-010,sg-Cd9-B,Cd9,50100,45591,4509,28060,17100,431,91.0,56.0,34.1,31.2,9.0
GEAM-011,sg-Nras-A,Nras,44300,39870,4430,,,,90.0,,,52.4,10.0
GEAM-012,sg-Nras-A,Nras,45600,13680,31920,12900,650,130,30.0,28.3,1.4,65.8,70.0
GEAM-013,sg-Ctnnb1-A,Ctnnb1,51200,46592,4608,44300,1980,312,91.0,86.5,3.9,48.7,9.0
GEAM-014,sg-Ctnnb1-A,Ctnnb1,48900,43132,5768,22000,20800,332,88.2,45.0,42.5,28.5,11.8
GEAM-015,sg-Pten-A,Pten,47800,3346,44454,3100,200,46,7.0,6.5,0.4,90.1,93.0
3. ANALYSIS & FLAGGING RULES
- Compute modification rate: Modified_Reads / Total_Reads * 100
- Cross-check: Percent_Modified should match the computed value within 1 percentage point
- Flag low read count: Total_Reads below 5000 (likely failed library or sequencing)
- Flag low modification: Percent_Modified below 10% (guide may be ineffective)
- Flag missing data: any blank NHEJ/HDR/Mixed cells (incomplete CRISPResso2 run)
- Flag suspicious HDR: Percent_HDR above 30% is unusual without a donor template
- Flag read count mismatch: if NHEJ + HDR + Mixed + Unmodified does not equal Total_Reads within 2%
- Show all flags in a dedicated panel below the charts
4. CHARTS (use Chart.js from CDN)
- Chart 1: Horizontal stacked bar chart grouped by Sample_ID.
Each bar shows the proportion of NHEJ (red-orange), HDR (blue), Mixed (gray),
and Unmodified (light green). X-axis = percentage (0-100%).
- Chart 2: Grouped bar chart showing Top_Allele_Frequency vs Reference_Allele_Frequency
per sample. This reveals whether editing produced a dominant allele or scattered indels.
- Both charts should be responsive, have tooltips, and use readable labels.
5. DATA TABLE
- Show all rows with columns: Sample_ID, Guide_Name, Target_Gene, Total_Reads,
%Modified, %NHEJ, %HDR, Top_Allele_Freq, Ref_Allele_Freq
- Color-coded left borders: green (>50% modified), yellow (10-50%), red (<10%)
- Highlight flagged cells in red background with a tooltip explaining the flag
- Sortable by clicking column headers
6. SUMMARY CARDS
- Total samples analyzed
- Average modification rate (with min/max range)
- Average NHEJ rate
- Average HDR rate
- Samples flagged (count and percentage)
7. EXPORT
- "Export Report" button that opens a new window with a print-friendly version
including all charts (as static images via canvas.toDataURL), the data table,
the flag list, and a timestamp
8. DESIGN
- Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #10b981
- Clean sans-serif font (Inter from Google Fonts CDN)
- Responsive single-column layout
- Drag zone should be prominent with a file icon and "Drop CSV here" text
- Green/yellow/red color coding consistent throughout
9. TECHNICAL
- Pure HTML/CSS/JS in one file, no build step
- Chart.js from CDN: https://cdn.jsdelivr.net/npm/chart.js
- Google Fonts for Inter
- CSV parser must handle quoted fields correctly

💡Copy-paste ready

That entire block is the prompt. Paste it as-is. The embedded sample data has deliberate issues: GEAM-004 and GEAM-007 have very low read counts, GEAM-008 and GEAM-009 have low modification rates, GEAM-011 has missing NHEJ/HDR/Mixed values, GEAM-010 and GEAM-014 have suspiciously high HDR, and GEAM-015 has a weak guide. You can verify the analyzer catches all of these immediately.
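
To see why those samples should be caught, here is a minimal sketch of four of the flagging rules from the prompt (low reads, low modification, missing data, suspicious HDR). The function name `flagSample`, the flag wording, and the choice to represent blank cells as null are assumptions for illustration; the code the LLM generates will differ in detail.

```javascript
// Thresholds copied from the prompt's flagging rules.
const MIN_READS = 5000;
const MIN_PERCENT_MODIFIED = 10;
const MAX_PERCENT_HDR = 30;

// Returns a list of human-readable flags for one parsed row.
// Assumption: blank CSV cells were parsed as null.
function flagSample(row) {
  const flags = [];
  if (row.Total_Reads < MIN_READS) flags.push("low read count");
  if (row.Percent_Modified < MIN_PERCENT_MODIFIED) flags.push("low modification");
  const missing = ["NHEJ_Reads", "HDR_Reads", "Mixed_Reads"].filter(
    (field) => row[field] === null
  );
  if (missing.length > 0) flags.push("missing data: " + missing.join(", "));
  if (row.Percent_HDR !== null && row.Percent_HDR > MAX_PERCENT_HDR) {
    flags.push("suspicious HDR");
  }
  return flags;
}

// Four rows from the example data, reduced to the fields the rules use.
const rows = [
  { Sample_ID: "GEAM-004", Total_Reads: 2150, Percent_Modified: 28.4, NHEJ_Reads: 590, HDR_Reads: 12, Mixed_Reads: 8, Percent_HDR: 0.6 },
  { Sample_ID: "GEAM-010", Total_Reads: 50100, Percent_Modified: 91.0, NHEJ_Reads: 28060, HDR_Reads: 17100, Mixed_Reads: 431, Percent_HDR: 34.1 },
  { Sample_ID: "GEAM-011", Total_Reads: 44300, Percent_Modified: 90.0, NHEJ_Reads: null, HDR_Reads: null, Mixed_Reads: null, Percent_HDR: null },
  { Sample_ID: "GEAM-013", Total_Reads: 51200, Percent_Modified: 91.0, NHEJ_Reads: 44300, HDR_Reads: 1980, Mixed_Reads: 312, Percent_HDR: 3.9 },
];

for (const row of rows) {
  console.log(row.Sample_ID, flagSample(row));
}
```

GEAM-013 comes back with an empty list, which is the behavior you want for a clean sample.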


What you get

After the LLM finishes (typically 60-90 seconds), you will have a single file: editing-analyzer.html. Open it in any browser.

Expected output structure

editing-analyzer.html (~600-900 lines)

Click Load Example and you should see:

  1. Summary cards showing 15 samples, an average modification rate of about 57%, and 8 flagged samples.
  2. GEAM-004 flagged: Total_Reads is 2,150 — well below the 5,000 threshold. Likely a failed library prep.
  3. GEAM-007 flagged: Total_Reads is only 380 — this sample essentially failed sequencing.
  4. GEAM-008 and GEAM-009 flagged: Percent_Modified below 10%. The sg-Cd9-A guide appears ineffective.
  5. GEAM-010 flagged: Percent_HDR is 34.1% — unusually high without an annotated donor template.
  6. GEAM-011 flagged: NHEJ_Reads, HDR_Reads, and Mixed_Reads are all blank. Incomplete analysis run.
  7. GEAM-014 flagged: Percent_HDR is 42.5% — suspiciously high, warrants investigation.
  8. GEAM-015 flagged: Percent_Modified is only 7.0%. The sg-Pten-A guide is not cutting efficiently.
  9. The stacked bar chart should clearly show that the sg-Trp53-A samples (GEAM-005, GEAM-006) have the highest editing rates, while the sg-Cd9-A samples (GEAM-008, GEAM-009) are barely edited.
  10. The allele frequency chart should show that highly edited samples (like GEAM-005) have a low reference allele frequency, while poorly edited samples (like GEAM-015) retain a high reference allele frequency.

What about CRISPResso2 output format?

CRISPResso2 produces several output files. The CSV format used here represents a consolidated summary — the kind you would get by combining data from multiple CRISPResso_quantification_of_editing_frequency.txt files into a single spreadsheet. If your pipeline outputs a different format, adjust the column names in the prompt to match.

If something is off

LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:

Problem: Charts render but bars are not stacked
Follow-up prompt: The stacked bar chart is showing separate bars instead of stacking NHEJ, HDR, Mixed, and Unmodified on top of each other. Set stacked: true on both the x and y axes in the Chart.js options.

Problem: Percentage values display as decimals
Follow-up prompt: The modification percentages are showing as 0.87 instead of 87.0. The CSV already contains percentage values, not fractions, so display the Percent_ columns as-is without dividing by 100, and multiply by 100 only when computing a rate from read counts.

Problem: Flagged samples are not highlighted in the table
Follow-up prompt: The data table rows are all the same color. Apply the color-coded left borders based on modification rate: green for >50%, yellow for 10-50%, red for <10%. Also highlight individual flagged cells with a red background.
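
For the stacking fix, it helps to know what a correct configuration looks like. The sketch below builds a Chart.js configuration object for a horizontal stacked bar chart, with each segment computed as a share of Total_Reads. The function name `stackedBarConfig` and the canvas id `outcomeChart` are assumptions for illustration; colors and tooltips are left out.

```javascript
// The four categories that make up each stacked bar.
const SERIES = [
  { label: "NHEJ", field: "NHEJ_Reads" },
  { label: "HDR", field: "HDR_Reads" },
  { label: "Mixed", field: "Mixed_Reads" },
  { label: "Unmodified", field: "Unmodified_Reads" },
];

// Builds a plain configuration object; no Chart.js import is needed here.
function stackedBarConfig(rows) {
  return {
    type: "bar",
    data: {
      labels: rows.map((row) => row.Sample_ID),
      datasets: SERIES.map(({ label, field }) => ({
        label,
        // Each segment is that category's share of the sample's total reads.
        data: rows.map((row) => (row[field] / row.Total_Reads) * 100),
      })),
    },
    options: {
      indexAxis: "y", // horizontal bars
      responsive: true,
      scales: {
        x: { stacked: true, min: 0, max: 100 }, // percentage axis
        y: { stacked: true },
      },
    },
  };
}

const config = stackedBarConfig([
  { Sample_ID: "GEAM-001", Total_Reads: 48210, NHEJ_Reads: 38920, HDR_Reads: 2815, Mixed_Reads: 370, Unmodified_Reads: 6105 },
]);
console.log(JSON.stringify(config.options.scales));
// In the browser (hypothetical canvas id):
// new Chart(document.getElementById("outcomeChart"), config);
```

If the generated file has `stacked: true` on only one axis, or on neither, that is the likely cause of side-by-side bars.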

🔧When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom
Charts are blank or do not render at all
Evidence
The page loads and the data table appears, but the chart areas are empty white rectangles with no bars or labels
What to ask the AI
"The Chart.js canvas elements are not rendering. Make sure you are creating the Chart instances after the data is parsed and the canvas elements exist in the DOM. Also verify the Chart.js CDN link is loading correctly -- check the browser console for 404 errors. If the canvas has zero height, set explicit height in CSS (e.g., min-height: 400px on the canvas container)."
Symptom
Missing data rows cause the entire table to break
Evidence
When GEAM-011 has blank NHEJ/HDR/Mixed columns, the table either skips the row entirely or shows NaN in every subsequent cell
What to ask the AI
"The CSV parser is not handling empty fields correctly. When a field is blank, treat it as null rather than 0 or NaN. In the data table, show a dash '-' for missing values. In the charts, skip samples with missing data or plot them with only the available values. In the flags panel, note which fields are missing."
Symptom
Export report shows broken chart images
Evidence
Clicking Export Report opens a new window but the charts appear as broken image icons or are completely missing
What to ask the AI
"The chart export needs to convert each canvas to a data URL before opening the new window. Use chart.toBase64Image() or canvas.toDataURL('image/png') for each chart, then insert the resulting base64 strings as <img src='...'> tags in the exported HTML. Make sure you call this before document.write on the new window."
Symptom
Read count mismatch flag triggers on every row
Evidence
Every single sample is flagged for read count mismatch, even rows where the numbers clearly add up correctly
What to ask the AI
"The read count validation is too strict. It should check whether NHEJ_Reads + HDR_Reads + Mixed_Reads + Unmodified_Reads equals Total_Reads within a 2% tolerance of Total_Reads. Make sure you are parsing the values as numbers (not strings) and skipping the check for rows with missing values."
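
The last two symptoms usually trace back to the same root cause: cells that are still strings, or blank cells that became 0 or NaN. With strings, "38920" + "2815" concatenates to "389202815" instead of adding. Here is a minimal sketch of the read count check with a 2% tolerance, assuming hypothetical helper names `toNumberOrNull` and `hasReadCountMismatch`:

```javascript
// Convert a raw CSV cell to a number, or null when the cell is blank.
function toNumberOrNull(cell) {
  const text = String(cell ?? "").trim();
  return text === "" ? null : Number(text);
}

// Returns true or false for a mismatch, or null when the check cannot run
// because one of the read-count cells is blank.
function hasReadCountMismatch(rawRow, tolerance = 0.02) {
  const total = toNumberOrNull(rawRow.Total_Reads);
  const parts = ["NHEJ_Reads", "HDR_Reads", "Mixed_Reads", "Unmodified_Reads"].map(
    (field) => toNumberOrNull(rawRow[field])
  );
  if (total === null || parts.includes(null)) return null;
  const sum = parts.reduce((acc, value) => acc + value, 0);
  return Math.abs(sum - total) > tolerance * total;
}

// GEAM-001 and GEAM-011 as raw string cells, the way a CSV parser returns them.
const geam001 = { Total_Reads: "48210", NHEJ_Reads: "38920", HDR_Reads: "2815", Mixed_Reads: "370", Unmodified_Reads: "6105" };
const geam011 = { Total_Reads: "44300", NHEJ_Reads: "", HDR_Reads: "", Mixed_Reads: "", Unmodified_Reads: "4430" };

console.log(hasReadCountMismatch(geam001)); // counts add up
console.log(hasReadCountMismatch(geam011)); // check skipped, values missing
```

Returning null rather than false for GEAM-011 keeps "could not check" separate from "checked and passed", so the missing-data flag can report it instead.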

How it works (the 2-minute explanation)

You do not need to read every line of the generated code, but here is the mental model:

  1. CSV parsing splits each line by commas (respecting quoted fields), uses the first row as column headers, and converts each subsequent row into a JavaScript object. Numeric columns are parsed as floats so math operations work correctly.
  2. Analysis engine iterates through every sample and applies the flagging rules: checking read counts against the 5,000 threshold, comparing computed vs. reported modification rates, looking for blank fields, and identifying suspicious HDR values. Each flag is stored as an object with the sample ID, the field that triggered it, and a human-readable explanation.
  3. Chart rendering creates two Chart.js instances. The stacked bar chart normalizes NHEJ + HDR + Mixed + Unmodified to 100% per sample. The allele frequency chart plots two bars per sample side by side. Both charts pull data from the same parsed array as the table, so all three views describe the same set of samples.
  4. Export converts each Chart.js canvas to a PNG data URL using canvas.toDataURL(), then writes a new HTML document with those images, the data table, and the flag list. The result is a static, print-friendly page that can be saved as a PDF for lab notebooks.
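
Step 1 is where most parsing bugs live, so here is a minimal sketch of a CSV parser that respects quoted fields. The function names `parseCsvLine` and `parseCsv` are assumptions, and the sketch deliberately does not support line breaks inside quoted fields. Cells are left as strings; numeric conversion is a separate step.

```javascript
// Split one CSV line into fields, honoring double-quoted fields and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Use the first row as headers and turn each later row into an object.
function parseCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const headers = parseCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const cells = parseCsvLine(line);
    const row = {};
    headers.forEach((header, i) => {
      row[header] = cells[i] ?? "";
    });
    return row;
  });
}

const parsed = parseCsv("Sample_ID,NHEJ_Reads,HDR_Reads\nGEAM-011,,\nGEAM-012,12900,650\n");
console.log(parsed);
```

Note that the blank cells for GEAM-011 come back as empty strings, which the analysis step should then treat as missing rather than as zero.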

🔍For GEAM Staff: Interpreting the allele frequency chart

The allele frequency chart reveals something the modification percentage alone does not: whether editing produced a single dominant allele or a scattered population of different indels. A sample with 90% modification but a top allele frequency of only 30% has many different indel variants — typical of NHEJ. A sample with 90% modification and a top allele frequency of 60% has a dominant editing outcome — potentially useful if you are trying to establish a specific allele in a mouse line. When planning colony expansion (the kind you designed in L9), you want high modification and a dominant allele, because that allele is what you will be genotyping for in subsequent generations.


Customize it

The base analyzer is useful as-is, but every editing experiment has unique needs. Each of these is a single follow-up prompt:

Add allele classification tiers

Add a classification column to the data table that categorizes each sample:
- "Strong KO" if Percent_Modified > 80% and Percent_NHEJ > 70%
- "Moderate KO" if Percent_Modified is 30-80%
- "Weak/Failed" if Percent_Modified < 30%
- "High HDR" if Percent_HDR > 15% (flag for donor template investigation)
Show the classification as a colored badge in the table and add a pie chart
showing the distribution of classifications across all samples.
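
The tiers in that prompt overlap (a sample can be both "Strong KO" and "High HDR"), so the generated code has to pick an order. Here is one possible sketch, assuming High HDR takes precedence and that anything not matching another tier falls into "Moderate KO". Both choices, and the name `classifySample`, are assumptions rather than rules stated in the prompt.

```javascript
// One possible ordering of the tiers; precedence is an assumption.
function classifySample(row) {
  if (row.Percent_HDR > 15) return "High HDR";
  if (row.Percent_Modified > 80 && row.Percent_NHEJ > 70) return "Strong KO";
  if (row.Percent_Modified < 30) return "Weak/Failed";
  // 30-80% modified, plus >80% cases with NHEJ at or below 70%,
  // which the prompt leaves undefined (assumption).
  return "Moderate KO";
}

// Values taken from the example data.
const examples = [
  { Sample_ID: "GEAM-005", Percent_Modified: 94.0, Percent_NHEJ: 90.4, Percent_HDR: 3.0 },
  { Sample_ID: "GEAM-012", Percent_Modified: 30.0, Percent_NHEJ: 28.3, Percent_HDR: 1.4 },
  { Sample_ID: "GEAM-014", Percent_Modified: 88.2, Percent_NHEJ: 45.0, Percent_HDR: 42.5 },
  { Sample_ID: "GEAM-015", Percent_Modified: 7.0, Percent_NHEJ: 6.5, Percent_HDR: 0.4 },
];

for (const row of examples) {
  console.log(row.Sample_ID, classifySample(row));
}
```

If you want a different precedence, say so explicitly in the follow-up prompt so the LLM does not have to guess.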

Add batch comparison view

Add a "Compare Batches" feature. Let the user upload two CSV files (e.g.,
replicate 1 and replicate 2). Show a side-by-side bar chart comparing
Percent_Modified for each Guide_Name across the two batches. Highlight
guides where the modification rate differs by more than 15 percentage
points between replicates -- that level of variation suggests a technical
issue rather than biological variability.
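
The core of that comparison is a per-guide difference check. Here is a minimal sketch, assuming each batch has already been reduced to one Percent_Modified value per Guide_Name. The function name `compareBatches` is hypothetical, and the replicate 2 values are made up for illustration.

```javascript
// Flag guides whose Percent_Modified differs by more than `threshold`
// percentage points between two batches (objects keyed by Guide_Name).
function compareBatches(batchA, batchB, threshold = 15) {
  return Object.keys(batchA)
    .filter((guide) => guide in batchB) // only guides present in both batches
    .map((guide) => {
      const difference = Math.abs(batchA[guide] - batchB[guide]);
      return { guide, difference, flagged: difference > threshold };
    });
}

// Replicate 1 uses values from the example data; replicate 2 is invented.
const replicate1 = { "sg-Rosa26-A": 87.3, "sg-Cd9-A": 9.0 };
const replicate2 = { "sg-Rosa26-A": 85.0, "sg-Cd9-A": 31.0 };

const comparison = compareBatches(replicate1, replicate2);
console.log(comparison);
```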

Add guide ranking summary

Add a "Guide Ranking" section that groups all samples by Guide_Name and
computes the average modification rate, average HDR rate, and consistency
(standard deviation) across replicates for each guide. Rank guides from
most to least effective in a summary table. This helps decide which guide
to carry forward into animal model production.
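
The ranking in that prompt is a group-by followed by a mean and a standard deviation. Here is a minimal sketch, assuming the hypothetical name `rankGuides` and the sample standard deviation (n - 1 denominator). The prompt does not specify which variant to use, so the generated code may differ.

```javascript
// Group rows by Guide_Name, then compute mean and spread of Percent_Modified.
function rankGuides(rows) {
  const byGuide = new Map();
  for (const row of rows) {
    if (!byGuide.has(row.Guide_Name)) byGuide.set(row.Guide_Name, []);
    byGuide.get(row.Guide_Name).push(row.Percent_Modified);
  }
  const summary = [];
  for (const [guide, values] of byGuide) {
    const mean = values.reduce((acc, v) => acc + v, 0) / values.length;
    // Sample standard deviation; null when there is only one replicate.
    const sd =
      values.length > 1
        ? Math.sqrt(values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (values.length - 1))
        : null;
    summary.push({ guide, replicates: values.length, mean, sd });
  }
  return summary.sort((a, b) => b.mean - a.mean); // most effective first
}

// Five rows from the example data.
const ranking = rankGuides([
  { Guide_Name: "sg-Cd9-A", Percent_Modified: 9.0 },
  { Guide_Name: "sg-Cd9-A", Percent_Modified: 8.0 },
  { Guide_Name: "sg-Trp53-A", Percent_Modified: 94.0 },
  { Guide_Name: "sg-Trp53-A", Percent_Modified: 94.0 },
  { Guide_Name: "sg-Cd9-B", Percent_Modified: 91.0 },
]);
console.log(ranking);
```

A guide with a single replicate, like sg-Cd9-B here, gets no standard deviation, which is a useful reminder that its ranking rests on one sample.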

Add downloadable CSV report

Add a "Download CSV Summary" button that exports a cleaned-up CSV with
columns: Sample_ID, Guide_Name, Target_Gene, Percent_Modified, Percent_NHEJ,
Percent_HDR, Classification, Flags. This is the format our facility uses
to report results back to investigators.
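
One detail worth checking in the generated export: the Flags column can contain commas, so its values must be quoted on the way out. Here is a minimal sketch of the serialization step, with hypothetical helper names `csvEscape` and `toCsv`. The browser download itself (typically a Blob plus a temporary link) is omitted.

```javascript
// Quote a value only when it contains a comma, quote, or line break.
function csvEscape(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Serialize rows to CSV text using the given column order.
function toCsv(rows, columns) {
  const lines = [columns.join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => csvEscape(row[column])).join(","));
  }
  return lines.join("\n");
}

const csvText = toCsv(
  [
    { Sample_ID: "GEAM-007", Percent_Modified: 25.0, Flags: "low read count" },
    { Sample_ID: "GEAM-011", Percent_Modified: 90.0, Flags: "missing NHEJ_Reads, HDR_Reads, Mixed_Reads" },
  ],
  ["Sample_ID", "Percent_Modified", "Flags"]
);
console.log(csvText);
```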

The customization loop

Start with the working analyzer, then add features one prompt at a time. Each prompt builds on what exists. The guide ranking summary is especially valuable — it turns raw sequencing data into a recommendation for which guide to use going forward.


Try it yourself

  1. Open your CLI tool in an empty folder.
  2. Paste the main prompt from above.
  3. Open the generated editing-analyzer.html in your browser.
  4. Click Load Example to see the analysis on the embedded test data.
  5. Check the flag panel and confirm it catches the low read counts (GEAM-004, GEAM-007), the ineffective guides (GEAM-008, GEAM-009, GEAM-015), the suspicious HDR values (GEAM-010, GEAM-014), and the missing values (GEAM-011).
  6. If you have real CRISPResso2 output, consolidate your results into a CSV matching the column format and drop it on the analyzer.
  7. Pick one customization from the list above and add it with a follow-up prompt.

If you work in a genome editing facility, bookmark this HTML file on the analysis workstation. The next time an investigator asks “did my edit work?”, you can hand them a visual report in under a minute instead of building one manually in Excel.


Key takeaways

  • One prompt, one tool: a detailed prompt with embedded sample data produces a working CRISPR editing analyzer in under 2 minutes.
  • Automated flagging catches what manual review misses: low read counts, missing data, and suspiciously high HDR rates are easy to overlook in a spreadsheet but impossible to miss when the tool highlights them in red.
  • The allele frequency chart adds insight beyond modification percentage: it reveals whether editing produced a dominant allele (useful for colony establishment) or scattered indels (typical NHEJ).
  • Embedding test data with known issues in the prompt lets you verify the tool immediately, without needing a separate test file.
  • This closes the CRISPR workflow loop: design guides (L2), plan the colony (L9), assess editing outcomes (this lesson). Each step is a single-prompt tool.

KNOWLEDGE CHECK

GEAM-007 has only 380 total reads while most other samples have 44,000-53,000. Why does the analyzer flag this sample rather than just reporting its modification percentage?

KNOWLEDGE CHECK

GEAM-010 shows 34.1% HDR and GEAM-014 shows 42.5% HDR. The analyzer flags both as suspicious. Why are high HDR rates a concern?


What’s next

In the next lesson, you will build a Proteomics Search Results Triage tool that parses mass spectrometry search engine output and highlights high-confidence protein identifications — moving from genomic editing analysis to proteomic data interpretation.