CRISPR Editing Outcome Analyzer
What you'll learn
~25 min
- Build a CRISPR editing outcome analyzer with a single AI prompt
- Parse amplicon sequencing CSV data and compute indel frequencies with Chart.js visualization
- Troubleshoot common issues with CSV parsing, frequency calculations, and chart rendering
- Customize the analyzer with allele classification, batch comparison, or exportable reports
What you’re building
After a CRISPR experiment, the critical question is always the same: did the edit work? You harvest cells or embryos, PCR-amplify across the target site, send the amplicons for next-generation sequencing, and get back a CSV from CRISPResso2 or a similar analysis pipeline. That CSV contains the numbers you need — total reads, modified reads, NHEJ counts, HDR counts, allele frequencies — but turning those numbers into a clear picture of editing efficiency still means opening Excel, building formulas, and formatting charts by hand. For a single guide, that is manageable. For a screen with a dozen guides across multiple samples, it is tedious and error-prone.
You are going to build a tool that does this analysis instantly.
This lesson connects directly to what you have already built in this track. In L2 you designed guide RNAs, in L9 you planned a breeding colony to propagate edited alleles, and now you are assessing whether those edits actually took. Design, breed, assess — that is the full CRISPR workflow, and this tool handles the final step: turning raw sequencing numbers into a clear answer about editing efficiency.
By the end of this lesson you will have a standalone CRISPR editing outcome analyzer that runs entirely in the browser. Upload a CSV of amplicon sequencing results (the kind CRISPResso2 exports), and it instantly computes indel frequencies, flags problematic samples, and renders Chart.js bar charts showing editing efficiency across all your guides. No server, no database, no installation — just one HTML file you can open on any lab workstation.
Upload CSV → parse and compute frequencies → render interactive charts. This is the same data-pipeline pattern used in dashboards across every field. The techniques here transfer to any workflow where you need to turn tabular data into visual summaries.
🔍 Domain Primer: Key terms you'll see in this lesson
New to CRISPR editing analysis? Here are the terms you will encounter:
- Indel (insertion/deletion) — A small insertion or deletion of nucleotides at the CRISPR cut site. Indels in a coding region usually cause frameshifts that knock out the gene. This is the most common editing outcome.
- Knock-out efficiency — The percentage of alleles in a sample that carry a disruptive edit (typically an indel). Higher is better when your goal is a gene knockout.
- Allele frequency — How often a specific sequence variant appears in the pool of sequenced reads. A sample with 80% wild-type allele and 20% indel allele has 20% modification.
- NHEJ (Non-Homologous End Joining) — The cell’s error-prone DNA repair pathway. After Cas9 cuts, NHEJ usually introduces indels. This is the default repair outcome and what most knockout experiments rely on.
- HDR (Homology-Directed Repair) — A precise repair pathway that uses a donor template to insert a specific sequence at the cut site. HDR rates are typically much lower than NHEJ (often under 10%).
- CRISPResso2 — A widely used computational pipeline that analyzes amplicon sequencing data from CRISPR experiments. It aligns reads to the reference, classifies them as modified or unmodified, and outputs summary statistics as CSV files.
- Amplicon sequencing — Targeted next-generation sequencing of a specific PCR-amplified region (the amplicon) around the CRISPR cut site. Produces thousands of reads per sample, giving statistical power to measure editing frequency.
You do not need to memorize these — the tool handles the calculations. You just need to know what the numbers represent.
Who this is for
- Genome editing lab techs (GEAM and similar facilities) who run CRISPResso2 on amplicon data and need a fast visual summary of editing outcomes across multiple guides and samples.
- PI and postdoc researchers who want to quickly assess which guides worked, which samples have suspiciously low read counts, and whether HDR is outcompeting NHEJ.
- Core facility staff who prepare editing efficiency reports for investigators and want a consistent, shareable format instead of ad hoc Excel charts.
The UW-Madison Genome Editing and Animal Models (GEAM) facility generates these data routinely for investigators. A tool that turns CRISPResso2 CSV output into a visual report saves hours of post-processing per experiment and gives consistent formatting across all projects.
The showcase
Here is what the finished analyzer looks like once you open the HTML file in a browser:
- Upload zone at the top where you drop a CSV file (or click to browse). Visual feedback on dragover.
- Summary cards showing total samples analyzed, average modification rate, average NHEJ rate, average HDR rate, and number of flagged samples.
- Stacked bar chart (Chart.js) showing NHEJ, HDR, and unmodified percentages for every sample, grouped by guide RNA.
- Allele frequency chart showing top allele frequency vs. reference allele frequency per sample.
- Data table with color-coded rows:
  - Green left border: modification rate above 50% (strong editing).
  - Yellow left border: modification rate between 10% and 50% (moderate editing).
  - Red left border: modification rate below 10%, suspicious read counts, or missing data.
- Flag panel listing every quality concern: low read counts, low modification rates, missing values, suspiciously high HDR, and read counts that do not sum to the total.
- Export button that opens a print-friendly HTML report with the charts, the data table, and the flag list in a new window.
Everything runs client-side. The sequencing data never leaves the browser.
The prompt
Open your terminal, navigate to a project folder, start your AI CLI tool (e.g., by typing claude), and paste this prompt:
Build a single self-contained HTML file called editing-analyzer.html that analyzes
CRISPR editing outcomes from amplicon sequencing data.

Requirements:

1. FILE INPUT
   - A drag-and-drop zone (dashed border, changes color on dragover) for CSV files
   - Also a click-to-browse fallback button
   - Parse the CSV client-side (handle quoted fields, commas inside quotes)
   - Show the filename and row count after upload

2. SAMPLE DATA (embed as a "Load Example" button)
   Include this CSV data representing CRISPResso2-style amplicon sequencing output:

   Sample_ID,Guide_Name,Target_Gene,Total_Reads,Modified_Reads,Unmodified_Reads,NHEJ_Reads,HDR_Reads,Mixed_Reads,Percent_Modified,Percent_NHEJ,Percent_HDR,Top_Allele_Frequency,Reference_Allele_Frequency
   GEAM-001,sg-Rosa26-A,Rosa26,48210,42105,6105,38920,2815,370,87.3,80.7,5.8,62.4,12.7
   GEAM-002,sg-Rosa26-A,Rosa26,51030,44394,6636,41200,2750,444,87.0,80.7,5.4,59.8,13.0
   GEAM-003,sg-Rosa26-B,Rosa26,44890,12130,32760,11450,580,100,27.0,25.5,1.3,68.2,73.0
   GEAM-004,sg-Rosa26-B,Rosa26,2150,610,1540,590,12,8,28.4,27.4,0.6,71.0,71.6
   GEAM-005,sg-Trp53-A,Trp53,53200,49980,3220,48100,1620,260,94.0,90.4,3.0,55.1,6.1
   GEAM-006,sg-Trp53-A,Trp53,49870,46879,2991,45200,1400,279,94.0,90.6,2.8,53.8,6.0
   GEAM-007,sg-Trp53-B,Trp53,380,95,285,88,4,3,25.0,23.2,1.1,74.5,75.0
   GEAM-008,sg-Cd9-A,Cd9,46500,4185,42315,3950,180,55,9.0,8.5,0.4,88.3,91.0
   GEAM-009,sg-Cd9-A,Cd9,47200,3776,43424,3590,140,46,8.0,7.6,0.3,89.5,92.0
   GEAM-010,sg-Cd9-B,Cd9,50100,45591,4509,28060,17100,431,91.0,56.0,34.1,31.2,9.0
   GEAM-011,sg-Nras-A,Nras,44300,39870,4430,,,,90.0,,,52.4,10.0
   GEAM-012,sg-Nras-A,Nras,45600,13680,31920,12900,650,130,30.0,28.3,1.4,65.8,70.0
   GEAM-013,sg-Ctnnb1-A,Ctnnb1,51200,46592,4608,44300,1980,312,91.0,86.5,3.9,48.7,9.0
   GEAM-014,sg-Ctnnb1-A,Ctnnb1,48900,43132,5768,22000,20800,332,88.2,45.0,42.5,28.5,11.8
   GEAM-015,sg-Pten-A,Pten,47800,3346,44454,3100,200,46,7.0,6.5,0.4,90.1,93.0

3. ANALYSIS & FLAGGING RULES
   - Compute modification rate: Modified_Reads / Total_Reads * 100
   - Cross-check: Percent_Modified should match computed value within 1% tolerance
   - Flag low read count: Total_Reads below 5000 (likely failed library or sequencing)
   - Flag low modification: Percent_Modified below 10% (guide may be ineffective)
   - Flag missing data: any blank NHEJ/HDR/Mixed cells (incomplete CRISPResso2 run)
   - Flag suspicious HDR: Percent_HDR above 30% is unusual without a donor template
   - Flag read count mismatch: if NHEJ + HDR + Mixed + Unmodified does not equal Total Reads within 2%
   - Show all flags in a dedicated panel below the charts

4. CHARTS (use Chart.js from CDN)
   - Chart 1: Horizontal stacked bar chart grouped by Sample_ID. Each bar shows the proportion of NHEJ (red-orange), HDR (blue), Mixed (gray), and Unmodified (light green). X-axis = percentage (0-100%).
   - Chart 2: Grouped bar chart showing Top_Allele_Frequency vs Reference_Allele_Frequency per sample. This reveals whether editing produced a dominant allele or scattered indels.
   - Both charts should be responsive, have tooltips, and use readable labels.

5. DATA TABLE
   - Show all rows with columns: Sample_ID, Guide_Name, Target_Gene, Total_Reads, %Modified, %NHEJ, %HDR, Top_Allele_Freq, Ref_Allele_Freq
   - Color-coded left borders: green (>50% modified), yellow (10-50%), red (<10%)
   - Highlight flagged cells in red background with a tooltip explaining the flag
   - Sortable by clicking column headers

6. SUMMARY CARDS
   - Total samples analyzed
   - Average modification rate (with min/max range)
   - Average NHEJ rate
   - Average HDR rate
   - Samples flagged (count and percentage)

7. EXPORT
   - "Export Report" button that opens a new window with a print-friendly version including all charts (as static images via canvas.toDataURL), the data table, the flag list, and a timestamp

8. DESIGN
   - Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #10b981
   - Clean sans-serif font (Inter from Google Fonts CDN)
   - Responsive single-column layout
   - Drag zone should be prominent with a file icon and "Drop CSV here" text
   - Green/yellow/red color coding consistent throughout

9. TECHNICAL
   - Pure HTML/CSS/JS in one file, no build step
   - Chart.js from CDN: https://cdn.jsdelivr.net/npm/chart.js
   - Google Fonts for Inter
   - CSV parser must handle quoted fields correctly

That entire block is the prompt. Paste it as-is. The embedded sample data has deliberate issues: GEAM-004 and GEAM-007 have very low read counts, GEAM-008 and GEAM-009 have low modification rates, GEAM-011 has missing NHEJ/HDR/Mixed values, GEAM-010 and GEAM-014 have suspiciously high HDR, and GEAM-015 has a weak guide. You can verify the analyzer catches all of these immediately.
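To make the flagging rules in section 3 of the prompt concrete, here is a minimal sketch of how they might look in JavaScript. The generated file will almost certainly structure this differently. The function name `flagSample`, the flag strings, and the reading of "within 1% tolerance" as one percentage point are all assumptions made for illustration.

```javascript
// Thresholds copied from section 3 of the prompt.
const MIN_READS = 5000;
const MIN_PERCENT_MODIFIED = 10;
const MAX_PERCENT_HDR = 30;

// A blank CSV cell should count as missing, not as zero.
const isBlank = (v) => v === undefined || v === null || String(v).trim() === "";

// Hypothetical helper: returns the list of flags raised by one CSV row.
function flagSample(row) {
  const flags = [];
  const total = Number(row.Total_Reads);
  const reported = Number(row.Percent_Modified);

  // Cross-check the reported percentage against the read counts
  // (assumption: tolerance means one percentage point).
  const computed = (Number(row.Modified_Reads) / total) * 100;
  if (Math.abs(computed - reported) > 1) flags.push("percent mismatch");
  if (total < MIN_READS) flags.push("low read count");
  if (reported < MIN_PERCENT_MODIFIED) flags.push("low modification");

  const missing = ["NHEJ_Reads", "HDR_Reads", "Mixed_Reads"].filter((k) => isBlank(row[k]));
  if (missing.length > 0) {
    flags.push("missing data");
    return flags; // the remaining checks need these columns
  }

  if (Number(row.Percent_HDR) > MAX_PERCENT_HDR) flags.push("suspicious HDR");

  // The four read categories should add up to Total_Reads within 2%.
  const sum = Number(row.NHEJ_Reads) + Number(row.HDR_Reads) +
              Number(row.Mixed_Reads) + Number(row.Unmodified_Reads);
  if (Math.abs(sum - total) / total > 0.02) flags.push("read count mismatch");
  return flags;
}

// Two rows from the embedded sample data, as raw CSV strings.
const geam007 = { Sample_ID: "GEAM-007", Total_Reads: "380", Modified_Reads: "95",
  Unmodified_Reads: "285", NHEJ_Reads: "88", HDR_Reads: "4", Mixed_Reads: "3",
  Percent_Modified: "25.0", Percent_HDR: "1.1" };
const geam011 = { Sample_ID: "GEAM-011", Total_Reads: "44300", Modified_Reads: "39870",
  Unmodified_Reads: "4430", NHEJ_Reads: "", HDR_Reads: "", Mixed_Reads: "",
  Percent_Modified: "90.0", Percent_HDR: "" };

console.log(flagSample(geam007)); // [ 'low read count' ]
console.log(flagSample(geam011)); // [ 'missing data' ]
```

Returning early on missing data is a design choice in this sketch: without the NHEJ, HDR, and Mixed counts, the HDR and read-sum checks would compare against zeros and report misleading flags.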
What you get
After the LLM finishes (typically 60-90 seconds), you will have a single file: editing-analyzer.html. Open it in any browser.
Expected output structure
editing-analyzer.html (~600-900 lines)
Click Load Example and you should see:
- Summary cards showing 15 samples, an average modification rate of about 57%, and 8 flagged samples (the ones listed below).
- GEAM-004 flagged: Total_Reads is 2,150 — well below the 5,000 threshold. Likely a failed library prep.
- GEAM-007 flagged: Total_Reads is only 380 — this sample essentially failed sequencing.
- GEAM-008 and GEAM-009 flagged: Percent_Modified below 10%. The sg-Cd9-A guide appears ineffective.
- GEAM-010 flagged: Percent_HDR is 34.1% — unusually high without an annotated donor template.
- GEAM-011 flagged: NHEJ_Reads, HDR_Reads, and Mixed_Reads are all blank. Incomplete analysis run.
- GEAM-014 flagged: Percent_HDR is 42.5% — suspiciously high, warrants investigation.
- GEAM-015 flagged: Percent_Modified is only 7.0%. The sg-Pten-A guide is not cutting efficiently.
- The stacked bar chart should clearly show that the sg-Trp53-A samples (GEAM-005, GEAM-006) have the highest editing rates, while the sg-Cd9-A samples (GEAM-008, GEAM-009) are barely edited.
- The allele frequency chart should show that highly edited samples (like GEAM-005) have a low reference allele frequency, while poorly edited samples (like GEAM-015) retain a high reference allele frequency.
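The low-read-count flags on GEAM-004 and GEAM-007 also have a statistical side. As a rough illustration, the sketch below estimates the 95% margin of error on a modification percentage using the normal approximation for a proportion. The function name `marginOfError` is hypothetical, and treating reads as independent draws is a simplifying assumption (PCR amplification bias means real uncertainty is likely larger).

```javascript
// Approximate 95% margin of error, in percentage points, for a percentage
// estimated from `totalReads` reads (normal approximation to the binomial).
function marginOfError(percent, totalReads) {
  const p = percent / 100;
  const standardError = Math.sqrt((p * (1 - p)) / totalReads);
  return 1.96 * standardError * 100;
}

console.log(marginOfError(25.0, 380).toFixed(1));   // GEAM-007: "4.4"
console.log(marginOfError(94.0, 53200).toFixed(1)); // GEAM-005: "0.2"
```

Under these assumptions, the 25.0% reported for GEAM-007 is uncertain by several percentage points, while the well-covered samples are pinned down to a fraction of a point.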
CRISPResso2 produces several output files. The CSV format used here represents a consolidated summary — the kind you would get by combining data from multiple CRISPResso_quantification_of_editing_frequency.txt files into a single spreadsheet. If your pipeline outputs a different format, adjust the column names in the prompt to match.
If something is off
LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:
| Problem | Follow-up prompt |
|---|---|
| Charts render but bars are not stacked | The stacked bar chart is showing separate bars instead of stacking NHEJ, HDR, Mixed, and Unmodified on top of each other. Set stacked: true on both the x and y axes in the Chart.js options. |
| Percentage values display as decimals | The modification percentages are showing as 0.87 instead of 87.0. The CSV already contains percentage values, not fractions -- display them as-is and don't divide by 100. |
| Flagged samples are not highlighted in the table | The data table rows are all the same color. Apply the color-coded left borders based on modification rate: green for >50%, yellow for 10-50%, red for <10%. Also highlight individual flagged cells with a red background. |
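For the stacking problem in the first row of the table, it helps to know what a working configuration looks like. The following is a sketch of a Chart.js (v3 or later) config object for a horizontal stacked bar chart. The helper name `buildStackedConfig` and the hex colors are assumptions, and it expects rows whose read counts are already numeric.

```javascript
// Builds a Chart.js config for a horizontal stacked bar chart.
// Each category is converted to a percentage of Total_Reads per sample.
function buildStackedConfig(rows) {
  const series = [
    { label: "NHEJ", key: "NHEJ_Reads", color: "#f97316" },
    { label: "HDR", key: "HDR_Reads", color: "#3b82f6" },
    { label: "Mixed", key: "Mixed_Reads", color: "#9ca3af" },
    { label: "Unmodified", key: "Unmodified_Reads", color: "#86efac" },
  ];
  return {
    type: "bar",
    data: {
      labels: rows.map((r) => r.Sample_ID),
      datasets: series.map((s) => ({
        label: s.label,
        backgroundColor: s.color,
        data: rows.map((r) => (r[s.key] / r.Total_Reads) * 100),
      })),
    },
    options: {
      indexAxis: "y", // horizontal bars
      responsive: true,
      scales: {
        // Stacking must be enabled on BOTH axes, or bars render side by side.
        x: { stacked: true, min: 0, max: 100 },
        y: { stacked: true },
      },
    },
  };
}

const exampleConfig = buildStackedConfig([
  { Sample_ID: "GEAM-001", Total_Reads: 48210, NHEJ_Reads: 38920,
    HDR_Reads: 2815, Mixed_Reads: 370, Unmodified_Reads: 6105 },
]);
console.log(exampleConfig.data.datasets.map((d) => d.label));
// [ 'NHEJ', 'HDR', 'Mixed', 'Unmodified' ]
```

In the browser, this object would be passed as the second argument to `new Chart(canvasElement, config)`. Rows with missing counts, such as GEAM-011, would need separate handling before charting.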
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works (the 2-minute explanation)
You do not need to read every line of the generated code, but here is the mental model:
- CSV parsing splits each line by commas (respecting quoted fields), uses the first row as column headers, and converts each subsequent row into a JavaScript object. Numeric columns are parsed as floats so math operations work correctly.
- Analysis engine iterates through every sample and applies the flagging rules: checking read counts against the 5,000 threshold, comparing computed vs. reported modification rates, looking for blank fields, and identifying suspicious HDR values. Each flag is stored as an object with the sample ID, the field that triggered it, and a human-readable explanation.
- Chart rendering creates two Chart.js instances. The stacked bar chart normalizes NHEJ + HDR + Mixed + Unmodified to 100% per sample. The allele frequency chart plots two bars per sample side by side. Both charts pull data from the same parsed array as the table, so any filtering or sorting applied to that array carries over to the charts as well.
- Export converts each Chart.js canvas to a PNG data URL using canvas.toDataURL(), then writes a new HTML document with those images, the data table, and the flag list. The result is a static, print-friendly page that can be saved as a PDF for lab notebooks.
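The CSV parsing step in the first bullet above is where generated code most often goes wrong, so a reference sketch is useful for comparison. This is one minimal way to handle quoted fields, not necessarily what the LLM will produce. The names `splitCsvLine` and `parseCsv` are hypothetical, and the sketch assumes no line breaks inside quoted fields.

```javascript
// Splits one CSV line into fields, honoring double-quoted fields and
// treating a doubled quote ("") inside quotes as a literal quote.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { current += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else current += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { fields.push(current); current = ""; }
    else current += ch;
  }
  fields.push(current);
  return fields;
}

// Uses the first row as headers and turns each later row into an object.
function parseCsv(text) {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  const headers = splitCsvLine(lines[0]).map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = splitCsvLine(line);
    const row = {};
    headers.forEach((h, i) => {
      const raw = (cells[i] ?? "").trim();
      // Blank cells become null so missing data stays distinct from 0.
      if (raw === "") row[h] = null;
      else row[h] = Number.isNaN(Number(raw)) ? raw : Number(raw);
    });
    return row;
  });
}

const demo = parseCsv('Sample_ID,Note,Total_Reads\nGEAM-001,"low, but ok",48210\nGEAM-011,,\n');
console.log(demo[0].Note);        // low, but ok
console.log(demo[1].Total_Reads); // null
```

Keeping blank cells as `null` rather than 0 matters here: it is what lets the missing-data rule tell an incomplete CRISPResso2 run apart from a sample with genuinely zero reads in a category.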
The allele frequency chart reveals something the modification percentage alone does not: whether editing produced a single dominant allele or a scattered population of different indels. A sample with 90% modification but a top allele frequency of only 30% has many different indel variants — typical of NHEJ. A sample with 90% modification and a top allele frequency of 60% has a dominant editing outcome — potentially useful if you are trying to establish a specific allele in a mouse line. When planning colony expansion (the kind you designed in L9), you want high modification and a dominant allele, because that allele is what you will be genotyping for in subsequent generations.
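The distinction drawn above can be written as a small rule of thumb. In this sketch the function name `describeAllelePattern` and both cutoffs (50% modification, 50% top allele frequency) are assumptions chosen only to separate the 30% and 60% examples in the text; they are not established thresholds. Poorly edited samples are set aside first, because their top allele is presumably the unedited reference.

```javascript
// Hypothetical rule of thumb; both cutoffs are illustrative assumptions.
function describeAllelePattern(percentModified, topAlleleFrequency, opts = {}) {
  const { minModified = 50, dominantCutoff = 50 } = opts;
  // With little editing, the top allele says nothing about edit outcomes.
  if (percentModified < minModified) return "low editing";
  return topAlleleFrequency >= dominantCutoff ? "dominant allele" : "scattered indels";
}

console.log(describeAllelePattern(90, 30));    // scattered indels
console.log(describeAllelePattern(90, 60));    // dominant allele
console.log(describeAllelePattern(7.0, 90.1)); // low editing (GEAM-015)
```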
Customize it
The base analyzer is useful as-is, but every editing experiment has unique needs. Each of these is a single follow-up prompt:
Add allele classification tiers
Add a classification column to the data table that categorizes each sample:
- "Strong KO" if Percent_Modified > 80% and Percent_NHEJ > 70%
- "Moderate KO" if Percent_Modified is 30-80%
- "Weak/Failed" if Percent_Modified < 30%
- "High HDR" if Percent_HDR > 15% (flag for donor template investigation)
Show the classification as a colored badge in the table and add a pie chart
showing the distribution of classifications across all samples.

Add batch comparison view
Add a "Compare Batches" feature. Let the user upload two CSV files (e.g.,
replicate 1 and replicate 2). Show a side-by-side bar chart comparing
Percent_Modified for each Guide_Name across the two batches. Highlight
guides where the modification rate differs by more than 15 percentage
points between replicates -- that level of variation suggests a technical
issue rather than biological variability.

Add guide ranking summary
Add a "Guide Ranking" section that groups all samples by Guide_Name and
computes the average modification rate, average HDR rate, and consistency
(standard deviation) across replicates for each guide. Rank guides from
most to least effective in a summary table. This helps decide which guide
to carry forward into animal model production.

Add downloadable CSV report
Add a "Download CSV Summary" button that exports a cleaned-up CSV with
columns: Sample_ID, Guide_Name, Target_Gene, Percent_Modified, Percent_NHEJ,
Percent_HDR, Classification, Flags. This is the format our facility uses
to report results back to investigators.

Start with the working analyzer, then add features one prompt at a time. Each prompt builds on what exists. The guide ranking summary is especially valuable: it turns raw sequencing data into a recommendation for which guide to use going forward.
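To preview what the guide ranking customization computes, here is a rough sketch of the grouping logic. The function name `rankGuides` and the output field names are hypothetical, and using the sample standard deviation (n - 1) as the consistency measure is an assumption, since the prompt only asks for a standard deviation.

```javascript
// Groups samples by Guide_Name and ranks guides by mean modification rate.
function rankGuides(rows) {
  const byGuide = new Map();
  for (const r of rows) {
    if (!byGuide.has(r.Guide_Name)) byGuide.set(r.Guide_Name, []);
    byGuide.get(r.Guide_Name).push(r);
  }

  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const summary = [];
  for (const [guide, samples] of byGuide) {
    const mods = samples.map((s) => s.Percent_Modified);
    // Skip blank HDR cells instead of counting them as 0.
    const hdrs = samples.map((s) => s.Percent_HDR).filter((v) => typeof v === "number");
    const avg = mean(mods);
    // Sample standard deviation; not defined for a single replicate.
    const sd = mods.length > 1
      ? Math.sqrt(mods.reduce((acc, x) => acc + (x - avg) ** 2, 0) / (mods.length - 1))
      : null;
    summary.push({
      guide,
      replicates: mods.length,
      avgModified: avg,
      avgHdr: hdrs.length > 0 ? mean(hdrs) : null,
      sdModified: sd,
    });
  }
  // Most effective guide first.
  return summary.sort((a, b) => b.avgModified - a.avgModified);
}

const ranking = rankGuides([
  { Guide_Name: "sg-Cd9-A", Percent_Modified: 9.0, Percent_HDR: 0.4 },
  { Guide_Name: "sg-Trp53-A", Percent_Modified: 94.0, Percent_HDR: 3.0 },
  { Guide_Name: "sg-Cd9-A", Percent_Modified: 8.0, Percent_HDR: 0.3 },
  { Guide_Name: "sg-Trp53-A", Percent_Modified: 94.0, Percent_HDR: 2.8 },
]);
console.log(ranking.map((g) => g.guide)); // [ 'sg-Trp53-A', 'sg-Cd9-A' ]
```

A real ranking would probably exclude samples flagged for low read counts (such as GEAM-004 or GEAM-007) before averaging, so that a failed library does not drag down an otherwise good guide.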
Try it yourself
- Open your CLI tool in an empty folder.
- Paste the main prompt from above.
- Open the generated editing-analyzer.html in your browser.
- Click Load Example to see the analysis on the embedded test data.
- Check the flag panel — confirm it catches the low read counts (GEAM-004, GEAM-007), the ineffective guides (GEAM-008, GEAM-009, GEAM-015), and the suspicious HDR values (GEAM-010, GEAM-014).
- If you have real CRISPResso2 output, consolidate your results into a CSV matching the column format and drop it on the analyzer.
- Pick one customization from the list above and add it with a follow-up prompt.
If you work in a genome editing facility, bookmark this HTML file on the analysis workstation. The next time an investigator asks “did my edit work?”, you can hand them a visual report in under a minute instead of building one manually in Excel.
Key takeaways
- One prompt, one tool: a detailed prompt with embedded sample data produces a working CRISPR editing analyzer in under 2 minutes.
- Automated flagging catches what manual review misses: low read counts, missing data, and suspiciously high HDR rates are easy to overlook in a spreadsheet but impossible to miss when the tool highlights them in red.
- The allele frequency chart adds insight beyond modification percentage: it reveals whether editing produced a dominant allele (useful for colony establishment) or scattered indels (typical NHEJ).
- Embedding test data with known issues in the prompt guarantees you can verify the tool works immediately, without needing a separate test file.
- This closes the CRISPR workflow loop: design guides (L2), plan the colony (L9), assess editing outcomes (this lesson). Each step is a single-prompt tool.
GEAM-007 has only 380 total reads while most other samples have 44,000-53,000. Why does the analyzer flag this sample rather than just reporting its modification percentage?
GEAM-010 shows 34.1% HDR and GEAM-014 shows 42.5% HDR. The analyzer flags both as suspicious. Why are high HDR rates a concern?
What’s next
In the next lesson, you will build a Proteomics Search Results Triage tool that parses mass spectrometry search engine output and highlights high-confidence protein identifications — moving from genomic editing analysis to proteomic data interpretation.