Applied Module 12 · AI-Powered Bioinformatics Tools

CRISPR Guide RNA Design Tool

What you'll learn

~25 min
  • Build a React + Vite CRISPR guide RNA design tool using a single AI prompt
  • Understand how PAM site detection and guide scoring work algorithmically
  • Troubleshoot common issues with strand orientation, scoring normalization, and PAM detection
  • Extend the tool with additional Cas proteins, off-target checking, and HDR template design

What you’re building

CRISPR-Cas9 experiments live or die by guide RNA quality. You need to find every NGG PAM site in your target region, score the candidate guides, and pick the best ones — ideally without paying for a commercial tool or waiting for a web server queue.

In this lesson you will build an interactive CRISPR guide RNA design tool as a React + Vite application. Paste a target gene sequence, and the tool finds all NGG PAM sites on both strands, scores each 20-nt guide by GC content, homopolymer penalties, and terminal GC content, then displays everything in a sortable table with a sequence viewer that highlights selected guides in context.

This is a step up from Lesson 1. Instead of a single HTML file, you are building a proper React application with components, state management, and a development server. The LLM handles all of that scaffolding — you just describe what you want.

Software pattern: Multi-criteria filtering engine

Define rules → score → rank. This pattern works for resume screening, vendor selection, compliance checking — anywhere you need to evaluate candidates against multiple criteria.
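The pattern can be sketched in a few lines of TypeScript. This is a minimal illustration, not code from the generated app: the names `Rule`, `Ranked`, and `rankCandidates`, and the vendor example, are all hypothetical.

```typescript
// Hypothetical generic "define rules -> score -> rank" helper (illustrative only).
interface Rule<T> {
  name: string;
  weight: number;             // relative importance; weights are assumed to sum to 1
  score: (item: T) => number; // each rule returns a value from 0 to 100
}

interface Ranked<T> {
  item: T;
  total: number;
}

function rankCandidates<T>(items: T[], rules: Rule<T>[], minTotal = 0): Ranked<T>[] {
  return items
    .map((item) => ({
      item,
      total: rules.reduce((sum, r) => sum + r.weight * r.score(item), 0),
    }))
    .filter((r) => r.total >= minTotal)   // drop candidates below the cutoff
    .sort((a, b) => b.total - a.total);   // best candidate first
}

// Example outside biology: rank vendors on price and delivery time.
const vendors = [
  { name: "A", price: 80, days: 2 },
  { name: "B", price: 50, days: 10 },
];
const vendorRules: Rule<(typeof vendors)[number]>[] = [
  { name: "price", weight: 0.5, score: (v) => 100 - v.price },
  { name: "speed", weight: 0.5, score: (v) => 100 - v.days * 10 },
];
const rankedVendors = rankCandidates(vendors, vendorRules);
```

The guide RNA tool follows the same shape: the candidates are guides, and the rules are GC content, homopolymer runs, and terminal GC.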


The showcase

When finished, your app will have:

  • Input panel: a textarea for pasting a DNA sequence (raw or FASTA) and a dropdown to select PAM type (NGG default, with NAG and NNGRRT options).
  • Results table: every candidate guide RNA listed with columns for Position, Strand, Guide Sequence (20-nt), PAM, GC%, Homopolymer (longest run), Terminal GC, Composite Score, and a Select checkbox.
  • Sorting and filtering: click any column header to sort. Filter by minimum GC%, maximum homopolymer run, or minimum composite score.
  • Sequence viewer: the full input sequence displayed in a monospace font with color-coded highlighting — selected guides in green, PAM sites in yellow, overlapping guides in orange.
  • Export: download selected guides as a CSV file for ordering oligos.

By default, analysis runs locally in-browser. No sequence upload is required.

Core Facility Context

Many university core facilities provide CRISPR/Cas9 services for mice, rats, and cell lines — including guide RNA design, microinjection, electroporation, and genome-wide screening. Building your own guide RNA design tool complements these services: you can pre-screen candidate guides before submitting a project to the core, saving turnaround time and reducing costs. When you have already identified your top 3 guides with scores and rationale, the conversation with core facility staff starts at a much higher level.

If you are taking a university-level genetics or genomics course, this tool lets you practice CRISPR screen design and guide selection interactively on your own sequences.


The prompt

Navigate to an empty folder, open your AI CLI tool (such as Claude Code, Gemini CLI, or your preferred tool), and paste this prompt:

Create a React + Vite application for CRISPR guide RNA design. Use TypeScript and
Tailwind CSS. The app should be called crispr-guide-designer.
CORE FUNCTIONALITY:
1. INPUT
- Large textarea accepting raw DNA sequence or FASTA format
- Auto-strip FASTA headers, whitespace, numbers, non-ATGC characters
- Display cleaned sequence length
- PAM selector dropdown: SpCas9 NGG (default), SpCas9 NAG, SaCas9 NNGRRT
- "Analyze" button and a "Load Example" button (use ~800bp of human AAVS1
safe harbor locus as the example)
2. PAM FINDING AND GUIDE SCORING
- Scan both strands for the selected PAM motif
- For each PAM found, extract the 20-nt guide sequence upstream of the PAM
- Score each guide on three criteria:
a) GC content (ideal 40-70%, penalty outside this range)
b) Homopolymer runs (penalize runs of 4+ identical bases)
c) Terminal GC (bonus for G or C in last 4 bases of guide, which aids
Cas9 binding)
- Composite score: weighted combination, scaled 0-100
- Discard guides with GC < 20% or GC > 80%
3. RESULTS TABLE (main component)
- Columns: #, Position (1-based), Strand (+/-), Guide (20nt), PAM (variable length),
GC%, Homopolymer (longest run), Terminal GC (count of GC in last 4 nt),
Score (composite), Select (checkbox)
- All columns sortable by clicking the header (toggle asc/desc)
- Filter controls above the table:
- Min GC% slider (20-80)
- Max homopolymer run dropdown (3, 4, 5, any)
- Min composite score slider (0-100)
- Color-code rows: green for score >= 70, yellow for 50-69, red for < 50
- Show total guides found and how many pass current filters
4. SEQUENCE VIEWER (below results)
- Display the full input sequence in a monospace font, 80 characters per line,
with position numbers every line
- Highlight PAM sites in yellow background
- When guides are selected (checkbox), highlight the 20-nt guide region in
green with the PAM in a brighter yellow
- If two selected guides overlap, show the overlap region in orange
- Clicking a row in the results table scrolls the sequence viewer to that
position
5. EXPORT
- "Download CSV" button exports selected guides with all columns
- "Copy Oligos" button copies selected guide sequences to clipboard in a
format ready for ordering (adds the 5'-CACCG prefix for BbsI cloning
into pX459, plus the reverse-complement oligo with a 5'-AAAC prefix and a 3' C)
6. DESIGN
- Dark theme matching: bg-slate-950, cards bg-slate-900, text-slate-200,
accent sky-400
- Clean layout: input on top, results table middle, sequence viewer bottom
- Responsive but optimized for desktop (this is a lab workstation tool)
Generate the complete application with all components. Use Vite with React and
TypeScript. Include a README with setup instructions (npm install && npm run dev).

💡 Why React + Vite instead of a single HTML file?

The guide RNA tool has interactive state: sorting, filtering, selection, and synchronized highlighting between the table and the viewer. React’s component model makes this manageable. Vite gives you hot reload during development, so you can tweak the scoring algorithm and see results instantly. The LLM sets all of this up for you — you do not need to know how Vite configuration works.


What you get

The LLM will generate a project structure like this:

crispr-guide-designer/
├── index.html
├── package.json
├── tsconfig.json
├── vite.config.ts
├── tailwind.config.js
├── postcss.config.js
├── src/
│   ├── main.tsx
│   ├── App.tsx
│   ├── index.css
│   ├── types.ts
│   ├── utils/
│   │   ├── pamFinder.ts
│   │   ├── guideScorer.ts
│   │   ├── fastaParser.ts
│   │   └── codonUtils.ts
│   └── components/
│       ├── SequenceInput.tsx
│       ├── ResultsTable.tsx
│       ├── SequenceViewer.tsx
│       ├── FilterControls.tsx
│       └── ExportButtons.tsx
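
To give a sense of what one of these utility files might contain, here is a minimal sketch of the input cleaning the prompt asks for (strip FASTA headers, whitespace, numbers, and non-ATGC characters). The function name `cleanSequence` is hypothetical, and the code your LLM generates for `fastaParser.ts` may look quite different.

```typescript
// Hypothetical cleaner; the generated fastaParser.ts may differ.
function cleanSequence(input: string): string {
  return input
    .split(/\r?\n/)
    .filter((line) => !line.trimStart().startsWith(">")) // drop FASTA header lines
    .join("")
    .toUpperCase()
    .replace(/[^ACGT]/g, ""); // strip whitespace, digits, and any non-ACGT symbol
}

const cleanedExample = cleanSequence(">seq1 example\n1 atgc nnat\n61 GGcc");
```

One design consequence worth knowing: dropping ambiguous bases such as N shifts every downstream coordinate, so positions in the results table refer to the cleaned sequence, not the original file.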

To run it:

cd crispr-guide-designer
npm install
npm run dev

Open http://localhost:5173 in your browser. Click Load Example to see the AAVS1 locus analyzed.

💡 Prerequisite: Node.js

You need Node.js installed on your computer to run npm install and npm run dev. If you get an "npm: command not found" error, install Node.js from nodejs.org (or use the nvm approach from Module 6).

Expected behavior with the example sequence

  • The AAVS1 safe harbor locus (~800 bp) should yield roughly 30-50 or more NGG PAM sites across both strands. The exact count depends on the example sequence, which the LLM supplies itself.
  • Top-scoring guides will have GC content between 45-65%, no homopolymer runs of 4+, and strong terminal GC.
  • The sequence viewer should show a dense pattern of yellow PAM highlights throughout the sequence.

Common issues and fixes

Problem: Guides showing on wrong strand
Follow-up prompt: The guide extraction is off. For an NGG PAM on the + strand, the guide is the 20 nt immediately 5' of the PAM. For the - strand, find CCN on the + strand (the forward-strand proxy for a reverse-strand NGG), then the guide is the reverse complement of the 20 nt 3' of CCN.

Problem: Score always 0 or 100
Follow-up prompt: The composite score normalization is wrong. Recalculate: GC score = 100 if 40-70%, linear penalty to 0 at 20% and 80%. Homopolymer score = 100 if max run <= 3, -20 per additional base. Terminal GC = 25 * count of GC in last 4 nt. Composite = 0.4 * GC + 0.3 * homopolymer + 0.3 * terminal.

Problem: Tailwind classes not applying
Follow-up prompt: Tailwind isn't working. Make sure postcss.config.js exists with tailwindcss and autoprefixer plugins, and tailwind.config.js has content: ["./src/**/*.{ts,tsx}"].
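
The strand rule in the first fix is the one most worth seeing in code. The sketch below is one possible way to implement it for NGG only. The names `GuideHit`, `reverseComplement`, and `findNggGuides` are hypothetical, and the position convention (1-based + strand coordinate of the guide's leftmost base) is an assumption; your generated `pamFinder.ts` may report positions differently.

```typescript
interface GuideHit {
  position: number;    // assumed convention: 1-based + strand coordinate of the leftmost guide base
  strand: "+" | "-";
  guide: string;       // 20 nt, written 5'->3' on its own strand
  pam: string;         // 3 nt, written 5'->3' on its own strand
}

const COMPLEMENT: Record<string, string> = { A: "T", T: "A", G: "C", C: "G" };

function reverseComplement(seq: string): string {
  return seq.split("").reverse().map((b) => COMPLEMENT[b] ?? "N").join("");
}

function findNggGuides(seq: string, guideLen = 20): GuideHit[] {
  const hits: GuideHit[] = [];
  for (let i = 0; i + 3 <= seq.length; i++) {
    const tri = seq.slice(i, i + 3);
    // + strand: N-G-G at i..i+2; the guide is the 20 nt immediately 5' of it.
    if (tri[1] === "G" && tri[2] === "G" && i >= guideLen) {
      hits.push({
        position: i - guideLen + 1,
        strand: "+",
        guide: seq.slice(i - guideLen, i),
        pam: tri,
      });
    }
    // - strand: C-C-N at i..i+2 is the forward-strand image of a reverse-strand NGG.
    // The guide is the reverse complement of the 20 nt immediately 3' of CCN.
    if (tri[0] === "C" && tri[1] === "C" && i + 3 + guideLen <= seq.length) {
      hits.push({
        position: i + 4,
        strand: "-",
        guide: reverseComplement(seq.slice(i + 3, i + 3 + guideLen)),
        pam: reverseComplement(tri),
      });
    }
  }
  return hits;
}
```

Note that the guide never includes the PAM: both branches slice exactly `guideLen` bases on the far side of the motif. PAMs too close to either end of the sequence are skipped because a full-length guide does not fit.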

Worked example: Designing guides for a knockout experiment

Here is a scenario you might encounter in a real project. Your PI asks you to design CRISPR guides to knock out the BRCA1 gene in a human cell line.

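Step 1. Go to NCBI Gene (gene ID: 672) and find the genomic sequence for BRCA1. You do not need the entire 81 kb genomic region; focus on exon 2, which contains the start codon. Download ~500 bp of genomic sequence around the ATG start site in FASTA format. Use genomic DNA rather than the spliced mRNA or CDS, because a guide that spans an exon-exon junction does not exist in the genome.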

Step 2. Paste the sequence into your tool and click Analyze. You should see 15-25 NGG PAM sites in this region.

Step 3. Filter for guides with composite score >= 70. Sort by score descending. The top 3-5 guides are your primary candidates.

Step 4. Select 2-3 top guides and look at the sequence viewer. Ideally, your guides should:

  • Target early in the coding sequence (near the ATG) for a clean knockout.
  • Not overlap with splice sites or regulatory elements.
  • Target different positions and have distinct off-target profiles, so you have independent backup options.

Step 5. Click “Copy Oligos” to get the BbsI-compatible oligo sequences ready for ordering.
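
For reference, the oligo format can be sketched as follows. The prefixes come from the prompt; the trailing C on the bottom oligo is an assumption based on the common convention that it pairs with the extra G added to the top oligo. The names `revComp` and `bbsiOligos` are hypothetical, so compare the output of your generated tool against your lab's cloning protocol before ordering.

```typescript
const COMP: Record<string, string> = { A: "T", T: "A", G: "C", C: "G" };

function revComp(seq: string): string {
  return seq.split("").reverse().map((b) => COMP[b] ?? "N").join("");
}

// Top oligo:    CACC overhang + extra G + guide.
// Bottom oligo: AAAC overhang + reverse complement of the guide + C
//               (assumed: the C pairs with the extra G on the top oligo).
function bbsiOligos(guide: string): { top: string; bottom: string } {
  return {
    top: "CACCG" + guide,
    bottom: "AAAC" + revComp(guide) + "C",
  };
}

const exampleOligos = bbsiOligos("GATTACAGATTACAGATTAC");
```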

A real lab tip

Always order at least 2 guides per gene. If one guide has low cutting efficiency in your cell line, the other serves as a backup. Many labs test 3 guides per target. This tool makes it trivial to pick the top 3 candidates — before this, you might have spent 30 minutes on Benchling or CRISPOR per gene.

Worked example: Checking for PAM sites in a GC-rich region

Some organisms (e.g., Mycobacterium tuberculosis at ~66% GC or Streptomyces coelicolor at ~72% GC) have very high GC content. In these cases:

  • Almost every guide will have high GC%, which paradoxically makes the GC filter less useful.
  • There will be more NGG PAM sites per kilobase (because GG dinucleotides are more frequent).
  • Homopolymer runs of G are more common, so the homopolymer penalty becomes the primary differentiator.
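
The PAM density point can be checked with a back-of-envelope estimate. The sketch below assumes independent bases with P(G) = P(C) = GC/2, which real genomes only approximate, and the function name `expectedNggPerKb` is hypothetical. It counts raw NGG motifs before any filtering, so the tool will report fewer guides once candidates near the sequence ends or outside the GC cutoffs are discarded.

```typescript
// Expected raw NGG motifs per kilobase, counting both strands.
// Assumes independent bases with P(G) = P(C) = gcFraction / 2 (a simplification).
function expectedNggPerKb(gcFraction: number): number {
  const pG = gcFraction / 2;
  const perPositionPerStrand = pG * pG; // probability of G followed by G
  return 2 * 1000 * perPositionPerStrand;
}

const at50 = expectedNggPerKb(0.5);  // balanced genome
const at72 = expectedNggPerKb(0.72); // S. coelicolor-like GC content
```

Under this model the GC-rich genome has roughly twice as many candidate PAMs per kilobase as a balanced one.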

Try pasting a GC-rich sequence and observe how the scoring distribution shifts. Ask the AI to adjust:

My organism has ~70% GC content so all guides have high GC%. Adjust the scoring:
change the ideal GC range to 50-75% instead of 40-70%, and increase the
homopolymer penalty weight to 0.4. This better reflects guide design priorities
for GC-rich genomes.

🔧 When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom
No PAM sites found on the reverse strand
Evidence
The results table shows 20 guides all on the + strand, but none on the - strand. My sequence is 800 bp so there should be guides on both strands.
What to ask the AI
"The PAM finder is only scanning the forward strand. For the reverse strand, you need to search for CCN on the forward strand (the forward-strand proxy for a reverse-strand NGG PAM), then extract the guide from the reverse complement. Can you fix the pamFinder to scan both strands?"
Symptom
Guide sequences include the PAM itself
Evidence
Some guide sequences in the table are 23 nt long instead of 20 nt. It looks like the 3-nt PAM is being included in the guide.
What to ask the AI
"The guide extraction is including the PAM motif in the guide sequence. The guide should be exactly the 20 nucleotides upstream (5') of the PAM on the target strand. Can you fix the extraction to exclude the PAM from the guide?"
Symptom
SaCas9 NNGRRT PAM finds zero sites
Evidence
I selected SaCas9 NNGRRT from the dropdown and clicked Analyze, but the table says 0 guides found. With NGG I get 40 guides on the same sequence.
What to ask the AI
"The NNGRRT PAM pattern is not matching. NNGRRT is a degenerate motif where R means A or G. Make sure the PAM matching uses a regex or lookup that handles IUPAC ambiguity codes: N=[ATGC], R=[AG]. The regex for NNGRRT should be [ATGC]{2}G[AG][AG]T."
Symptom
Composite score shows NaN for some guides
Evidence
Most guides have numeric scores but 3 rows show NaN in the Score column. These guides have GC% of exactly 20% or 80%.
What to ask the AI
"The score normalization fails at the boundary values. When GC is exactly 20% or 80%, the score comes out as NaN, possibly from a division by zero or an unhandled branch in the linear penalty function. Can you add a boundary check so that GC of exactly 20% or 80% returns a GC score of 0 instead of NaN?"
Symptom
npm run dev fails with 'Cannot find module tailwindcss'
Evidence
Running npm run dev gives error: Error: Cannot find module 'tailwindcss'. I ran npm install and it completed without errors.
What to ask the AI
"Tailwind CSS is not in the dependencies. Can you add tailwindcss, postcss, and autoprefixer to devDependencies in package.json, create a postcss.config.js file, and make sure tailwind.config.js is set up correctly? Then I will run npm install again."

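The NNGRRT fix above depends on translating IUPAC ambiguity codes into a pattern. Here is a minimal sketch of that translation, with hypothetical names (`IUPAC`, `pamToRegex`, `findPamStarts`). It wraps the pattern in a lookahead so that overlapping PAM sites are all reported, which a plain global regex would miss.

```typescript
// IUPAC nucleotide codes mapped to regex character classes.
const IUPAC: Record<string, string> = {
  A: "A", C: "C", G: "G", T: "T",
  R: "[AG]", Y: "[CT]", S: "[CG]", W: "[AT]", K: "[GT]", M: "[AC]",
  B: "[CGT]", D: "[AGT]", H: "[ACT]", V: "[ACG]", N: "[ACGT]",
};

function pamToRegex(pam: string): RegExp {
  const body = pam.toUpperCase().split("").map((code) => {
    const cls = IUPAC[code];
    if (cls === undefined) throw new Error("Unknown IUPAC code: " + code);
    return cls;
  }).join("");
  // Zero-width lookahead with a capture group, so overlapping sites all match.
  return new RegExp("(?=(" + body + "))", "g");
}

// Returns 0-based start indices of every PAM match on the given strand.
function findPamStarts(seq: string, pam: string): number[] {
  const re = pamToRegex(pam);
  const starts: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(seq)) !== null) {
    starts.push(m.index);
    re.lastIndex = m.index + 1; // zero-width match: advance manually to avoid an infinite loop
  }
  return starts;
}
```

For example, `findPamStarts("AGGG", "NGG")` reports both index 0 (AGG) and index 1 (GGG). To scan the reverse strand with this helper, run it on the reverse complement of the sequence and convert the indices back.
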
Understanding the scoring

The composite score is a simplified version of what tools like CRISPOR and Benchling use internally. Here is the breakdown:

GC content (40% weight): Guides with 40-70% GC bind stably without excessive secondary structure. Below 40%, binding is weak. Above 70%, the guide tends to form hairpins.

Homopolymer penalty (30% weight): Runs of 4+ identical bases (especially poly-T, which acts as a Pol III terminator) reduce guide efficacy. TTTT in a guide can prematurely terminate transcription from a U6 promoter.

Terminal GC (30% weight): G or C bases in positions 17-20 (the PAM-proximal end) improve Cas9 binding and cleavage. The seed region near the PAM is the most critical for target recognition.
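
Put together, the three components can be sketched as below. The formulas follow the "Score always 0 or 100" follow-up prompt earlier in this lesson (linear GC penalty to 0 at 20% and 80%, minus 20 per base beyond a run of 3, 25 points per terminal G or C). The function names are hypothetical, and your generated `guideScorer.ts` may structure this differently.

```typescript
function gcPercent(guide: string): number {
  const gc = (guide.match(/[GC]/g) ?? []).length;
  return (100 * gc) / guide.length;
}

function longestRun(guide: string): number {
  let best = 0;
  let run = 0;
  for (let i = 0; i < guide.length; i++) {
    run = i > 0 && guide[i] === guide[i - 1] ? run + 1 : 1;
    if (run > best) best = run;
  }
  return best;
}

// 100 inside 40-70%, falling linearly to 0 at 20% and at 80%.
function gcScore(gc: number): number {
  if (gc >= 40 && gc <= 70) return 100;
  if (gc <= 20 || gc >= 80) return 0; // explicit boundary handling, no NaN
  return gc < 40 ? ((gc - 20) / 20) * 100 : ((80 - gc) / 10) * 100;
}

// 100 for runs of 3 or fewer, minus 20 per additional base, floored at 0.
function homopolymerScore(maxRun: number): number {
  return Math.max(0, 100 - 20 * Math.max(0, maxRun - 3));
}

// 25 points per G or C in the last 4 bases (positions 17-20 of a 20-nt guide).
function terminalGcScore(guide: string): number {
  return 25 * (guide.slice(-4).match(/[GC]/g) ?? []).length;
}

function compositeScore(guide: string): number {
  return (
    0.4 * gcScore(gcPercent(guide)) +
    0.3 * homopolymerScore(longestRun(guide)) +
    0.3 * terminalGcScore(guide)
  );
}
```

As a worked case, the guide GATTACAGATTACAGATTAC has 30% GC (GC score 50), a longest run of 2 (homopolymer score 100), and one G or C in its last four bases (terminal score 25), giving 0.4 × 50 + 0.3 × 100 + 0.3 × 25 = 57.5.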

🔍 For Researchers: Limitations of this scoring model

This tool uses a heuristic scoring model suitable for rapid screening. It does not include:

  • Off-target analysis (requires genome-wide alignment with tools like Bowtie or Cas-OFFinder)
  • On-target efficiency models (Rule Set 2, DeepCRISPR, or CHOPCHOP scoring)
  • Chromatin accessibility data (which affects cleavage efficiency in vivo)

For publication-quality guide selection, use this tool for initial screening, then validate top candidates with CRISPOR (crispor.tefor.net) or Benchling’s CRISPR module. The value of this tool is speed and privacy — your unpublished target sequences never leave your machine.


Customize it

Add support for different Cas proteins

Add PAM options for Cas12a/Cpf1 (TTTV PAM, 5' of the guide instead of 3'),
CjCas9 (NNNNRYAC), and xCas9 (NG). The guide extraction logic needs to flip
for Cas12a since the PAM is upstream. Update the PAM selector dropdown and the
finder logic accordingly.
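
The flipped extraction for Cas12a might look like the sketch below. It is an illustration only: the name `findCas12aPlusStrand` is hypothetical, it scans just the + strand, and the 20-nt default guide length is an assumption (Cas12a guides are often designed somewhat longer, so check the recommendation for your enzyme).

```typescript
interface Cas12aHit {
  pamStart: number; // 0-based index of the first PAM base
  pam: string;
  guide: string;
}

// Cas12a: the TTTV PAM (V = A, C, or G) sits 5' of the protospacer,
// so the guide is read downstream of the PAM instead of upstream.
function findCas12aPlusStrand(seq: string, guideLen = 20): Cas12aHit[] {
  const hits: Cas12aHit[] = [];
  for (let i = 0; i + 4 + guideLen <= seq.length; i++) {
    const pam = seq.slice(i, i + 4);
    if (/^TTT[ACG]$/.test(pam)) {
      hits.push({ pamStart: i, pam, guide: seq.slice(i + 4, i + 4 + guideLen) });
    }
  }
  return hits;
}
```

To cover the - strand, run the same function on the reverse complement of the input and map the indices back to + strand coordinates.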

Add off-target stub with mismatch counting

Add a basic off-target analysis panel. When a guide is selected, let the user
paste a second sequence (e.g., a known pseudogene or paralog) and check for
matches allowing up to 3 mismatches. Highlight potential off-target sites in
the pasted sequence. This is not a genome-wide search but is useful for checking
known problematic loci.
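
The core of such a stub is a sliding-window mismatch count. A minimal sketch, with hypothetical names (`OffTargetSite`, `hamming`, `findNearMatches`): it scans only the strand as pasted and does not require a PAM next to the site, both of which a fuller version would need to address.

```typescript
interface OffTargetSite {
  start: number;      // 0-based index in the pasted sequence
  site: string;
  mismatches: number;
}

// Number of positions where two equal-length strings differ.
function hamming(a: string, b: string): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) d++;
  }
  return d;
}

function findNearMatches(guide: string, target: string, maxMismatches = 3): OffTargetSite[] {
  const sites: OffTargetSite[] = [];
  for (let i = 0; i + guide.length <= target.length; i++) {
    const site = target.slice(i, i + guide.length);
    const mismatches = hamming(guide, site);
    if (mismatches <= maxMismatches) sites.push({ start: i, site, mismatches });
  }
  return sites;
}
```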

Add oligo design for HDR templates

Add an HDR template designer. After selecting a guide, let the user specify a
desired edit (point mutation, small insertion, or tag insertion). Generate the
HDR template with 40-80 nt homology arms flanking the cut site (3 nt upstream
of the PAM). Show the template sequence with color-coded homology arms, the
edit, and a silent PAM mutation to prevent re-cutting.
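
The arm extraction at the heart of this feature is simple slicing around the cut site. The sketch below handles only a + strand guide and a plain insertion. The names `HdrTemplate` and `buildInsertionTemplate` and the `pamStart` parameter are hypothetical, and the silent PAM mutation is deliberately left out because it depends on the reading frame.

```typescript
interface HdrTemplate {
  leftArm: string;
  insert: string;
  rightArm: string;
  template: string;
}

// pamStart: 0-based index of the first PAM base for a + strand guide (hypothetical parameter).
function buildInsertionTemplate(
  seq: string,
  pamStart: number,
  insert: string,
  armLen = 40
): HdrTemplate {
  const cut = pamStart - 3; // SpCas9 cuts 3 nt upstream of the PAM
  if (cut - armLen < 0 || cut + armLen > seq.length) {
    throw new Error("Sequence too short for the requested homology arms");
  }
  const leftArm = seq.slice(cut - armLen, cut);
  const rightArm = seq.slice(cut, cut + armLen);
  return { leftArm, insert, rightArm, template: leftArm + insert + rightArm };
}
```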

Batch mode for multiple targets

Add a batch mode tab where the user can paste multiple FASTA sequences (each a
different target gene). Run the analysis on all sequences and display a summary
table showing the best guide for each target, with an "Expand" button to see
all guides for that target. Add a "Download All" button that exports a plate
map CSV for oligo ordering.

The build pattern

Notice what just happened:

  1. You described a bioinformatics tool in plain English.
  2. The LLM generated a complete React application with TypeScript, multiple components, and a scoring algorithm.
  3. You ran two commands (npm install and npm run dev) and had a working tool.
  4. You can extend it with follow-up prompts, each adding a feature.

This is the Showcase + Guided Build pattern. You saw what was possible, you got the exact prompt to reproduce it, and now you have a customizable foundation. Every tool in this module follows the same pattern.


Key takeaways

  • CRISPR guide design is a filtering problem: you start with all possible PAM sites and narrow down using GC%, homopolymer content, and terminal GC scoring. The AI builds the filtering and scoring machinery; you supply the biological criteria.
  • React + Vite is the right tool for interactive state: when your app needs sorting, filtering, selection, and synchronized highlighting, a component framework like React keeps the code manageable. You can ship without deep React expertise, but understanding component and state basics will help you debug and extend reliably.
  • Guide strand orientation is the most common bug: the relationship between the PAM position on the + strand and the guide sequence on the - strand confuses both humans and LLMs. If your guides look wrong, check strand orientation first.
  • This tool is for screening, not final validation: use it to quickly identify the top 3-5 candidates, then validate with CRISPOR or Cas-OFFinder for off-target analysis before ordering oligos.
  • Keeping sequences local matters: unlike web-based tools, nothing leaves your machine. This matters for unpublished targets and proprietary sequences.

Portfolio suggestion

Save the entire crispr-guide-designer/ project folder. To make it portfolio-ready:

  1. Add a screenshot of the tool analyzing a real (or realistic) target to the README.
  2. Document which PAM types are supported and how the scoring works.
  3. If you added a customization (Cas12a support, off-target checking), describe it in the README.

This is a tool you can demo in a lab meeting or include in a grant application’s “broader impacts” section. If you deploy it (even just on a local server), other lab members can use it immediately. Consider creating a short write-up: “I built a CRISPR guide design tool in 25 minutes using AI — here is how it works and what it can do.”

🔍 Advanced: Integrating with genome browsers

If you want to see your guides in genomic context, you can extend the tool to generate tracks for genome browsers:

Add a "Generate UCSC BED Track" button. When clicked, export the selected guides as
a BED file with columns: chromosome (ask the user to enter it), start position,
end position, guide name (e.g., "guide_1_plus_score72"), score (composite score),
strand. Also generate a track line header with name="CRISPR Guides" and
color settings for the score ranges. The user can upload this BED file to the
UCSC Genome Browser or IGV to see guides in the context of gene annotations,
conservation tracks, and epigenetic marks.

This is useful for presentations and for checking whether guides overlap with known regulatory elements, SNPs, or repetitive regions.
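
The main pitfall in BED export is the coordinate system: BED starts are 0-based and ends are exclusive, while the results table uses 1-based positions. A minimal sketch of the conversion follows. The names `SelectedGuide` and `toBed` and the `regionStart` parameter (the 0-based genomic coordinate of the first base you pasted) are hypothetical, and the color settings from the prompt are omitted.

```typescript
interface SelectedGuide {
  position: number;   // 1-based position within the pasted sequence
  strand: "+" | "-";
  score: number;      // composite score, 0-100
  length: number;     // guide length in nt
}

// regionStart: assumed 0-based genomic coordinate of the first pasted base.
function toBed(chrom: string, regionStart: number, guides: SelectedGuide[]): string {
  const header = 'track name="CRISPR Guides" description="Selected guide RNAs"';
  const rows = guides.map((g, i) => {
    const start = regionStart + g.position - 1; // 1-based position -> 0-based BED start
    const end = start + g.length;               // BED end is exclusive
    const strandLabel = g.strand === "+" ? "plus" : "minus";
    const score = Math.round(g.score);
    const name = "guide_" + (i + 1) + "_" + strandLabel + "_score" + score;
    return [chrom, start, end, name, score, g.strand].join("\t");
  });
  return [header].concat(rows).join("\n");
}

const bedExample = toBed("chr17", 1000, [
  { position: 1, strand: "+", score: 72.4, length: 20 },
]);
```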


KNOWLEDGE CHECK

You are designing CRISPR guides for a gene in Streptomyces coelicolor (72% GC genome). Most of your guides have GC content above 70% and are being penalized. What is the best approach?


Try it yourself

  1. Build the base tool using the prompt above.
  2. Load the example sequence and verify that guides are being found on both strands.
  3. Sort by composite score and examine the top 5 guides. Do the GC% and homopolymer values make sense?
  4. Select 3 guides and check the sequence viewer — are they highlighted correctly?
  5. Click “Copy Oligos” and verify the BbsI-compatible format.
  6. Pick one customization from the list above and add it with a follow-up prompt.

What’s next

In Lesson 3, you will move from browser tools to command-line pipelines: a Python CLI tool that processes FASTQ files, runs quality checks, and generates HTML reports. This is the kind of tool that lives on your lab server and runs automatically when new sequencing data arrives.