Build a Sequence Analysis Dashboard

What you’re building

Imagine pasting a raw FASTA sequence into a browser window and instantly seeing GC content, base composition charts, every open reading frame, and amino acid statistics — no server, no installation, no BioPython dependency hell. Just one HTML file you can open on any lab computer.

That is what you will build in the next 15 minutes.

💬This is your proof-of-concept

This module isn’t just practice — it’s portfolio material. By the end, you’ll have built a working bioinformatics tool that demonstrates exactly what “AI-augmented scientist” looks like. This is the artifact you show during your next review, your next grant proposal, or your next interview. It proves you can combine domain expertise with modern tooling to produce real results.

By the end of this lesson you will have a standalone sequence analysis dashboard that runs entirely in the browser. It handles multi-sequence FASTA input, renders interactive charts with Chart.js, and finds ORFs across all six reading frames. You will build it by giving a single, carefully crafted prompt to an LLM CLI tool.

ℹSoftware pattern: Interactive data dashboard

Upload → parse → visualize → filter. This pattern works for survey data, financial reports, any dataset. The techniques in this lesson transfer directly to non-biology contexts.

🔍Domain Primer: Key biology terms you'll see in this module

New to bioinformatics? Here are the key terms you’ll encounter:

FASTA — A text file format for DNA/protein sequences. Each entry has a header line starting with >, followed by one or more sequence lines until the next header. Think of it like a labeled container for genetic data.
GC Content — The percentage of a DNA sequence made up of guanine (G) and cytosine (C) bases. G-C pairs bind more strongly than A-T pairs, so higher GC content generally makes double-stranded DNA more stable and raises its melting temperature. It’s like measuring the “strength” of a sequence.
ORF (Open Reading Frame) — A stretch of DNA that could potentially code for a protein. It starts with a “start codon” (ATG) and ends with a “stop codon.” Finding ORFs is like finding sentences in a long string of letters.
BLAST — A tool that searches databases to find sequences similar to yours. Like Google, but for DNA and protein sequences.
Sequence alignment — Comparing two or more sequences to find similarities. Like lining up two sentences to see which words match.

While our dashboard will not perform BLAST searches or complex alignments, it is a useful first step before you reach for those heavier tools.

You don’t need to be an expert in these concepts — the AI tools will handle the technical details. You just need to know what you’re asking for.

Who this is for

Undergrads rotating through a core facility who need a quick sanity check on a sequence before submitting a job.
Grad students who want a lightweight QC tool they can customize for their organism of interest.
PIs and staff scientists who want to show trainees how fast computational tools can be prototyped.

ℹCore Facility Context

Many university core facilities offer full-service RNA-seq analysis, genome assembly, genotyping-by-sequencing, and AlphaFold protein structure prediction, but core facility queues can mean days or weeks of turnaround. A self-built sequence dashboard lets you do quick QC the moment you get your FASTA files back, while the core handles deeper analysis in parallel. If you are taking a bioinformatics course, this dashboard complements the BLAST and genome assembly exercises you do in class. It packages the quick sanity checks (length, GC%, ORFs) as a reusable tool you can hand to anyone in your lab.

The showcase

Here is what the finished dashboard looks like once you open the HTML file in a browser:

Header with a textarea where you paste one or more FASTA sequences.
Summary panel showing sequence count, total length, and overall GC%.
Per-sequence cards, each displaying:
- Sequence ID and description parsed from the header line.
- Length, GC content, and AT/GC ratio.
- A base composition bar chart (A, T, G, C, N counts) rendered with Chart.js.
- A table of open reading frames (ORFs) found across all six reading frames, with start/stop positions, length, and the first 30 amino acids of each.
- Amino acid frequency chart for the longest ORF.

Everything runs client-side after the page loads Chart.js and the Inter font from their CDNs. Your sequence data is never uploaded anywhere. You can email the file to a collaborator and it works on their machine immediately, as long as their browser can reach those CDNs.

The prompt

Open your terminal, navigate to a project folder, start your AI CLI tool (e.g., by typing claude), and paste this prompt:

Build a single self-contained HTML file called sequence-dashboard.html that serves as
a DNA sequence analysis dashboard. Requirements:

1. FASTA PARSER
   - A large textarea for pasting one or more FASTA-formatted sequences
   - Parse multi-sequence FASTA: lines starting with ">" are headers, subsequent lines
     are sequence data
   - Strip whitespace and newlines from sequence data, ignore blank lines
   - Handle both uppercase and lowercase bases

2. PER-SEQUENCE ANALYSIS
   For each parsed sequence display a card showing:
   - Sequence ID (first word after ">") and full description
   - Total length in bases
   - GC content as a percentage (G+C)/(A+T+G+C) with 2 decimal places
   - AT/GC ratio
   - Base composition bar chart using Chart.js (load from CDN) showing counts of
     A, T, G, C, and N/other
   - A table of open reading frames (ORFs): scan all 6 reading frames (3 forward,
     3 reverse complement), find every ATG...stop(TAA/TAG/TGA) region >= 100 codons,
     report frame, strand, start position, stop position, length in codons, and first
     30 amino acids (use standard codon table)
   - Amino acid frequency horizontal bar chart for the longest ORF in that sequence

3. SUMMARY PANEL (top of page, updates live)
   - Number of sequences parsed
   - Total bases across all sequences
   - Overall GC% across all sequences

4. DESIGN
   - Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #38bdf8
   - Clean sans-serif font (Inter from Google Fonts CDN)
   - Responsive layout, cards in a single column
   - Include a "Load Example" button that populates the textarea with 2 sample
     sequences (use realistic E. coli sequences ~500bp each with known ORFs)
   - Include a "Clear" button

5. TECHNICAL
   - Pure HTML/CSS/JS in one file, no build step
   - Chart.js loaded from CDN (https://cdn.jsdelivr.net/npm/chart.js)
   - Codon translation table hardcoded in JS
   - Reverse complement function for the 3 reverse frames

💡Copy-paste ready

That entire block is the prompt. Paste it as-is. The specificity is deliberate — the more precise you are about requirements, the closer the first output will be to what you actually want. Vague prompts produce vague tools.

What you get

After the LLM finishes (typically 60-90 seconds), you will have a single file: sequence-dashboard.html. Open it in any browser.

Expected output structure

sequence-dashboard.html    (~600-800 lines)

Click Load Example and you should see:

Two sequence cards appear, each with a header like >ecoli_fragment_1 E. coli K-12 region.
GC content around 50-52% (typical for E. coli).
Base composition charts with roughly equal A/T and G/C bars.
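At least one ORF per sequence in the ORF table. The prompt asks for example sequences with known ORFs, but an LLM may invent “realistic” sequences. If the table is empty, check the example data as well as the ORF code.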
An amino acid frequency chart showing the distribution for the longest ORF.

If something is off

LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:

Problem	Follow-up prompt
Chart.js not rendering	`The charts aren't showing. Make sure Chart.js is loaded before any chart code runs. Add an onload handler.`
ORFs missing on reverse strand	`The reverse complement ORF search isn't working. Double-check the reverse complement function and make sure you're scanning frames 0, 1, 2 on the reverse complement string.`
Amino acid table empty	`The codon table lookup is returning undefined for some codons. Make sure all 64 codons are in the table and the sequence is being read in triplets correctly.`

Worked example: Analyzing a GenBank sequence

Here is a real-world scenario. You download a sequence from NCBI for the E. coli lacZ gene and want to quickly check its properties before cloning.

Step 1. Go to NCBI GenBank and search for accession V00296 (the E. coli lacZ gene). Click “Send to” > “File” > FASTA format. Save it.

Step 2. Open your dashboard in a browser. Paste the contents of the FASTA file into the textarea.

Step 3. Examine the output:

Length: ~6,600 bp for the full GenBank record. The lacZ coding sequence within it spans ~3,075 bp (1,024 codons plus the stop codon). GenBank entries often include flanking sequence beyond the gene of interest.
GC content: approximately 52%, consistent with E. coli coding sequences.
ORFs: you should see one large ORF of about 1,024 codons (encoding beta-galactosidase) covering the lacZ coding region. Because the record includes flanking sequence, smaller ORFs may also appear.
Amino acid frequency: the composition should look biologically plausible and non-uniform, with leucine and alanine among the more frequent residues.
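The amino acid chart boils down to counting residues in the translated longest ORF. Here is a minimal sketch, assuming the ORF has already been translated to a one-letter protein string. The function name `aminoAcidFrequencies` is hypothetical and not necessarily what your generated file uses.

```javascript
// Hypothetical helper: count residues in a one-letter protein string and
// return them sorted from most to least frequent (ties alphabetically).
function aminoAcidFrequencies(protein) {
  const counts = new Map();
  for (const aa of protein.toUpperCase()) {
    if (aa === "*") continue; // skip the stop symbol
    counts.set(aa, (counts.get(aa) || 0) + 1);
  }
  const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
  return [...counts.entries()]
    .map(([aa, n]) => ({ aa, count: n, fraction: total ? n / total : 0 }))
    .sort((x, y) => y.count - x.count || x.aa.localeCompare(y.aa));
}

console.log(aminoAcidFrequencies("MLLAKLA*").map((r) => r.aa).join("")); // "LAKM"
```

Each row's `fraction` is what a horizontal bar chart would plot. Leucine or alanine near the top for lacZ is the kind of non-uniform pattern described above.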

If the ORF table is empty for your own sequences, the 100-codon minimum may be filtering out shorter ORFs. Ask the AI:

The ORF table is empty for my sequence. Lower the minimum ORF size from 100 codons
to 30 codons so I can see smaller ORFs too. Add a slider to let me adjust this
threshold interactively.

ℹWorking with sequences from your own organism

If you work with a GC-rich organism like Streptomyces (~72% GC) or an AT-rich organism like Plasmodium falciparum (~20% GC), the dashboard will immediately reveal this. The base composition charts and GC% are your first sanity check that you have the sequence you expected, and not the wrong file or a contaminant.

🔧When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom

Pasting a FASTA file with multiple sequences only shows the first one

Evidence

The textarea has 5 sequences (5 lines starting with >) but only 1 card appears below

What to ask the AI

"My FASTA parser is only reading the first sequence. It looks like the split on '>' is not working for multi-sequence input. Can you fix the parser to handle multiple sequences? The raw text has 5 header lines starting with '>'."

Symptom

Chart.js charts are blank white rectangles

Evidence

The page loads and cards appear with text, but all the bar charts are empty white boxes. The only console error is: 'Chart is not defined'

What to ask the AI

"Chart.js is not loading before my code tries to create charts. Can you wrap the chart creation in a window.onload handler, and also add an onerror fallback on the Chart.js CDN script tag so I can see if the CDN is blocked?"
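One defensive pattern for the "Chart is not defined" case is to check for the Chart.js global before constructing a chart. This sketch assumes Chart.js v3 or later loaded from the CDN as a UMD script, which exposes a global `Chart`. `baseCompositionConfig` and `renderChartSafely` are hypothetical helper names, and the accent colour comes from the prompt's design section.

```javascript
// Hypothetical helper: build a Chart.js bar-chart config for base counts.
function baseCompositionConfig(counts) {
  return {
    type: "bar",
    data: {
      labels: ["A", "T", "G", "C", "N/other"],
      datasets: [{
        label: "Base count",
        data: [counts.A, counts.T, counts.G, counts.C, counts.other],
        backgroundColor: "#38bdf8",
      }],
    },
    options: { responsive: true, plugins: { legend: { display: false } } },
  };
}

// Hypothetical guard: only construct the chart if the Chart constructor exists.
// The constructor is injectable so the guard can be exercised without a browser.
function renderChartSafely(canvas, config, ChartCtor = globalThis.Chart) {
  if (typeof ChartCtor !== "function") {
    // Chart.js has not loaded yet, or the CDN was blocked: fail visibly.
    console.warn("Chart.js is not available; skipping chart.");
    return null;
  }
  return new ChartCtor(canvas, config);
}
```

In the page, call `renderChartSafely(canvasElement, baseCompositionConfig(counts))` from a `window.addEventListener("load", ...)` callback. By then the CDN script has either loaded or failed, and a missing library produces a warning instead of a crash.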

Symptom

Sequence with lowercase bases shows 0% GC content

Evidence

I pasted a sequence from SnapGene that uses lowercase letters for introns. The length shows correctly (4,200 bp) but GC% says 0.00%

What to ask the AI

"The GC calculation is not handling lowercase bases. Can you add .toUpperCase() to the sequence string before counting bases? The raw sequence has a mix of upper and lowercase ATGC."

Symptom

Multi-line FASTA sequences are truncated

Evidence

My FASTA file wraps sequences at 70 characters per line. The dashboard shows each sequence as only 70 bp long instead of the full length.

What to ask the AI

"The FASTA parser is treating each line as a separate sequence instead of concatenating continuation lines. Can you fix it so that all lines between two '>' headers are joined into one sequence?"
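Both parser symptoms (only the first record showing, and wrapped lines being truncated) come down to how lines are grouped. Below is a minimal sketch of a parser that handles multiple records, wrapped sequence lines, blank lines, and Windows line endings. The function name and the `{ id, description, sequence }` record shape are assumptions, not necessarily what your generated file uses.

```javascript
// Hypothetical multi-record FASTA parser. A header line starts with ">";
// every following non-blank line up to the next header is appended to that
// record's sequence. Text before the first header is ignored.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // ignore blank lines
    if (line.startsWith(">")) {
      const header = line.slice(1).trim();
      const firstSpace = header.search(/\s/);
      current = {
        id: firstSpace === -1 ? header : header.slice(0, firstSpace),
        description: header,
        sequence: "",
      };
      records.push(current);
    } else if (current) {
      // Join continuation lines, dropping stray whitespace and normalising case.
      current.sequence += line.replace(/\s+/g, "").toUpperCase();
    }
  }
  return records;
}
```

Working line by line, rather than splitting the whole text on every `>`, also avoids surprises if a `>` ever appears inside a description.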

Symptom

Non-standard IUPAC characters cause NaN in calculations

Evidence

My sequence contains R, Y, S, W characters (IUPAC ambiguity codes from a consensus sequence). GC% shows NaN and the base chart has an error.

What to ask the AI

"My sequence has IUPAC ambiguity codes (R, Y, S, W, M, K, etc.) that are not A, T, G, or C. Can you count these as 'Other/N' in the base composition and exclude them from the GC calculation denominator?"
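The lowercase and IUPAC symptoms both come down to how bases are counted. This sketch, with a hypothetical function name, uppercases first and puts every non-ACGT character (N, R, Y, S, W, ...) into an "other" bucket. That bucket is left out of the GC denominator, and the function returns `null` instead of `NaN` when no unambiguous bases remain.

```javascript
// Hypothetical base-statistics helper: case-insensitive, IUPAC-tolerant.
function baseStats(sequence) {
  const counts = { A: 0, T: 0, G: 0, C: 0, other: 0 };
  for (const base of sequence.toUpperCase()) {
    if (base === "A" || base === "T" || base === "G" || base === "C") counts[base] += 1;
    else counts.other += 1; // N and IUPAC ambiguity codes
  }
  const at = counts.A + counts.T;
  const gc = counts.G + counts.C;
  return {
    counts,
    length: sequence.length,
    // GC% = (G + C) / (A + T + G + C); null rather than NaN if nothing to count.
    gcPercent: at + gc > 0 ? (100 * gc) / (at + gc) : null,
    atGcRatio: gc > 0 ? at / gc : null,
  };
}

console.log(baseStats("ATGCNNgcRY").gcPercent.toFixed(2)); // "66.67"
```

The example has 4 G/C out of 6 unambiguous bases. The four ambiguous characters still count toward `length`, which is why the length can look right while an uppercase-only counter reports 0% GC.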

How it works (the 2-minute explanation)

You do not need to understand every line of the generated code, but here is the mental model:

FASTA parsing splits the input at lines that begin with >. Each of those lines is a header, and the lines below it up to the next header are joined into one sequence. This is a universal bioinformatics pattern.
GC content is simply (countG + countC) / (countA + countT + countG + countC) — ambiguous bases (N, R, Y, etc.) are excluded from the denominator. It is the single most common sequence statistic.
ORF finding translates each of the 6 reading frames (3 forward, 3 on the reverse complement), looks for ATG start codons, and scans until a stop codon (TAA, TAG, TGA) or end of sequence. Any stretch of 100+ codons gets reported.
Chart.js is a widely used charting library. Loading it from a CDN means no installation. The generated code creates new Chart(canvas, config) for each chart.
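The ORF step above can be sketched in a few dozen lines. This is a minimal illustration, not the code your LLM will generate, and the function names and output fields are hypothetical. It reports only ORFs closed by a stop codon, matching the prompt's ATG...stop wording. When several ATGs share the same stop, it keeps only the first, longest one.

```javascript
// Standard genetic code, built from the conventional T, C, A, G ordering.
const BASES = "TCAG";
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
const CODON_TABLE = {};
let idx = 0;
for (const a of BASES) {
  for (const b of BASES) {
    for (const c of BASES) CODON_TABLE[a + b + c] = AMINO[idx++];
  }
}

// Reverse complement; anything that is not A/C/G/T becomes N.
function reverseComplement(seq) {
  const pair = { A: "T", T: "A", G: "C", C: "G" };
  return [...seq].reverse().map((b) => pair[b] || "N").join("");
}

// Translate codon by codon; codons containing N or IUPAC codes become X.
function translate(seq) {
  let protein = "";
  for (let i = 0; i + 3 <= seq.length; i += 3) {
    protein += CODON_TABLE[seq.slice(i, i + 3)] || "X";
  }
  return protein;
}

// Scan frames 0-2 of one strand for ATG...stop ORFs. Length counts the ATG
// but not the stop codon; indices are 0-based on the scanned string.
function orfsOnStrand(seq, strand, minCodons) {
  const found = [];
  for (let frame = 0; frame < 3; frame++) {
    let start = -1; // -1 means "not currently inside an ORF"
    for (let i = frame; i + 3 <= seq.length; i += 3) {
      const codon = seq.slice(i, i + 3);
      if (start === -1) {
        if (codon === "ATG") start = i;
      } else if (CODON_TABLE[codon] === "*") {
        const codons = (i - start) / 3;
        if (codons >= minCodons) {
          const protein = translate(seq.slice(start, i));
          found.push({ strand, frame: frame + 1, start, end: i + 3, codons, first30: protein.slice(0, 30) });
        }
        start = -1;
      }
    }
  }
  return found;
}

// All six frames, reported in 1-based forward-strand coordinates.
// On the minus strand start > end, because the ORF runs right to left.
function findOrfs(sequence, minCodons = 100) {
  const seq = sequence.toUpperCase();
  const n = seq.length;
  const plus = orfsOnStrand(seq, "+", minCodons).map((o) => ({ ...o, start: o.start + 1 }));
  const minus = orfsOnStrand(reverseComplement(seq), "-", minCodons).map((o) => ({
    ...o,
    start: n - o.start,
    end: n - o.end + 1,
  }));
  return [...plus, ...minus].sort((x, y) => y.codons - x.codons);
}

// One "+" strand ORF: start 1, end 18, 5 codons, first30 "MAAAA".
console.log(findOrfs("ATGGCTGCTGCTGCTTAA", 5));
```

Calling `findOrfs(seq)` with no second argument applies the prompt's 100-codon minimum. Passing a smaller number mirrors the threshold slider suggested in the worked example.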

🔍For Researchers: Why single-file tools matter

In a core facility environment, you cannot always install software. IT policies, shared workstations, and HPC nodes with restricted environments all create friction. A single HTML file that runs in any browser sidesteps all of that. You can attach it to a lab notebook entry, email it to a collaborator, or host it on an internal web server. This pattern — self-contained browser tools — is underused in bioinformatics, and LLMs make it trivial to create them.

Customize it

The base dashboard is useful as-is, but the real power is in customization. Each of these is a single follow-up prompt:

Add restriction enzyme sites

Add a restriction enzyme analysis panel to each sequence card. Include these common
enzymes: EcoRI (GAATTC), BamHI (GGATCC), HindIII (AAGCTT), NotI (GCGGCCGC),
XhoI (CTCGAG), NdeI (CATATG). Show cut positions and a simple linear map with
colored markers for each enzyme site.
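Here is a sketch of the site search behind that prompt. The enzyme names and recognition sequences come from the prompt. The cut offsets (where each enzyme cuts the top strand) are the standard ones, but double-check them against a supplier's reference before relying on them. All six sites are palindromic, so scanning one strand finds every site. `findRestrictionSites` is a hypothetical name.

```javascript
// Recognition site and top-strand cut offset (position of "^" in the site).
const ENZYMES = [
  { name: "EcoRI", site: "GAATTC", cut: 1 },   // G^AATTC
  { name: "BamHI", site: "GGATCC", cut: 1 },   // G^GATCC
  { name: "HindIII", site: "AAGCTT", cut: 1 }, // A^AGCTT
  { name: "NotI", site: "GCGGCCGC", cut: 2 },  // GC^GGCCGC
  { name: "XhoI", site: "CTCGAG", cut: 1 },    // C^TCGAG
  { name: "NdeI", site: "CATATG", cut: 2 },    // CA^TATG
];

// Return every site as 1-based positions, sorted along the sequence.
function findRestrictionSites(sequence) {
  const seq = sequence.toUpperCase();
  const hits = [];
  for (const { name, site, cut } of ENZYMES) {
    let from = 0;
    let at;
    while ((at = seq.indexOf(site, from)) !== -1) {
      // sitePosition: first base of the site; cutAfter: top-strand cut
      // falls between base cutAfter and cutAfter + 1.
      hits.push({ enzyme: name, sitePosition: at + 1, cutAfter: at + cut });
      from = at + 1; // step by one so overlapping sites are not missed
    }
  }
  return hits.sort((x, y) => x.sitePosition - y.sitePosition);
}
```

For `"ttGAATTCAAGGATCCCATATG"` this finds EcoRI at position 3, BamHI at 11, and NdeI at 17. The `cutAfter` values are what a linear map would mark.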

Add codon usage table

Add a codon usage frequency table for the longest ORF in each sequence. Display it
as a grid grouped by amino acid, with each cell showing the codon, count, and
frequency as a fraction of synonymous codons. Highlight rare codons (frequency
< 10% among synonymous codons) in red. This helps identify expression optimization
targets.
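Here is a sketch of the calculation behind that prompt, under stated assumptions. `codonUsage` is a hypothetical name, and the input is an ORF string read in frame from its first base. "Frequency among synonymous codons" is computed within the ORF itself. True expression optimization compares against the codon usage of the expression host, which this per-ORF view does not capture.

```javascript
// Standard genetic code (T, C, A, G ordering).
const BASES = "TCAG";
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
const CODON_TABLE = {};
let idx = 0;
for (const a of BASES) {
  for (const b of BASES) {
    for (const c of BASES) CODON_TABLE[a + b + c] = AMINO[idx++];
  }
}

// One row per codon (all 64), grouped by amino acid. Each fraction is
// relative to all codons for the same amino acid in THIS ORF.
function codonUsage(orf, rareBelow = 0.1) {
  const seq = orf.toUpperCase();
  const counts = {};
  for (let i = 0; i + 3 <= seq.length; i += 3) {
    const codon = seq.slice(i, i + 3);
    if (codon in CODON_TABLE) counts[codon] = (counts[codon] || 0) + 1; // skips codons with N
  }
  // Denominator: total codons observed for each amino acid.
  const perAmino = {};
  for (const [codon, n] of Object.entries(counts)) {
    const aa = CODON_TABLE[codon];
    perAmino[aa] = (perAmino[aa] || 0) + n;
  }
  return Object.keys(CODON_TABLE)
    .map((codon) => {
      const aa = CODON_TABLE[codon];
      const count = counts[codon] || 0;
      const fraction = perAmino[aa] ? count / perAmino[aa] : 0;
      // Only codons that actually occur can be flagged as rare.
      return { aa, codon, count, fraction, rare: count > 0 && fraction < rareBelow };
    })
    .sort((x, y) => x.aa.localeCompare(y.aa) || x.codon.localeCompare(y.codon));
}
```

In an ORF with ten CTG and one TTA leucine codons, TTA gets a fraction of 1/11 (about 9%) and is flagged as rare.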

Add Tm calculator for primers

Add a section at the bottom where I can input a short primer sequence (18-30 nt)
and get the melting temperature calculated using the nearest-neighbor method with
default conditions (50 mM Na+, 50 nM oligo). Show Tm for both basic (4+2 rule)
and nearest-neighbor methods side by side.
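Only the basic rule is simple enough to sketch in a few lines. The nearest-neighbor method needs a published table of stacking enthalpies and entropies plus a salt correction, so compare the generated code against such a reference. The function name here is hypothetical.

```javascript
// Basic (Wallace) rule: Tm = 2 °C x (A + T) + 4 °C x (G + C).
// Ambiguity codes are ignored. The rule is meant for short oligos, so for
// 18-30 nt primers treat the result as a rough estimate only.
function wallaceTm(primer) {
  let at = 0;
  let gc = 0;
  for (const base of primer.toUpperCase()) {
    if (base === "A" || base === "T") at += 1;
    else if (base === "G" || base === "C") gc += 1;
  }
  return 2 * at + 4 * gc;
}

console.log(wallaceTm("AGCTAGCTAGCTAGCTAGCT")); // 60
```

Showing this value next to the nearest-neighbor result makes large disagreements obvious. That is often a hint to re-check the more complex calculation.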

Export results

Add an "Export Report" button that generates a downloadable PDF-style report by
opening a new window with a print-friendly layout. Include all charts rendered as
static images (use Chart.js toBase64Image), tables, and summary stats. Style it
for A4 paper.

ℹThe customization loop

Notice the pattern: you start with a working tool, then add features one prompt at a time. Each prompt builds on what already exists. This is how all the tools in this track are built — iteratively, starting from a solid foundation. You never need to plan the entire tool upfront.

Real-World Extension: Forensic DNA Analysis

Forensic DNA identification projects — such as those working with the Defense POW/MIA Accounting Agency (DPAA) to identify missing military personnel from WWII and Korea — involve heavily degraded DNA extracted from skeletal remains recovered during field excavations.

A sequence analysis dashboard adapted for forensic DNA work looks different from the standard version. A forensic team needs:

Lower ORF thresholds — degraded DNA from decades-old remains yields short, fragmented sequences. A minimum ORF of 100 codons misses most fragments. Setting the threshold to 10-20 codons captures the small coding regions that survive degradation. Expect many chance ORFs at that length, so treat them as leads to inspect rather than as genes.
mtDNA haplogroup indicators — mitochondrial DNA is the primary identification tool for ancient remains. Each cell carries hundreds to thousands of mtDNA copies, so some usable mtDNA is far more likely to survive degradation than nuclear DNA. Displaying the mtDNA haplogroup based on diagnostic SNPs helps the team quickly classify a sample.
Reference comparison mode — forensic teams compare recovered mtDNA sequences against reference samples donated by living maternal relatives, since mtDNA is inherited through the maternal line. A side-by-side view highlighting mismatches accelerates the matching process.
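A back-of-the-envelope estimate shows why short ORF thresholds are noisy. It assumes random sequence with equal base frequencies, which real genomes only approximate and GC-rich or AT-rich ones deviate from strongly. Three of the 64 codons are stops, so the chance that the k codons following an ATG contain no stop is:

```latex
P(\text{no stop in } k \text{ codons}) = \left(\tfrac{61}{64}\right)^{k},
\qquad \left(\tfrac{61}{64}\right)^{10} \approx 0.62,
\qquad \left(\tfrac{61}{64}\right)^{100} \approx 0.008
```

Under that assumption, most ATGs in random sequence clear a 10-codon bar, while fewer than 1% clear a 100-codon bar.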

⚠Educational prototype only

The haplogroup prediction from a small set of diagnostic SNPs is a simplified demonstration. Do not use this tool for forensic decision-making. Real forensic identification requires validated pipelines, full mtDNA sequencing, and STR analysis through accredited laboratories.

Here is a follow-up prompt to adapt your dashboard for this kind of work:

Modify the sequence dashboard for forensic DNA analysis:
1. Lower the minimum ORF size to 10 codons (with a slider from 5 to 100)
2. Add an mtDNA mode: when enabled, check the sequence against a table of
   common mtDNA haplogroup-defining SNP positions (H, U, K, J, T) and display
   the predicted haplogroup
3. Add a "Compare" tab where I can paste a reference sequence and an evidence
   sequence side by side. Highlight mismatches in red, matches in green, and
   show a similarity percentage. This is for comparing recovered remains DNA
   against family reference samples.
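Here is a sketch of the core of that "Compare" tab, with an important assumption: the two sequences are already aligned to the same length, with gaps written as "-". The sketch does not align anything, so a single insertion or deletion in unaligned input would show up as a cascade of mismatches. Ambiguity codes such as N are compared literally. `compareAligned` is a hypothetical name.

```javascript
// Hypothetical helper: position-by-position comparison of two PRE-ALIGNED
// sequences. Case-insensitive; a gap against a base counts as a mismatch.
function compareAligned(reference, evidence) {
  const ref = reference.toUpperCase();
  const ev = evidence.toUpperCase();
  if (ref.length !== ev.length) {
    throw new Error("Sequences must be pre-aligned to the same length");
  }
  const mismatchPositions = [];
  let matches = 0;
  for (let i = 0; i < ref.length; i++) {
    if (ref[i] === ev[i]) matches += 1;
    else mismatchPositions.push(i + 1); // 1-based, for highlighting in red
  }
  return {
    matches,
    mismatchPositions,
    similarityPercent: ref.length ? (100 * matches) / ref.length : 0,
  };
}

console.log(compareAligned("ACGTACGTAC", "acgaACGTAC").similarityPercent); // 90
```

The mismatch positions map directly onto the red highlights the prompt asks for. Everything else is a match and can be shown in green.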

🔍About forensic DNA identification

Forensic DNA identification projects are interdisciplinary — historians locate burial sites, archaeologists excavate, forensic anthropologists analyze remains, and biologists extract DNA. The tools you are building in this module fit directly into the biology side of that pipeline. The key takeaway: any sequence analysis dashboard can be adapted for specialized domains by adjusting thresholds, adding domain-specific features, and customizing the output.

Try it yourself

Open your CLI tool in an empty folder.
Paste the main prompt from above.
Open the generated sequence-dashboard.html in your browser.
Paste a real sequence from your research (or grab one from NCBI GenBank).
Pick one customization from the list above and add it.

If the tool does something useful for your specific research, save it. Put it in a GitHub repo. Share the link with your lab. You just built a bioinformatics tool in 15 minutes, and you can keep extending it indefinitely.

Key takeaways

One prompt, one tool: a detailed, specific prompt produces a working sequence analysis dashboard in under 2 minutes.
Single-file HTML tools bypass all installation barriers — they run on any computer with a browser, which makes them ideal for shared lab workstations and core facilities.
The FASTA format has edge cases (multi-line wrapping, mixed case, IUPAC codes, blank lines) — building those into your prompt prevents debugging later.
Iterative customization is the pattern: get a working base, then add features one prompt at a time. Never try to specify everything in the first prompt.
GC content and ORF analysis are sanity checks you should run on every sequence before cloning, submission, or downstream analysis.

Portfolio suggestion

Save your finished sequence-dashboard.html along with the prompts you used to build and customize it. If you added restriction enzyme mapping or codon usage, those make excellent additions to a lab meeting presentation. Consider creating a short document (3-4 paragraphs) describing what you built, what problem it solves for your lab, and one thing you would add next. This demonstrates both your technical capability and your scientific judgment about what tools are needed.

🔍Advanced: Batch processing multiple FASTA files

If you routinely analyze dozens of sequences (e.g., from a cloning project or a mutagenesis screen), you can extend the dashboard to accept file uploads instead of paste:

Add a file upload button that accepts .fasta and .fa files. When a file is uploaded,
read its contents and populate the textarea automatically. Also add a drag-and-drop
zone so I can drag FASTA files directly from my file manager. Support uploading
multiple files at once -- concatenate them with a separator comment line.

For truly large-scale analysis (hundreds of sequences), you are better off with a Python CLI tool like the one in Lesson 3. But for 5-50 sequences, the browser dashboard is perfectly adequate and much more convenient than setting up a Python environment.

KNOWLEDGE CHECK

You paste a FASTA file into the dashboard and GC content shows 0.00% even though the sequence length is correct. What is the most likely cause?

What’s next

In the next lesson, you will build something more ambitious: a CRISPR guide RNA design tool that finds PAM sites, scores guides, and displays results in a sortable, interactive table. Same pattern — showcase, prompt, output, customize — but with React and Vite instead of a single HTML file.
