Build a Sequence Analysis Dashboard
What you'll learn
~20 min
- Build a standalone sequence analysis dashboard with a single AI prompt
- Parse multi-sequence FASTA input and compute GC content and ORFs
- Troubleshoot common issues with Chart.js rendering and FASTA edge cases
- Customize the dashboard with restriction enzymes, codon usage, or primer Tm
What you’re building
Imagine pasting a raw FASTA sequence into a browser window and instantly seeing GC content, base composition charts, every open reading frame, and amino acid statistics — no server, no installation, no BioPython dependency hell. Just one HTML file you can open on any lab computer.
That is what you will build in the next 15 minutes.
This module isn’t just practice — it’s portfolio material. By the end, you’ll have built a working bioinformatics tool that demonstrates exactly what “AI-augmented scientist” looks like. This is the artifact you show during your next review, your next grant proposal, or your next interview. It proves you can combine domain expertise with modern tooling to produce real results.
By the end of this lesson you will have a standalone sequence analysis dashboard that runs entirely in the browser. It handles multi-sequence FASTA input, renders interactive charts with Chart.js, and finds ORFs across all six reading frames. You will build it by giving a single, carefully crafted prompt to an LLM CLI tool.
Upload → parse → visualize → filter. This pattern works for survey data, financial reports, any dataset. The techniques in this lesson transfer directly to non-biology contexts.
🔍Domain Primer: Key biology terms you'll see in this module
New to bioinformatics? Here are the key terms you’ll encounter:
- FASTA: A text file format for DNA/protein sequences. Each entry has a header line starting with >, followed by one or more sequence lines until the next header. Think of it like a labeled container for genetic data.
- GC Content: The percentage of a DNA sequence made up of guanine (G) and cytosine (C) bases. Higher GC content can affect how stable a DNA strand is. It’s like measuring the “strength” of a sequence.
- ORF (Open Reading Frame): A stretch of DNA that could potentially code for a protein. It starts with a “start codon” (ATG) and ends with a “stop codon.” Finding ORFs is like finding sentences in a long string of letters.
- BLAST: A tool that searches databases to find sequences similar to yours. Like Google, but for DNA and protein sequences.
- Sequence alignment: Comparing two or more sequences to find similarities. Like lining up two sentences to see which words match.
While our dashboard will not perform BLAST searches or complex alignments, it serves as the perfect first step before using those heavier tools.
You don’t need to be an expert in these concepts — the AI tools will handle the technical details. You just need to know what you’re asking for.
Who this is for
- Undergrads rotating through a core facility who need a quick sanity check on a sequence before submitting a job.
- Grad students who want a lightweight QC tool they can customize for their organism of interest.
- PIs and staff scientists who want to show trainees how fast computational tools can be prototyped.
University core facilities offer full-service RNA-seq analysis, genome assembly, genotype-by-sequencing, and AlphaFold protein structure prediction, but core facility queues can mean days or weeks of turnaround. A self-built sequence dashboard lets you do quick QC the moment you get your FASTA files back, while the core handles deeper analysis in parallel. If you are taking a bioinformatics course, this dashboard covers the quick sanity checks (length, GC content, ORFs) you would run in class before a BLAST search or genome assembly, packaged as a reusable tool you can hand to anyone in your lab.
The showcase
Here is what the finished dashboard looks like once you open the HTML file in a browser:
- Header with a textarea where you paste one or more FASTA sequences.
- Summary panel showing sequence count, total length, and overall GC%.
- Per-sequence cards, each displaying:
- Sequence ID and description parsed from the header line.
- Length, GC content, and AT/GC ratio.
- A base composition bar chart (A, T, G, C, N counts) rendered with Chart.js.
- A table of open reading frames (ORFs) found across all six reading frames, with start/stop positions, length, and the first 30 amino acids of each.
- Amino acid frequency chart for the longest ORF.
Everything runs client-side after the page loads Chart.js and the Inter font from their CDNs. Your sequence data is never uploaded anywhere. You can email the file to a collaborator and it works on their machine as long as the browser can reach those CDNs.
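As a rough illustration of the base composition chart listed above, here is a minimal sketch of how the counts might be turned into a Chart.js configuration. The helper name `baseCompositionConfig`, the canvas id in the comment, and the option choices are assumptions made for this example; the code your LLM generates will likely differ.

```javascript
// Hypothetical helper: turns base counts into a Chart.js bar-chart config.
// The option names follow Chart.js v3+ (plugins.legend, scales.y).
function baseCompositionConfig(counts) {
  const labels = ["A", "T", "G", "C", "N"];
  return {
    type: "bar",
    data: {
      labels,
      datasets: [
        {
          label: "Base count",
          // Missing bases default to 0 so the chart always has five bars.
          data: labels.map((base) => counts[base] ?? 0),
          backgroundColor: "#38bdf8", // accent color from the prompt
        },
      ],
    },
    options: {
      responsive: true,
      plugins: { legend: { display: false } },
      scales: { y: { beginAtZero: true } },
    },
  };
}

// In the browser (not run here), assuming a <canvas id="comp-chart"> exists:
//   new Chart(document.getElementById("comp-chart"), baseCompositionConfig(counts));
const exampleConfig = baseCompositionConfig({ A: 120, T: 118, G: 131, C: 129 });
console.log(exampleConfig.data.datasets[0].data);
```

For the horizontal amino acid chart, the same shape of config should work with `indexAxis: "y"` added to `options`.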
The prompt
Open your terminal, navigate to a project folder, start your AI CLI tool (e.g., by typing claude), and paste this prompt:
Build a single self-contained HTML file called sequence-dashboard.html that serves as
a DNA sequence analysis dashboard. Requirements:

1. FASTA PARSER
   - A large textarea for pasting one or more FASTA-formatted sequences
   - Parse multi-sequence FASTA: lines starting with ">" are headers, subsequent lines are sequence data
   - Strip whitespace and newlines from sequence data, ignore blank lines
   - Handle both uppercase and lowercase bases

2. PER-SEQUENCE ANALYSIS
   For each parsed sequence display a card showing:
   - Sequence ID (first word after ">") and full description
   - Total length in bases
   - GC content as a percentage (G+C)/(A+T+G+C) with 2 decimal places
   - AT/GC ratio
   - Base composition bar chart using Chart.js (load from CDN) showing counts of A, T, G, C, and N/other
   - A table of open reading frames (ORFs): scan all 6 reading frames (3 forward, 3 reverse complement), find every ATG...stop(TAA/TAG/TGA) region >= 100 codons, report frame, strand, start position, stop position, length in codons, and first 30 amino acids (use standard codon table)
   - Amino acid frequency horizontal bar chart for the longest ORF in that sequence

3. SUMMARY PANEL (top of page, updates live)
   - Number of sequences parsed
   - Total bases across all sequences
   - Overall GC% across all sequences

4. DESIGN
   - Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #38bdf8
   - Clean sans-serif font (Inter from Google Fonts CDN)
   - Responsive layout, cards in a single column
   - Include a "Load Example" button that populates the textarea with 2 sample sequences (use realistic E. coli sequences ~500bp each with known ORFs)
   - Include a "Clear" button

5. TECHNICAL
   - Pure HTML/CSS/JS in one file, no build step
   - Chart.js loaded from CDN (https://cdn.jsdelivr.net/npm/chart.js)
   - Codon translation table hardcoded in JS
   - Reverse complement function for the 3 reverse frames

That entire block is the prompt. Paste it as-is. The specificity is deliberate: the more precise you are about requirements, the closer the first output will be to what you actually want. Vague prompts produce vague tools.
What you get
After the LLM finishes (typically 60-90 seconds), you will have a single file: sequence-dashboard.html. Open it in any browser.
Expected output structure
sequence-dashboard.html (~600-800 lines)
Click Load Example and you should see:
- Two sequence cards appear, each with a header like >ecoli_fragment_1 E. coli K-12 region.
- GC content around 50-52% (typical for E. coli).
- Base composition charts with roughly equal A/T and G/C bars.
- At least one ORF per sequence in the ORF table (the example sequences are chosen to contain them).
- An amino acid frequency chart showing the distribution for the longest ORF.
If something is off
LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:
| Problem | Follow-up prompt |
|---|---|
| Chart.js not rendering | The charts aren't showing. Make sure Chart.js is loaded before any chart code runs. Add an onload handler. |
| ORFs missing on reverse strand | The reverse complement ORF search isn't working. Double-check the reverse complement function and make sure you're scanning frames 0, 1, 2 on the reverse complement string. |
| Amino acid table empty | The codon table lookup is returning undefined for some codons. Make sure all 64 codons are in the table and the sequence is being read in triplets correctly. |
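When you are chasing the reverse-strand or codon-table problems in the table above, it can help to have a small reference implementation to compare against. The following is a minimal sketch, not the code the LLM will produce: the names `CODON_TABLE`, `reverseComplement`, `translate`, and `findOrfs` are assumptions for this example. It only reports ORFs that are closed by a stop codon, and positions on the "-" strand are given relative to the reverse-complement string.

```javascript
// Build the standard codon table from the conventional TCAG ordering.
const BASES = "TCAG";
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
const CODON_TABLE = {};
for (let i = 0; i < 4; i++)
  for (let j = 0; j < 4; j++)
    for (let k = 0; k < 4; k++)
      CODON_TABLE[BASES[i] + BASES[j] + BASES[k]] = AMINO[16 * i + 4 * j + k];

const COMPLEMENT = { A: "T", T: "A", G: "C", C: "G" };

// Reverse the sequence and complement each base; unknown bases become N.
function reverseComplement(seq) {
  return [...seq.toUpperCase()].reverse().map((b) => COMPLEMENT[b] ?? "N").join("");
}

// Translate in triplets; codons containing ambiguous bases become X.
function translate(dna) {
  let protein = "";
  for (let i = 0; i + 3 <= dna.length; i += 3) {
    protein += CODON_TABLE[dna.slice(i, i + 3)] ?? "X";
  }
  return protein;
}

// Scan frames 0, 1, 2 on both strands for ATG...stop regions.
function findOrfs(seq, minCodons = 100) {
  const orfs = [];
  const strands = [["+", seq.toUpperCase()], ["-", reverseComplement(seq)]];
  for (const [strand, dna] of strands) {
    for (let frame = 0; frame < 3; frame++) {
      let start = -1; // 0-based index of the current ATG, -1 when outside an ORF
      for (let pos = frame; pos + 3 <= dna.length; pos += 3) {
        const aa = CODON_TABLE[dna.slice(pos, pos + 3)] ?? "X";
        if (start < 0 && aa === "M") {
          start = pos;
        } else if (start >= 0 && aa === "*") {
          const codons = (pos - start) / 3; // length excludes the stop codon
          if (codons >= minCodons) {
            orfs.push({
              strand,
              frame,
              start: start + 1, // 1-based, on the scanned strand
              stop: pos + 3,    // 1-based end of the stop codon
              codons,
              protein: translate(dna.slice(start, pos)).slice(0, 30),
            });
          }
          start = -1;
        }
      }
    }
  }
  return orfs;
}

// Tiny demo with a lowered threshold so the short ORF is reported.
const demoSeq = "CC" + "ATG" + "GCT".repeat(4) + "TAA" + "G";
console.log(Object.keys(CODON_TABLE).length, findOrfs(demoSeq, 3));
```

If the dashboard and this sketch disagree on a sequence, paste both results into your follow-up prompt as evidence.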
Worked example: Analyzing a GenBank sequence
Here is a real-world scenario. You download a sequence from NCBI for the E. coli lacZ gene and want to quickly check its properties before cloning.
Step 1. Go to NCBI GenBank and search for accession V00296 (the E. coli lacZ gene). Click “Send to” > “File” > FASTA format. Save it.
Step 2. Open your dashboard in a browser. Paste the contents of the FASTA file into the textarea.
Step 3. Examine the output:
- Length: ~6,600 bp for the full GenBank record. The lacZ coding sequence within it spans ~3,075 bp (codons 1–1024). GenBank entries often include flanking sequence beyond the gene of interest.
- GC content: approximately 52% — consistent with E. coli coding sequences.
- ORFs: you should see one large ORF covering the lacZ coding region (about 1,024 codons, encoding beta-galactosidase).
- Amino acid frequency: the composition should look biologically plausible and non-uniform, with leucine and alanine among the more frequent residues.
If the ORF table is empty, the threshold (100 codons) might be filtering out partial ORFs. Ask the AI:
The ORF table is empty for my sequence. Lower the minimum ORF size from 100 codons
to 30 codons so I can see smaller ORFs too. Add a slider to let me adjust this
threshold interactively.

If you work with a GC-rich organism like Streptomyces (~72% GC) or an AT-rich organism like Plasmodium (~20% GC), the dashboard will immediately reveal this. The base composition charts and GC% are your first sanity check that you have the right sequence and that it was not corrupted during download or cloning.
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works (the 2-minute explanation)
You do not need to understand every line of the generated code, but here is the mental model:
- FASTA parsing splits on > characters, then separates the first line (header) from the remaining lines (sequence). This is a universal bioinformatics pattern.
- GC content is simply (countG + countC) / (countA + countT + countG + countC). Ambiguous bases (N, R, Y, etc.) are excluded from the denominator. It is the single most common sequence statistic.
- ORF finding translates each of the 6 reading frames (3 forward, 3 on the reverse complement), looks for ATG start codons, and scans until a stop codon (TAA, TAG, TGA) or end of sequence. Any stretch of 100+ codons gets reported.
- Chart.js is a widely used charting library. Loading it from a CDN means no installation. The generated code creates new Chart(canvas, config) for each chart.
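To make the first two steps of that mental model concrete, here is a minimal sketch of a FASTA parser and a GC calculation. The function names `parseFasta` and `gcContent` are assumptions for illustration. Note that this version walks the input line by line rather than splitting on > characters, so a > inside a description does not start a new record.

```javascript
// Parse multi-sequence FASTA text into { id, description, sequence } records.
function parseFasta(text) {
  const records = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // ignore blank lines
    if (line.startsWith(">")) {
      const header = line.slice(1).trim();
      const [id = ""] = header.split(/\s+/); // first word after ">"
      records.push({ id, description: header, sequence: "" });
    } else if (records.length > 0) {
      // Join wrapped sequence lines and normalize case.
      records[records.length - 1].sequence += line.replace(/\s+/g, "").toUpperCase();
    }
  }
  return records;
}

// GC percentage with ambiguous bases (N, R, Y, ...) left out of the denominator.
function gcContent(sequence) {
  const counts = { A: 0, T: 0, G: 0, C: 0 };
  for (const base of sequence.toUpperCase()) {
    if (base in counts) counts[base]++;
  }
  const total = counts.A + counts.T + counts.G + counts.C;
  return total === 0 ? 0 : (100 * (counts.G + counts.C)) / total;
}

const fastaExample = ">seq1 test fragment\nATGC\natgn\n\n>seq2\nGGCC\n";
const parsed = parseFasta(fastaExample);
for (const rec of parsed) {
  console.log(rec.id, rec.sequence.length, gcContent(rec.sequence).toFixed(2));
}
```

In the first record the N is counted in the length (8 bases) but not in the GC denominator (7 unambiguous bases), which is why the two numbers can look inconsistent at first glance.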
In a core facility environment, you cannot always install software. IT policies, shared workstations, and HPC nodes with restricted environments all create friction. A single HTML file that runs in any browser sidesteps all of that. You can attach it to a lab notebook entry, email it to a collaborator, or host it on an internal web server. This pattern — self-contained browser tools — is underused in bioinformatics, and LLMs make it trivial to create them.
Customize it
The base dashboard is useful as-is, but the real power is in customization. Each of these is a single follow-up prompt:
Add restriction enzyme sites
Add a restriction enzyme analysis panel to each sequence card. Include these common
enzymes: EcoRI (GAATTC), BamHI (GGATCC), HindIII (AAGCTT), NotI (GCGGCCGC),
XhoI (CTCGAG), NdeI (CATATG). Show cut positions and a simple linear map with
colored markers for each enzyme site.

Add codon usage table
Add a codon usage frequency table for the longest ORF in each sequence. Display it
as a grid grouped by amino acid, with each cell showing the codon, count, and
frequency as a fraction of synonymous codons. Highlight rare codons (frequency
< 10% among synonymous codons) in red. This helps identify expression optimization
targets.

Add Tm calculator for primers
Add a section at the bottom where I can input a short primer sequence (18-30 nt)
and get the melting temperature calculated using the nearest-neighbor method with
default salt and oligo conditions (50 mM Na+, 250 nM oligo). Show Tm for both basic
(4+2 rule) and nearest-neighbor methods side by side.

Export results
Add an "Export Report" button that generates a downloadable PDF-style report by
opening a new window with a print-friendly layout. Include all charts rendered as
static images (use Chart.js toBase64Image), tables, and summary stats. Style it
for A4 paper.

Notice the pattern: you start with a working tool, then add features one prompt at a time. Each prompt builds on what already exists. This is how all the tools in this track are built: iteratively, starting from a solid foundation. You never need to plan the entire tool upfront.
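Two of the customizations above rest on very small calculations, and it may help to see them before you ask for them. This is a sketch under stated assumptions: `findSites` and `wallaceTm` are hypothetical names, the search reports the 1-based start of each recognition site rather than the exact cut position within it, and only the basic 4+2 rule is shown (the nearest-neighbor method needs thermodynamic tables that are not reproduced here).

```javascript
// Recognition sequences for the enzymes named in the prompt.
// All six are palindromic, so searching one strand finds every site.
const ENZYMES = {
  EcoRI: "GAATTC",
  BamHI: "GGATCC",
  HindIII: "AAGCTT",
  NotI: "GCGGCCGC",
  XhoI: "CTCGAG",
  NdeI: "CATATG",
};

// Return { enzymeName: [1-based start positions of each recognition site] }.
function findSites(sequence, enzymes = ENZYMES) {
  const dna = sequence.toUpperCase();
  const hits = {};
  for (const [name, site] of Object.entries(enzymes)) {
    hits[name] = [];
    let idx = dna.indexOf(site);
    while (idx !== -1) {
      hits[name].push(idx + 1);
      idx = dna.indexOf(site, idx + 1); // step by one to catch overlapping sites
    }
  }
  return hits;
}

// Basic 4+2 (Wallace) rule: Tm = 4 * (G + C) + 2 * (A + T), in degrees Celsius.
function wallaceTm(primer) {
  const p = primer.toUpperCase();
  const gc = (p.match(/[GC]/g) ?? []).length;
  const at = (p.match(/[AT]/g) ?? []).length;
  return 4 * gc + 2 * at;
}

const siteHits = findSites("ccGAATTCaaGGATCCttGAATTC");
console.log(siteHits.EcoRI, siteHits.BamHI);       // [ 3, 19 ] [ 11 ]
console.log(wallaceTm("ATGCATGCATGCATGCATGC"));    // 60
```

The 4+2 rule is only a rough estimate for short oligos, which is why the prompt asks for the nearest-neighbor value alongside it.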
Real-World Extension: Forensic DNA Analysis
Forensic DNA identification projects — such as those working with the Defense POW/MIA Accounting Agency (DPAA) to identify missing military personnel from WWII and Korea — involve heavily degraded DNA extracted from skeletal remains recovered during field excavations.
A sequence analysis dashboard adapted for forensic DNA work looks different from the standard version. A forensic team needs:
- Lower ORF thresholds — degraded DNA from decades-old remains yields short, fragmented sequences. A minimum ORF of 100 codons misses most fragments. Setting the threshold to 10-20 codons captures the small coding regions that survive degradation.
- mtDNA haplogroup indicators — mitochondrial DNA is the primary identification tool for ancient remains because it survives degradation better than nuclear DNA. Displaying the mtDNA haplogroup based on diagnostic SNPs helps the team quickly classify a sample.
- Reference comparison mode — forensic teams compare recovered mtDNA sequences against reference samples donated by living family members. A side-by-side view highlighting mismatches accelerates the matching process.
The haplogroup prediction from a small set of diagnostic SNPs is a simplified demonstration. Do not use this tool for forensic decision-making. Real forensic identification requires validated pipelines, full mtDNA sequencing, and STR analysis through accredited laboratories.
Here is a follow-up prompt to adapt your dashboard for this kind of work:
Modify the sequence dashboard for forensic DNA analysis:
1. Lower the minimum ORF size to 10 codons (with a slider from 5 to 100)
2. Add an mtDNA mode: when enabled, check the sequence against a table of common mtDNA haplogroup-defining SNP positions (H, U, K, J, T) and display the predicted haplogroup
3. Add a "Compare" tab where I can paste a reference sequence and an evidence sequence side by side. Highlight mismatches in red, matches in green, and show a similarity percentage. This is for comparing recovered remains DNA against family reference samples.

Forensic DNA identification projects are interdisciplinary: historians locate burial sites, archaeologists excavate, forensic anthropologists analyze remains, and biologists extract DNA. The tools you are building in this module fit directly into the biology side of that pipeline. The key takeaway: any sequence analysis dashboard can be adapted for specialized domains by adjusting thresholds, adding domain-specific features, and customizing the output.
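The Compare tab requested in the forensic prompt comes down to a position-by-position comparison. Here is a minimal sketch, assuming the two sequences are already aligned to the same coordinates. The function name `compareSequences` is hypothetical, and the sketch does no gap handling or real alignment, so an insertion or deletion would throw off every position after it.

```javascript
// Compare two pre-aligned sequences position by position.
// Only the overlapping length is compared; no gaps are introduced.
function compareSequences(reference, evidence) {
  const ref = reference.toUpperCase();
  const ev = evidence.toUpperCase();
  const comparedLength = Math.min(ref.length, ev.length);
  const mismatches = [];
  for (let i = 0; i < comparedLength; i++) {
    if (ref[i] !== ev[i]) {
      mismatches.push({ position: i + 1, reference: ref[i], evidence: ev[i] });
    }
  }
  const similarity =
    comparedLength === 0
      ? 0
      : (100 * (comparedLength - mismatches.length)) / comparedLength;
  return { comparedLength, mismatches, similarity };
}

const comparison = compareSequences("ACGTACGTAC", "acgttcgtac");
console.log(comparison.similarity, comparison.mismatches);
```

In the dashboard, the `mismatches` list is what would drive the red and green highlighting, and `similarity` is the percentage shown next to it.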
Try it yourself
- Open your CLI tool in an empty folder.
- Paste the main prompt from above.
- Open the generated sequence-dashboard.html in your browser.
- Paste a real sequence from your research (or grab one from NCBI GenBank).
- Pick one customization from the list above and add it.
If the tool does something useful for your specific research, save it. Put it in a GitHub repo. Share the link with your lab. You just built a bioinformatics tool in 15 minutes, and you can keep extending it indefinitely.
Key takeaways
- One prompt, one tool: a detailed, specific prompt produces a working sequence analysis dashboard in under 2 minutes.
- Single-file HTML tools bypass all installation barriers — they run on any computer with a browser, which makes them ideal for shared lab workstations and core facilities.
- The FASTA format has edge cases (multi-line wrapping, mixed case, IUPAC codes, blank lines) — building those into your prompt prevents debugging later.
- Iterative customization is the pattern: get a working base, then add features one prompt at a time. You do not need to specify every feature in the first prompt.
- GC content and ORF analysis are sanity checks you should run on every sequence before cloning, submission, or downstream analysis.
Portfolio suggestion
Save your finished sequence-dashboard.html along with the prompts you used to build and customize it. If you added restriction enzyme mapping or codon usage, those make excellent additions to a lab meeting presentation. Consider creating a short document (3-4 paragraphs) describing what you built, what problem it solves for your lab, and one thing you would add next. This demonstrates both your technical capability and your scientific judgment about what tools are needed.
🔍Advanced: Batch processing multiple FASTA files
If you routinely analyze dozens of sequences (e.g., from a cloning project or a mutagenesis screen), you can extend the dashboard to accept file uploads instead of paste:
Add a file upload button that accepts .fasta and .fa files. When a file is uploaded,
read its contents and populate the textarea automatically. Also add a drag-and-drop
zone so I can drag FASTA files directly from my file manager. Support uploading
multiple files at once -- concatenate them with a separator comment line.

For truly large-scale analysis (hundreds of sequences), you are better off with a Python CLI tool like the one in Lesson 3. But for 5-50 sequences, the browser dashboard is perfectly adequate and much more convenient than setting up a Python environment.
You paste a FASTA file into the dashboard and GC content shows 0.00% even though the sequence length is correct. What is the most likely cause?
What’s next
In the next lesson, you will build something more ambitious: a CRISPR guide RNA design tool that finds PAM sites, scores guides, and displays results in a sortable, interactive table. Same pattern — showcase, prompt, output, customize — but with React and Vite instead of a single HTML file.