Forensic STR Profile Matcher
What you'll learn
~25 min
- Build a forensic STR profile comparison tool with a single AI prompt
- Visualize allele calls at standard CODIS loci with color-coded match status
- Interpret partial profiles from degraded DNA and understand why loci drop out
- Calculate match statistics and understand why they are insufficient for identification alone
What you’re building
A forensic DNA analyst receives STR typing results from skeletal remains recovered at a Korean War battlefield. The electropherogram shows peaks at some loci but not others — the DNA is too degraded for a complete profile. Three of the fifteen CODIS loci produced no result at all. The analyst needs to compare this partial profile against reference samples donated by families of missing service members and quickly assess which references are consistent with the evidence and which can be excluded.
Today that comparison happens in spreadsheets or specialized software that costs thousands of dollars per license. A browser-based visualization tool that displays profiles side-by-side, color-codes matches, and flags exclusions lets the analyst triage cases faster — identifying which reference comparisons warrant full statistical analysis.
That is what you will build in the next 20 minutes.
This tool demonstrates STR profile comparison concepts for training purposes. Real forensic identification requires validated software (e.g., GeneMarker, GeneMapper), statistical likelihood ratios, and accredited laboratory procedures. Never use training tools for actual casework.
By the end of this lesson you will have a forensic STR profile matcher that runs entirely in the browser. It displays allele calls at standard CODIS loci, visualizes profiles as grouped bar charts, color-codes match status, handles missing loci from degraded samples, and calculates triage-level match statistics. You will build it by giving a single, carefully-crafted prompt to an LLM CLI tool.
The underlying pattern is general: load two datasets, align them on a shared key (here, the locus name), compare values, then score and visualize the result. It works for any field comparison: test results vs. reference ranges, actual vs. budget, observed vs. expected.
🔍Domain Primer: Key forensic DNA terms
New to forensic DNA analysis? Here are the key terms you will encounter:
- STR (Short Tandem Repeat) — A region of DNA where a short sequence (2-6 base pairs) repeats in tandem. The number of repeats varies between individuals, making STRs useful for identification. Think of it like a genetic barcode where each “bar” has a different width.
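- Allele — A specific variant at a genetic locus. For STRs, the allele is the number of repeats (e.g., allele “12” means 12 repeats). A decimal value such as “9.3” denotes a microvariant: 9 complete repeats plus 3 extra bases. Each person has two alleles per autosomal locus (one from each parent).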
- Locus (plural: loci) — A specific location on a chromosome where STR typing is performed. The FBI’s CODIS system uses 20 core loci, though older profiles may have only 13.
- CODIS (Combined DNA Index System) — The FBI’s national DNA database system. The “CODIS loci” are the standardized STR markers that all U.S. forensic labs type, enabling cross-laboratory comparison. The current expanded CODIS core includes 20 autosomal loci. This tool uses 15 of them for simplicity. The five omitted loci (D1S1656, D2S441, D10S1248, D12S391, D22S1045) can be added as a customization.
- Electropherogram — The graphical output of capillary electrophoresis showing DNA fragment peaks. Each peak represents an allele, and its position indicates the fragment size (which corresponds to repeat count).
- Amelogenin (AMEL) — A sex-typing marker. Males typically show X,Y peaks; females show X,X. It is included in most commercial STR typing kits.
- Degraded DNA — DNA that has been damaged by time, heat, moisture, or microbial activity. Degraded samples produce partial profiles because larger STR loci (longer DNA fragments) fail to amplify.
- Partial profile — An STR profile where some loci did not produce results. Common with old skeletal remains. The fewer loci that amplify, the less statistical power for identification.
- Reference sample — DNA collected from a known individual (usually a family member of a missing person) for comparison against evidence profiles.
- Exclusion/inclusion — In direct parent-child comparisons, if even one locus shows alleles that are impossible given the reference, the reference is generally excluded (barring rare mutations at ~0.1-0.3% per locus per generation). For more distant relationships (siblings, uncle-nephew, grandparent-grandchild), single-locus exclusions are expected due to independent assortment and do not rule out relatedness. If all typed loci are consistent, the reference is included (but inclusion is not identification without statistical analysis).
You do not need to be an expert in forensic genetics — the AI tool will handle the implementation. You just need to know what the tool is comparing and what the results mean.
Who this is for
- Forensic DNA analysts who want a quick visual triage tool for partial profile comparisons.
- Forensic anthropology students learning how STR profiles are used in identification.
- Lab coordinators who want to train new analysts on the comparison workflow.
The showcase
Here is what the finished matcher looks like once you open the HTML file in a browser:
- Profile input panel with pre-loaded sample data for an evidence profile and two reference profiles.
- Locus-by-locus comparison table showing allele calls at each CODIS locus, with color-coded status: green (full match), yellow (partial match — one allele shared), red (exclusion — no shared alleles), gray (no data — locus did not amplify).
- Grouped bar chart (Chart.js) showing allele sizes at each locus for evidence and reference profiles side-by-side.
- Electropherogram-style peak view for a selected locus, showing stylized peaks at the allele positions.
- Match statistics panel — loci compared, full matches, partial matches, exclusions, and overall consistency assessment.
- Profile selector to switch between reference profiles for comparison.
Everything runs client-side. No DNA data leaves the browser.
The prompt
Open your terminal (Mac: Cmd+Space, type "Terminal"; Windows: open WSL (Ubuntu) from the Start menu), navigate to a project folder (create one with "mkdir my-project && cd my-project"), start your AI CLI tool such as Claude Code, Gemini CLI, or Codex CLI (e.g., by typing claude), and paste this prompt:
Build a single self-contained HTML file called str-matcher.html that serves as a forensic STR profile comparison and visualization tool. Requirements:

1. PRELOADED SAMPLE DATA (embed as JS objects on page load)
   Use the 15 CODIS core loci plus Amelogenin. Each profile has allele pairs. "NR" means no result (locus failed to amplify).

   Evidence Profile (Case DPAA-2024-0147, left femur):
   D3S1358: [15, 16], vWA: [17, 18], D16S539: [11, 12], CSF1PO: [10, 12], TPOX: [8, 11], D8S1179: [13, 14], D21S11: [29, 30], D18S51: NR, D5S818: [11, 12], FGA: NR, D13S317: [11, 11], D7S820: [10, 11], TH01: [7, 9.3], D19S433: [13, 14], D2S1338: NR, AMEL: ["X", "Y"]

   Reference Profile A (Family Reference - biological son):
   D3S1358: [15, 17], vWA: [17, 19], D16S539: [11, 13], CSF1PO: [10, 11], TPOX: [8, 8], D8S1179: [13, 15], D21S11: [29, 31.2], D18S51: [14, 18], D5S818: [11, 13], FGA: [21, 24], D13S317: [11, 12], D7S820: [10, 12], TH01: [7, 8], D19S433: [13, 15.2], D2S1338: [19, 23], AMEL: ["X", "Y"]

   Reference Profile B (Family Reference - unrelated candidate):
   D3S1358: [14, 18], vWA: [15, 16], D16S539: [9, 13], CSF1PO: [11, 13], TPOX: [9, 10], D8S1179: [10, 12], D21S11: [28, 32.2], D18S51: [12, 15], D5S818: [9, 13], FGA: [20, 22], D13S317: [8, 12], D7S820: [8, 12], TH01: [6, 8], D19S433: [12, 16], D2S1338: [17, 25], AMEL: ["X", "Y"]
2. COMPARISON TABLE
   - Table with columns: Locus, Evidence Allele 1, Evidence Allele 2, Reference Allele 1, Reference Allele 2, Match Status
   - Match status logic:
     * If evidence locus is NR: gray cell, "No Data" label
     * If both alleles match (order-independent): green cell, "Full Match"
     * If exactly one allele is shared: yellow cell, "Partial (1 shared)"
       Note: sharing one allele at a locus is common between unrelated individuals in the same population and is not evidence of relatedness on its own. The statistical weight of each shared allele depends on its frequency in the relevant population.
     * If no alleles shared: red cell, "Exclusion"
   - Show match summary below: X of Y loci compared, N full matches, N partial matches, N exclusions
   - If ANY locus shows exclusion in a parent-child comparison context, display a prominent note: "Exclusion detected — profiles are inconsistent"
   - If all compared loci show full or partial match: "No exclusions — profiles are consistent (statistical analysis required)"

3. ALLELE SIZE VISUALIZATION
   - Chart.js grouped bar chart with loci on the x-axis
   - For each locus, show 4 bars side by side: evidence allele 1, evidence allele 2, reference allele 1, reference allele 2
   - Evidence bars in blue (#3b82f6), reference bars in amber (#f59e0b)
   - NR loci show no bars (gap in the chart)
   - Y-axis label: "Allele (repeat count)"
   - Locus names on x-axis, rotated 45 degrees for readability

4. ELECTROPHEROGRAM-STYLE PEAK VIEW
   - Below the bar chart, show a stylized electropherogram for a selected locus
   - Draw Gaussian-shaped peaks at the allele positions using canvas
   - Evidence peaks in blue, reference peaks in amber (semi-transparent overlay)
   - X-axis: fragment size range appropriate for the locus
   - Y-axis: relative fluorescence units (arbitrary height)
   - Click any locus name in the table to update the peak view for that locus
   - Show locus name and allele values as labels above each peak

5. PROFILE INPUT / EDITING
   - Dropdown to select which reference profile to compare (A or B)
   - "Edit Evidence Profile" button that reveals a form with a row per locus, two allele input fields each, and a "No Result" checkbox
   - "Add Reference Profile" button that adds a blank profile form
   - Input validation: alleles must be numeric (or X/Y for AMEL), range 3-50

6. DESIGN
   - Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #10b981
   - Clean sans-serif font (Inter from Google Fonts CDN)
   - Responsive single-column layout
   - Match status colors: green #22c55e, yellow #eab308, red #ef4444, gray #64748b
   - Status badges with rounded corners and the status text inside
7. TECHNICAL
   - Pure HTML/CSS/JS in one file, no build step
   - Chart.js loaded from CDN (https://cdn.jsdelivr.net/npm/chart.js)
   - Peak visualization drawn on an HTML5 canvas element
   - Allele comparison is order-independent (e.g., [15,16] matches [16,15])
   - Handle the special case of homozygous loci (e.g., [8,8]) correctly

That entire block is the prompt. Paste it as-is. The specificity is deliberate: the more precise you are about requirements, the closer the first output will be to what you actually want. Vague prompts produce vague tools.
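To make the input validation rule in section 5 of the prompt concrete, here is a minimal sketch of what such a check might look like. The function names `isValidAllele` and `isValidLocusEntry` are hypothetical; the code your LLM generates will likely differ in naming and structure.

```javascript
// Sketch only: names are illustrative, not taken from the generated tool.

// An allele is valid if it is X or Y at AMEL, or a number from 3 to 50 elsewhere.
// Decimal microvariants such as 9.3 or 31.2 pass because they are numeric.
function isValidAllele(locus, value) {
  if (locus === "AMEL") return value === "X" || value === "Y";
  if (typeof value === "string" && value.trim() === "") return false;
  const n = Number(value);
  return Number.isFinite(n) && n >= 3 && n <= 50;
}

// A locus entry is either the "NR" marker or a pair of valid alleles.
function isValidLocusEntry(locus, entry) {
  if (entry === "NR") return true;
  return (
    Array.isArray(entry) &&
    entry.length === 2 &&
    entry.every((allele) => isValidAllele(locus, allele))
  );
}

console.log(isValidLocusEntry("TH01", [7, 9.3]));   // valid microvariant pair
console.log(isValidLocusEntry("D3S1358", [15, 99])); // out of range
```

Form inputs arrive as strings, which is why the sketch converts with `Number()` instead of checking `typeof value === "number"`.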
What you get
After the LLM finishes (typically 60-90 seconds), you will have a single file: str-matcher.html. Open it in any browser.
Expected output structure
str-matcher.html (~600-900 lines)
You should see:
- The comparison table loads immediately with the evidence profile vs. Reference A.
- With the sample data, every compared autosomal locus shows yellow (partial match): Reference A is a biological son, so each locus shares one allele with the evidence (the obligate paternal allele). AMEL (X, Y in both profiles) is a full match and shows green.
- Three loci (D18S51, FGA, D2S1338) show gray “No Data” because the evidence profile had no result at those loci.
- Switch the dropdown to Reference B and the table turns mostly red — Reference B is an unrelated individual with different alleles.
- The grouped bar chart shows evidence (blue) and reference (amber) bars side-by-side. Where bars are the same height, the alleles match.
- Click any locus name to see the electropherogram-style peak view with overlapping blue and amber peaks.
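The peak view is just a sum of Gaussian curves sampled along an axis. The sketch below shows that calculation without any drawing code, so it runs outside the browser. The names `gaussianPeak` and `sampleTrace` and the default `sigma`, `steps`, and `height` values are assumptions chosen for illustration.

```javascript
// Sketch only: parameter defaults are arbitrary, not instrument-derived.

// Height of a single Gaussian peak at position x.
function gaussianPeak(center, height, sigma, x) {
  return height * Math.exp(-((x - center) ** 2) / (2 * sigma ** 2));
}

// Sample a trace across [min, max]. Contributions from each allele are summed,
// so a homozygous locus such as [11, 11] produces one peak of double height.
function sampleTrace(alleles, { min, max, steps = 200, sigma = 0.15, height = 1 }) {
  const points = [];
  for (let i = 0; i <= steps; i += 1) {
    const x = min + ((max - min) * i) / steps;
    const y = alleles.reduce(
      (sum, allele) => sum + gaussianPeak(Number(allele), height, sigma, x),
      0
    );
    points.push({ x, y });
  }
  return points;
}

const trace = sampleTrace([15, 16], { min: 13, max: 18, steps: 100 });
console.log(trace.length); // number of sampled points
```

In the browser, each point would be scaled to pixel coordinates and connected with the canvas 2D context's `lineTo`, once for the evidence profile and once for the reference.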
If something is off
| Problem | Follow-up prompt |
|---|---|
| Allele comparison says “Exclusion” when alleles do match | The allele comparison is treating [15,16] as different from [16,15]. Can you make the match logic order-independent? Sort both allele pairs before comparing. |
| NR loci show as zero instead of being skipped | Loci marked as NR are showing allele values of 0 in the chart. Can you skip NR loci entirely — no bars in the chart, gray cell in the table, and exclude them from match statistics? |
| Peak view canvas is blank | The electropherogram canvas is empty. Make sure the canvas element has explicit width and height attributes, and that the peak drawing function is called after the canvas is added to the DOM. |
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
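The "NR shows as zero" problem usually comes down to how chart data is built. A sketch of one way to do it follows, assuming the profile shape from the prompt; `alleleAt` and `buildChartData` are hypothetical names. Chart.js leaves a gap where a data value is `null`, so mapping NR to `null` rather than 0 is what produces the empty slot.

```javascript
// Sketch only: helper names are illustrative.

// Return the numeric allele at a position, or null when there is nothing to plot.
function alleleAt(profile, locus, index) {
  const alleles = profile[locus];
  if (!Array.isArray(alleles)) return null; // "NR", null, or missing locus
  const n = Number(alleles[index]);
  return Number.isFinite(n) ? n : null; // X/Y are not numeric, so AMEL is a gap
}

// Build the { labels, datasets } object that a Chart.js bar chart expects.
function buildChartData(evidence, reference) {
  const labels = Object.keys(evidence);
  const series = (label, profile, index, color) => ({
    label,
    backgroundColor: color,
    data: labels.map((locus) => alleleAt(profile, locus, index)),
  });
  return {
    labels,
    datasets: [
      series("Evidence allele 1", evidence, 0, "#3b82f6"),
      series("Evidence allele 2", evidence, 1, "#3b82f6"),
      series("Reference allele 1", reference, 0, "#f59e0b"),
      series("Reference allele 2", reference, 1, "#f59e0b"),
    ],
  };
}

const chartData = buildChartData(
  { D3S1358: [15, 16], FGA: "NR", AMEL: ["X", "Y"] },
  { D3S1358: [15, 17], FGA: [21, 24], AMEL: ["X", "Y"] }
);
console.log(JSON.stringify(chartData.datasets[0].data));
```

The returned object would be passed as the `data` option when constructing a chart of type "bar".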
How it works (the 2-minute explanation)
You do not need to understand every line of the generated code, but here is the mental model:
- STR profiles are represented as JavaScript objects with locus names as keys and two-element arrays as values. The special value null or "NR" marks loci that did not amplify.
- Allele comparison sorts both pairs and checks for overlap. Two alleles matching = full match. One allele matching = partial match (in a parent-child comparison, every locus should share at least one allele: the obligate allele inherited from the parent). Zero alleles matching = exclusion. This is a simplified version of what forensic software does.
- Chart.js grouped bars place four bars at each locus position — two for the evidence profile, two for the reference. When bars are the same height, the allele repeat counts match. This gives an instant visual assessment before you read the table.
- Canvas-based peaks simulate an electropherogram. Each allele is drawn as a Gaussian curve centered at the allele value. Overlapping blue and amber peaks show where profiles agree. This is not a real electropherogram (those come from capillary electrophoresis instruments), but it builds intuition for how raw data looks.
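The comparison logic described above can be sketched in a few lines. This is one possible implementation under the assumptions in the list (profiles as objects, "NR" or null for dropout); `isNoResult`, `compareLocus`, and `summarize` are hypothetical names, and the generated file may organize this differently. The sample values are a subset of the profiles from the prompt.

```javascript
// Sketch only: function names and status strings are illustrative.

// True when a locus has no usable allele call.
function isNoResult(alleles) {
  return alleles == null || alleles === "NR";
}

// Classify one locus as "no-data", "full", "partial", or "exclusion".
function compareLocus(evidence, reference) {
  if (isNoResult(evidence) || isNoResult(reference)) return "no-data";
  // Compare as sorted strings so order and number/string type do not matter.
  const ev = evidence.map(String).sort();
  const ref = reference.map(String).sort();
  if (ev[0] === ref[0] && ev[1] === ref[1]) return "full";
  const shared = ev.some((allele) => ref.includes(allele));
  return shared ? "partial" : "exclusion";
}

// Tally statuses over every locus in the evidence profile.
function summarize(evidenceProfile, referenceProfile) {
  const counts = { compared: 0, full: 0, partial: 0, exclusion: 0, noData: 0 };
  for (const locus of Object.keys(evidenceProfile)) {
    const status = compareLocus(evidenceProfile[locus], referenceProfile[locus]);
    if (status === "no-data") {
      counts.noData += 1; // excluded from the compared total
      continue;
    }
    counts.compared += 1;
    counts[status] += 1;
  }
  return counts;
}

const evidence = { D3S1358: [15, 16], TPOX: [8, 11], FGA: "NR", TH01: [7, 9.3], AMEL: ["X", "Y"] };
const refA = { D3S1358: [15, 17], TPOX: [8, 8], FGA: [21, 24], TH01: [7, 8], AMEL: ["X", "Y"] };
const refB = { D3S1358: [14, 18], TPOX: [9, 10], FGA: [20, 22], TH01: [6, 8], AMEL: ["X", "Y"] };

console.log(summarize(evidence, refA)); // 4 compared: 1 full, 3 partial, 0 exclusions
console.log(summarize(evidence, refB)); // 4 compared: 1 full, 0 partial, 3 exclusions
```

Note how the homozygous reference at TPOX ([8, 8]) against evidence [8, 11] comes out as a partial match rather than a full match, which is the special case the prompt asks the LLM to handle.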
DNA extracted from skeletal remains that have been buried for 70-80 years is severely degraded. The DNA strands break into short fragments, and longer STR loci (which require amplifying longer DNA fragments) are the first to fail. This is why degraded samples typically lose loci like FGA, D18S51, and D2S1338 first — they require longer amplicons. The “No Result” loci in our sample data follow this realistic degradation pattern. Understanding which loci drop out first helps analysts assess sample quality and plan extraction strategies.
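A dropout pattern check can be sketched as a comparison of amplicon sizes between loci that failed and loci that typed. The example below uses placeholder locus names and made-up sizes, because real amplicon lengths depend on the typing kit; `classifyDropout` and its return labels are hypothetical. The rule itself (every failed locus is longer than every typed locus) is a deliberate simplification.

```javascript
// Sketch only: sizes are placeholders, not values from any real kit.

// loci: array of { name, size, amplified }, where size is the approximate
// amplicon length in base pairs for the kit in use (supplied by the user).
function classifyDropout(loci) {
  const failed = loci.filter((locus) => !locus.amplified);
  const typed = loci.filter((locus) => locus.amplified);
  if (failed.length === 0) return "no-dropout";
  if (typed.length === 0) return "no-data";
  const smallestFailed = Math.min(...failed.map((locus) => locus.size));
  const largestTyped = Math.max(...typed.map((locus) => locus.size));
  // Strict rule: degradation removes the longest amplicons first.
  return smallestFailed > largestTyped ? "consistent-with-degradation" : "atypical";
}

const sizeOrdered = [
  { name: "LOCUS_A", size: 120, amplified: true },
  { name: "LOCUS_B", size: 180, amplified: true },
  { name: "LOCUS_C", size: 250, amplified: false },
  { name: "LOCUS_D", size: 320, amplified: false },
];
console.log(classifyDropout(sizeOrdered));
```

A production version would likely tolerate a few out-of-order loci instead of requiring a clean size cutoff.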
Customize it
The base matcher handles two-profile comparison, but real casework involves more complex scenarios. Each of these is a single follow-up prompt:
Add batch comparison mode
Add a "Batch Compare" tab where I can load multiple reference profiles and compare them all against the evidence profile at once. Show a summary table with one row per reference, columns for: Reference ID, Loci Compared, Full Matches, Partial Matches, Exclusions, and a Consistency column (Yes/No). Sort by number of exclusions ascending so the most consistent references appear first. Highlight rows with zero exclusions in green.
Add degradation pattern analysis
Add a "Degradation Analysis" panel that shows which loci failed to amplify and explains the pattern. Order loci by typical amplicon size (shortest to longest). Highlight the dropout pattern — if small-amplicon loci amplified but large-amplicon loci did not, display "Pattern consistent with degradation." If the pattern is random, display "Dropout pattern atypical — possible inhibition or mixed sample." Include a horizontal bar chart showing each locus colored by amplicon size (short=green, medium=yellow, long=red) with NR loci marked.
Add profile export report
Add an "Export Report" button that generates a printable comparison report. Include: case number, evidence profile table, reference profile table, comparison results with match status per locus, match statistics summary, a disclaimer stating this is a triage tool and not a substitute for statistical analysis. Use CSS @media print for clean formatting. Include the bar chart as a static image (Chart.js toBase64Image).
Notice the pattern: you start with a working tool, then add features one prompt at a time. Each prompt builds on what already exists. This is how all the tools in this track are built: iteratively, starting from a solid foundation. You never need to plan the entire tool upfront.
Try it yourself
- Open your CLI tool in an empty folder.
- Paste the main prompt from above.
- Open the generated str-matcher.html in your browser.
- Review the default comparison (Evidence vs. Reference A; it should show consistency).
- Switch to Reference B — observe the exclusions.
- Click different locus names to see the electropherogram-style peaks.
- Try editing the evidence profile — mark an additional locus as “No Result” and see how it affects the match statistics.
- Pick one customization from the list above and add it.
Key takeaways
- One prompt, one tool: a detailed, specific prompt produces a working STR profile comparison tool in under 2 minutes.
- Degraded DNA produces partial profiles — some loci fail to amplify, and the tool must handle missing data gracefully (gray cells, excluded from statistics) rather than treating it as zero.
- In direct parent-child comparisons, a single exclusion at any locus is generally definitive (barring rare mutations at ~0.1-0.3% per locus per generation) — if the evidence and reference have no shared alleles at even one locus, the reference is excluded. For more distant relationships (siblings, uncle-nephew, grandparent-grandchild), single-locus exclusions are expected due to independent assortment and do not rule out relatedness.
- Match percentage is not identification — real forensic identification requires statistical likelihood ratios calculated from population allele frequencies. The triage statistics in this tool help prioritize which comparisons deserve full analysis.
- Allele comparison must be order-independent — a person with alleles [15, 16] is the same as [16, 15]. Building this into the prompt prevents a common bug.
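To see why allele frequencies matter more than a match count, consider the simplest frequency-based statistic: a random match probability under Hardy-Weinberg assumptions, multiplied across independent loci. This is not the kinship likelihood ratio that family-reference casework requires, only an illustration of the principle. The frequencies and locus names below are invented placeholders, and `genotypeFrequency` and `randomMatchProbability` are hypothetical names.

```javascript
// Sketch only: frequencies are placeholders, not population data.

// Genotype frequency: p^2 for a homozygote, 2pq for a heterozygote.
function genotypeFrequency(alleles, freqs) {
  if (!freqs) throw new Error("No frequency table for this locus");
  const [a, b] = alleles;
  const p = freqs[a];
  const q = freqs[b];
  if (p === undefined || q === undefined) {
    throw new Error("Missing allele frequency");
  }
  return a === b ? p * p : 2 * p * q;
}

// Product rule over typed loci; NR loci are skipped and add no information.
function randomMatchProbability(profile, freqTable) {
  let rmp = 1;
  for (const [locus, alleles] of Object.entries(profile)) {
    if (alleles == null || alleles === "NR") continue;
    rmp *= genotypeFrequency(alleles, freqTable[locus]);
  }
  return rmp;
}

const freqTable = {
  LOCUS_A: { 15: 0.25, 16: 0.2 },
  LOCUS_B: { 11: 0.3 },
};
const fullProfile = { LOCUS_A: [15, 16], LOCUS_B: [11, 11] };
const partialProfile = { LOCUS_A: [15, 16], LOCUS_B: "NR" };

console.log(randomMatchProbability(fullProfile, freqTable));    // roughly 0.009
console.log(randomMatchProbability(partialProfile, freqTable)); // roughly 0.1
```

Losing one locus to dropout makes a coincidental match about ten times more likely in this toy example, which is the sense in which partial profiles carry less statistical power.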
Why do degraded skeletal remains produce partial STR profiles with some loci missing?
An evidence profile and a reference profile share at least one allele at every compared locus (no exclusions). What can you conclude?
What’s next
In the next lesson, you will build an eDNA Contamination QC Checker — a tool that compares negative control samples against field samples to flag shared OTUs and assess contamination risk in environmental DNA studies. Same pattern: one prompt, one working tool, then customize.