Species Detection Heatmap

What you’re building

Imagine uploading a species detection CSV and instantly seeing a color-coded heatmap showing exactly which species were found at which sites — sortable, filterable, and click-to-highlight. No R, no ggplot, no Python seaborn. Just one HTML file you can open on any lab computer or project in a conference talk.

That is what you will build in the next 20 minutes.

💬This is a publication-ready figure

Presence/absence heatmaps are one of the most common figures in eDNA survey papers. Building your own means you control the formatting, color scheme, and sorting — instead of fighting with R plot margins or asking a bioinformatician to re-run a script every time you add a site. This is the figure you include in your thesis, your monitoring report, or your agency presentation.

By the end of this lesson you will have a standalone species detection heatmap that runs entirely in the browser. It accepts a CSV of species detections by site, renders an interactive heatmap with CSS-styled cells, provides summary bar charts for species richness and detection frequency (Chart.js), and lets you click any species to highlight its distribution across all sites.

ℹSoftware pattern: Matrix → heatmap → filter → export

This pattern applies to any matrix dataset you want to visualize: gene expression across samples, survey responses across demographics, sensor readings across locations. The heatmap is a universal data visualization tool.

🔍Domain Primer: Key eDNA survey terms you'll see in this lesson

Here are the terms you will encounter in this lesson:

Presence/absence matrix — A table where rows are species, columns are sampling sites, and cells are either 1 (detected) or 0 (not detected). The simplest way to summarize a biodiversity survey.
Species richness — The number of distinct species detected at a site. A site with 10 species has higher richness than one with 3. It is the most basic measure of biodiversity.
Detection frequency — How many sites a species was detected at, expressed as a count or a proportion. A species detected at 7 out of 8 sites has a detection frequency of 87.5%.
Occupancy — The proportion of sites where a species is present. In eDNA studies, “naive occupancy” is based on raw detections; “model-based occupancy” accounts for imperfect detection probability.
Sampling site — A specific geographic location where an eDNA sample was collected. Each site usually has GPS coordinates and a name or code.
Heatmap — A matrix visualization where cell color encodes a value. For presence/absence, it is binary (detected vs. not). For read counts, color intensity can scale with abundance.
Biodiversity survey — A systematic effort to catalog which species are present in an area. eDNA surveys are increasingly used alongside traditional methods (electrofishing, trapping, visual surveys).
Rare species — A species detected at very few sites. Rare detections are the most interesting findings in biodiversity surveys but also the most likely to be false positives, which is why the contamination QC from the previous lesson matters.
Ubiquitous species — A species detected at all or nearly all sites. These are the common, widespread organisms in the ecosystem.

You do not need to memorize these — the tool makes them visible through the visualization itself.
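
The richness, detection frequency, and naive occupancy definitions above reduce to row and column counts over the matrix. Here is a minimal sketch in plain JavaScript, assuming a made-up toy matrix; the species labels, site labels, and function names are illustrative and are not taken from the generated tool:

```javascript
// Toy presence/absence matrix: rows are species, columns are sites.
// All names and values here are hypothetical.
const sites = ["Site_A", "Site_B", "Site_C", "Site_D"];
const matrix = {
  Species_1: [1, 1, 1, 0],
  Species_2: [0, 1, 0, 0],
  Species_3: [1, 0, 1, 0],
};

// Detection frequency: number of sites where a species has a value > 0.
function detectionFrequency(row) {
  return row.filter((v) => v > 0).length;
}

// Naive occupancy: detection frequency as a proportion of all sites.
// It uses raw detections only and ignores imperfect detection.
function naiveOccupancy(row) {
  return detectionFrequency(row) / row.length;
}

// Species richness: number of species with a value > 0 at one site (column).
function speciesRichness(matrix, siteIndex) {
  return Object.values(matrix).filter((row) => row[siteIndex] > 0).length;
}

for (const [species, row] of Object.entries(matrix)) {
  console.log(species, detectionFrequency(row), naiveOccupancy(row));
}
sites.forEach((site, i) => console.log(site, speciesRichness(matrix, i)));
```

A species detected at 7 of 8 sites gives `naiveOccupancy([1, 1, 1, 1, 1, 1, 1, 0]) === 0.875`, which matches the 87.5% figure in the detection frequency definition.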

Who this is for

eDNA survey researchers who need a fast, shareable visualization of detection results across sites.
Natural resource managers who want to see species distributions across monitoring sites without learning R.
Graduate students presenting eDNA survey results at lab meetings or conferences who want an interactive figure they can click through.

ℹCore Facility Context

eDNA core labs and environmental monitoring programs often deliver results as CSV tables. PIs and agency partners need visualizations, not spreadsheets. A self-built heatmap tool that runs in any browser lets you deliver results as an interactive HTML file alongside the raw data — no software installation required on the recipient’s end.

The showcase

Here is what the finished tool looks like once you open the HTML file in a browser:

Header with a file upload area for your detection matrix CSV (or a textarea for pasting).
Heatmap grid — rows are species, columns are sites, cells are colored (green = detected, gray = not detected, with opacity scaling if read counts are provided instead of binary values).
Sorting controls — sort rows by species name (alphabetical), total detections (most to fewest), or detection frequency. Sort columns by site name or species richness.
Species richness bar chart (Chart.js) — one bar per site showing how many species were detected there.
Detection frequency bar chart (Chart.js) — one bar per species showing at how many sites it was found.
Click-to-highlight — click any species name to highlight all sites where it was found. Click a site header to highlight all species found there.
Summary stats — total species, total sites, average richness, rarest species, most common species.

Everything runs client-side. Your detection data never leaves your browser.

The prompt

Open your terminal, navigate to a project folder, start your AI CLI tool (e.g., by typing claude), and paste this prompt:

Build a single self-contained HTML file called species-heatmap.html that serves
as an interactive species detection heatmap for eDNA survey data. Requirements:

1. DATA INPUT
   - File upload button accepting .csv files plus a textarea for pasting CSV data
   - CSV format: first column is species name, remaining columns are site names,
     cell values are either 0/1 (presence/absence) or integer read counts
   - If read counts are provided (values > 1), treat any value > 0 as "detected"
     for the heatmap but use the count for opacity scaling
   - Include a "Load Example" button with this embedded dataset:

     Species,Devils_Lake,Mirror_Lake,Lake_Mendota,Trout_Creek,Spring_Creek,Yahara_River,Pheasant_Branch,Lake_Wingra
     Salvelinus_fontinalis,1,0,0,1,1,0,0,0
     Micropterus_salmoides,1,1,1,0,1,1,1,1
     Oncorhynchus_mykiss,0,0,0,1,1,0,0,0
     Lithobates_catesbeianus,1,1,1,1,1,1,1,1
     Chelydra_serpentina,0,0,1,0,0,1,0,1
     Esox_lucius,1,1,0,0,0,1,0,0
     Salmo_trutta,0,0,0,1,1,0,0,0
     Ambloplites_rupestris,1,0,1,0,0,1,1,0
     Cyprinus_carpio,0,0,1,0,0,1,0,1
     Notemigonus_crysoleucas,1,0,0,0,1,0,0,0
     Ictalurus_punctatus,0,0,1,1,0,0,0,1
     Notropis_hudsonius,1,0,1,0,0,1,0,0
     Catostomus_commersonii,0,1,0,0,1,0,1,0
     Perca_flavescens,1,0,1,1,0,0,0,1
     Lepomis_macrochirus,1,1,1,0,1,1,1,1

2. HEATMAP GRID
   - Render as an HTML table with CSS-styled cells
   - Detected cells: background #10b981 (green), not detected: #374151 (dark gray)
   - When input has read counts > 1, scale the green opacity from 0.3 (low reads)
     to 1.0 (highest reads) so abundance differences are visible
   - Species names in the first column, site names as column headers rotated 45
     degrees for readability
   - Hover tooltip on each cell showing: species name, site name, and value
     (detected/not detected, or read count if provided)
   - Cell size uniform, roughly 36x36 pixels

3. SORTING AND FILTERING
   - Row sort buttons: Alphabetical (A-Z), By Total Detections (descending),
     By Detection Frequency (descending, same as total for binary data)
   - Column sort buttons: Alphabetical (A-Z), By Species Richness (descending)
   - Text filter: type a species name to filter rows (partial match, case-insensitive)
   - Detection filter slider: show only species detected at >= N sites

4. SUMMARY CHARTS (Chart.js from CDN)
   - Vertical bar chart: Species Richness per Site (one bar per site column,
     y-axis = number of species detected, bar color #10b981)
   - Horizontal bar chart: Detection Frequency per Species (one bar per species
     row, x-axis = number of sites, bar color #38bdf8)
   - Both charts update when filtering or sorting changes the visible data

5. CLICK-TO-HIGHLIGHT
   - Click a species name (row header) to highlight that entire row plus all
     column headers where it was detected. Second click deselects.
   - Click a site name (column header) to highlight that entire column plus all
     row headers for species detected there. Second click deselects.
   - Highlight color: border #facc15 (yellow), 2px solid

6. SUMMARY STATS (top panel, updates with filters)
   - Total species shown / total in dataset
   - Total sites
   - Average species richness (mean across sites)
   - Rarest species: name and detection count (fewest sites, >0)
   - Most common species: name and detection count
   - Most species-rich site: name and richness count

7. DESIGN
   - Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #38bdf8
   - Clean sans-serif font (Inter from Google Fonts CDN)
   - Responsive layout: heatmap scrolls horizontally on narrow screens
   - Include a Clear button to reset everything
   - Add an "Export PNG" button that uses html2canvas (CDN) to capture the
     heatmap as a downloadable image

8. TECHNICAL
   - Pure HTML/CSS/JS in one file, no build step
   - Chart.js loaded from CDN (https://cdn.jsdelivr.net/npm/chart.js)
   - html2canvas loaded from CDN (https://cdn.jsdelivr.net/npm/html2canvas)
   - CSV parsing handles quoted fields
   - All processing client-side, no data uploaded anywhere

💡Copy-paste ready

That entire block is the prompt. Paste it as-is. The sample dataset uses real Wisconsin water body names and realistic freshwater species. Lithobates catesbeianus (American bullfrog) and Micropterus salmoides (largemouth bass) are detected at nearly all sites (ubiquitous), while Oncorhynchus mykiss (rainbow trout) and Salmo trutta (brown trout) only appear at cold-water sites (rare/specialized).
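
Requirement 8 of the prompt asks for CSV parsing that handles quoted fields. The generated file may implement this differently; the sketch below is one possible approach, with hypothetical function names. It assumes that quoted fields never contain line breaks and that a doubled quote (`""`) inside a quoted field stands for a literal quote.

```javascript
// Split one CSV line into fields, honoring double-quoted fields.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields.map((f) => f.trim());
}

// Parse a detection matrix: first row holds site names, first column holds
// species names, and every other cell is converted to a number.
function parseDetectionCsv(text) {
  const lines = text
    .split(/\r\n|\n|\r/) // accept Windows, Unix, and old Mac line endings
    .map((l) => l.trim())
    .filter((l) => l !== "");
  if (lines.length === 0) return { sites: [], rows: [] };
  const [header, ...body] = lines.map(splitCsvLine);
  return {
    sites: header.slice(1),
    rows: body.map(([species, ...values]) => ({
      species,
      values: values.map(Number),
    })),
  };
}

const parsed = parseDetectionCsv('Species,A,B\r\nSp_1,1,0\r\n"Smith, J. sp.",0,3\n');
console.log(parsed.sites, parsed.rows);
```

Splitting on all three line-ending styles before splitting on commas avoids the common failure where pasted text is treated as a single row.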

What you get

After the LLM finishes (typically 60-90 seconds), you will have a single file: species-heatmap.html. Open it in any browser.

Expected output structure

species-heatmap.html    (~500-700 lines)

Click Load Example and you should see:

A 15 × 8 heatmap grid with green cells (detected) and gray cells (not detected).
Three near-ubiquitous species with green cells across most or all columns: Lithobates catesbeianus (8/8 sites), Micropterus salmoides (7/8 sites), and Lepomis macrochirus (7/8 sites).
Three species with green cells in only 2 columns: Oncorhynchus mykiss and Salmo trutta (both at Trout_Creek and Spring_Creek only, as cold-water species restricted to trout streams) and Notemigonus crysoleucas (Devils_Lake and Spring_Creek).
The species richness chart showing Devils_Lake and Lake_Mendota with the highest richness (9 species each).
The detection frequency chart showing Lithobates catesbeianus at the top with 8 sites and Oncorhynchus mykiss, Salmo trutta, and Notemigonus crysoleucas at the bottom with 2 sites each.
Click highlighting — clicking Esox lucius should highlight Devils_Lake, Mirror_Lake, and Yahara_River.
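
To check the numbers you see against the embedded dataset independently of the generated tool, the counts can be recomputed in a few lines. This is a sketch for Node or the browser console. The variable names are illustrative, and the simple comma split is only safe because the example data has no quoted fields.

```javascript
// The example dataset from the prompt, embedded as a CSV string.
const exampleCsv = `Species,Devils_Lake,Mirror_Lake,Lake_Mendota,Trout_Creek,Spring_Creek,Yahara_River,Pheasant_Branch,Lake_Wingra
Salvelinus_fontinalis,1,0,0,1,1,0,0,0
Micropterus_salmoides,1,1,1,0,1,1,1,1
Oncorhynchus_mykiss,0,0,0,1,1,0,0,0
Lithobates_catesbeianus,1,1,1,1,1,1,1,1
Chelydra_serpentina,0,0,1,0,0,1,0,1
Esox_lucius,1,1,0,0,0,1,0,0
Salmo_trutta,0,0,0,1,1,0,0,0
Ambloplites_rupestris,1,0,1,0,0,1,1,0
Cyprinus_carpio,0,0,1,0,0,1,0,1
Notemigonus_crysoleucas,1,0,0,0,1,0,0,0
Ictalurus_punctatus,0,0,1,1,0,0,0,1
Notropis_hudsonius,1,0,1,0,0,1,0,0
Catostomus_commersonii,0,1,0,0,1,0,1,0
Perca_flavescens,1,0,1,1,0,0,0,1
Lepomis_macrochirus,1,1,1,0,1,1,1,1`;

const [header, ...lines] = exampleCsv.trim().split("\n");
const siteNames = header.split(",").slice(1);
const rows = lines.map((line) => {
  const [species, ...values] = line.split(",");
  return { species, values: values.map(Number) };
});

// Detection count per species (number of sites with a value > 0).
const detections = Object.fromEntries(
  rows.map((r) => [r.species, r.values.filter((v) => v > 0).length])
);

// Species richness per site (number of species with a value > 0).
const richness = Object.fromEntries(
  siteNames.map((site, i) => [site, rows.filter((r) => r.values[i] > 0).length])
);

// Sites where one species was detected, for checking click-to-highlight.
const sitesFor = (name) => {
  const row = rows.find((r) => r.species === name);
  return siteNames.filter((_, i) => row.values[i] > 0);
};

console.log(richness);
console.log(detections);
console.log(sitesFor("Esox_lucius"));
```

On this dataset the script reports a richness of 9 for both Devils_Lake and Lake_Mendota, 8 detections for Lithobates_catesbeianus, and Devils_Lake, Mirror_Lake, and Yahara_River for Esox_lucius.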

If something is off

Problem	Follow-up prompt
Heatmap cells are all the same color despite different read counts	`The opacity scaling for read counts is not working. Scale each detected cell's count linearly between the lowest and highest read count in the dataset so that opacity runs from 0.3 to 1.0, then use rgba(16, 185, 129, opacity) for the cell background.`
Site headers overlap or are unreadable	`The rotated column headers are overlapping. Increase the header height to 120px and add white-space: nowrap to the header cells. Also add a bottom margin to the heatmap container so the rotated text does not get clipped.`
Charts do not update when filters change	`The charts are only rendered once on initial load. Can you add a function that re-renders both charts whenever the heatmap data changes (filtering, sorting, or detection threshold slider)?`
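
For the opacity problem in the first row of the table, it helps to know what correct scaling looks like. The sketch below maps read counts onto the 0.3 to 1.0 opacity range named in the main prompt using a linear scale. The function name and the sample counts are hypothetical, and the generated tool may use a different formula.

```javascript
const DETECTED_RGB = "16, 185, 129"; // RGB components of #10b981
const NOT_DETECTED = "#374151";

// Background color for one cell. minCount and maxCount are the smallest and
// largest positive read counts in the dataset.
function cellBackground(count, minCount, maxCount) {
  if (!(count > 0)) return NOT_DETECTED;
  const span = maxCount - minCount;
  // With binary data (or a single distinct count) every detection is opaque.
  const fraction = span > 0 ? (count - minCount) / span : 1;
  const opacity = 0.3 + 0.7 * fraction; // linear map onto [0.3, 1.0]
  return `rgba(${DETECTED_RGB}, ${opacity.toFixed(2)})`;
}

// Hypothetical read counts for one row of the heatmap.
const counts = [0, 12, 480, 5000];
const positives = counts.filter((c) => c > 0);
const minCount = Math.min(...positives);
const maxCount = Math.max(...positives);
counts.forEach((c) => console.log(c, cellBackground(c, minCount, maxCount)));
```

If every detected cell in your heatmap has the same color, check whether the minimum and maximum are being computed per cell rather than across the whole dataset.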

🔧When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom

Heatmap shows a single row or column instead of the full grid

Evidence

The CSV has 15 species and 8 sites but only one row appears. The textarea shows the data correctly with commas and line breaks.

What to ask the AI

"The CSV parser is treating the entire pasted text as one line. It looks like the line break splitting is not working. Can you split on both \n and \r\n to handle different operating system line endings? Also trim any trailing whitespace from each line before splitting on commas."

Symptom

Clicking a species highlights the wrong columns

Evidence

I click Esox_lucius and it highlights columns 1, 2, 3 instead of columns 1, 2, 6 (Devils_Lake, Mirror_Lake, Yahara_River). The highlight indices are sequential instead of matching actual detection positions.

What to ask the AI

"The click-to-highlight function is using the index of the detection in a filtered list instead of the actual column index from the original data. Can you fix it to use the original column positions from the CSV when applying highlights?"

Symptom

Export PNG only captures part of the heatmap

Evidence

The downloaded PNG shows the heatmap header and first 5 rows but cuts off the rest. The full heatmap is scrollable in the browser.

What to ask the AI

"The html2canvas capture is only getting the visible viewport, not the scrollable content. Can you set the html2canvas options to capture the full scrollWidth and scrollHeight of the heatmap container? Set windowWidth and windowHeight to match the container's scroll dimensions."

Symptom

Species names with underscores look wrong in the heatmap

Evidence

The heatmap shows 'Salvelinus_fontinalis' with the underscore. I want it to display as 'Salvelinus fontinalis' in italics.

What to ask the AI

"Can you replace underscores with spaces in the species display names and render them in italics? Keep the original underscore names in the CSV export. Use CSS font-style: italic on the species name cells."

How it works (the 2-minute explanation)

You do not need to understand every line of the generated code, but here is the mental model:

CSV parsing reads the first row as site headers and the first column of each subsequent row as a species name. Every other cell becomes a detection value (0 or a read count).
The heatmap is an HTML <table> with CSS styling. Each cell gets a background color based on its value. For binary data, it is green or gray. For read counts, the green opacity scales linearly with the count.
Sorting rearranges the table rows or columns by rebuilding the <table> from the sorted data array. JavaScript sorts the underlying data, then re-renders.
Click-to-highlight attaches an event listener to each row header and column header. When clicked, it adds a CSS class to the relevant cells. A second click removes it.
Chart.js renders the two bar charts. When filters change, the chart data is updated and the charts re-render.
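
The sort-then-re-render step can be sketched as follows. The data shape and function names are assumptions for illustration, not the generated code. The chart is a stand-in object so the example runs without a browser; in the real page it would be a Chart.js instance, whose `data.labels`, `data.datasets`, and `update()` members are the ones used here.

```javascript
// Hypothetical in-memory form of the parsed CSV.
const data = {
  sites: ["Site_A", "Site_B", "Site_C"],
  rows: [
    { species: "Species_1", values: [1, 0, 0] },
    { species: "Species_2", values: [1, 1, 1] },
    { species: "Species_3", values: [0, 0, 1] },
  ],
};

const total = (values) => values.filter((v) => v > 0).length;

// Returns a new array of rows, most detections first, ties alphabetical.
function sortRowsByDetections(rows) {
  return [...rows].sort(
    (a, b) => total(b.values) - total(a.values) || a.species.localeCompare(b.species)
  );
}

// Reorders columns by richness and permutes every row's values the same way.
function sortColumnsByRichness({ sites, rows }) {
  const richness = sites.map((_, i) => rows.filter((r) => r.values[i] > 0).length);
  const order = sites
    .map((_, i) => i)
    .sort((a, b) => richness[b] - richness[a] || sites[a].localeCompare(sites[b]));
  return {
    sites: order.map((i) => sites[i]),
    rows: rows.map((r) => ({ species: r.species, values: order.map((i) => r.values[i]) })),
  };
}

// Pushes the currently visible rows into an existing bar chart.
function refreshFrequencyChart(chart, rows) {
  chart.data.labels = rows.map((r) => r.species);
  chart.data.datasets[0].data = rows.map((r) => total(r.values));
  chart.update();
}

// Stand-in for a Chart.js instance, used only to make this sketch runnable.
const chartStub = {
  data: { labels: [], datasets: [{ data: [] }] },
  update() { console.log("chart updated:", this.data.labels); },
};

const sorted = sortColumnsByRichness(data);
refreshFrequencyChart(chartStub, sortRowsByDetections(sorted.rows));
```

Sorting a copy of the data and re-rendering from it keeps the table and the charts in sync, because both are drawn from the same array.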

🔍For Researchers: Interpreting the heatmap

The heatmap is not just a pretty picture: it reveals ecological patterns. Species that cluster in the same columns (sites) may share habitat preferences. Sites with similar species compositions are likely to be ecologically similar. Rainbow trout (Oncorhynchus mykiss) and brown trout (Salmo trutta) appearing only at Trout_Creek and Spring_Creek is not a coincidence: both are cold-water species restricted to streams with suitable temperatures. The heatmap makes these patterns visible at a glance in a way that a raw CSV table does not.

ℹDetections are not the same as confirmed presence

The heatmap shows detections, not confirmed presence. A species absent from a cell may still be present at the site — eDNA detection is probabilistic. Low eDNA concentration, DNA degradation, primer mismatch, or PCR stochasticity can all cause false negatives. Occupancy models account for imperfect detection, but require replicate sampling data. Treat empty cells as “not detected,” not “not present.”

Customize it

The base heatmap is useful as-is, but here are extensions that make it more powerful for publication and analysis:

Add read-count gradient mode

Add a toggle between "Presence/Absence" mode (binary green/gray) and
"Abundance" mode (green gradient scaled by read count). In abundance mode,
show the read count number inside each cell in small white text. Add a
color scale legend showing the gradient range from the minimum to maximum
read count.

🔍Read counts are semi-quantitative

Read counts in metabarcoding are semi-quantitative — they do not reliably reflect organism abundance due to primer binding efficiency, PCR amplification bias, and sequencing depth differences between samples. Use read-count gradients to identify strong vs. weak detections within a sample, not to compare abundance across species or sites. A species with 500 reads and another with 5,000 reads in the same sample are not necessarily present at a 1:10 ratio in the environment.

Add site metadata row

Add a row at the top of the heatmap for site metadata. Let me upload a second
CSV with columns: Site, Latitude, Longitude, Habitat_Type, Date_Sampled.
Display Habitat_Type as a colored bar above each site column (e.g., lake=blue,
river=cyan, pond=teal, stream=green). Show the other metadata in the hover
tooltip.

Add similarity clustering

Add a "Cluster" button that reorders both rows and columns by similarity
using Jaccard distance. For rows: species with similar site distributions
should be adjacent. For columns: sites with similar species compositions
should be adjacent. Use a simple hierarchical clustering approach
(nearest-neighbor linkage). Add dendrograms on the left side (species)
and top (sites) showing the clustering tree.
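
The Jaccard distance this prompt refers to is one minus the ratio of shared detections to total detections across two rows (or two columns). A minimal sketch follows. The function name is illustrative, and treating two all-zero vectors as distance 0 is a convention chosen here, not something the prompt specifies.

```javascript
// Jaccard distance between two presence/absence vectors of equal length:
// 1 - |A ∩ B| / |A ∪ B|, where membership means a value > 0.
function jaccardDistance(a, b) {
  let both = 0;
  let either = 0;
  for (let i = 0; i < a.length; i++) {
    const inA = a[i] > 0;
    const inB = b[i] > 0;
    if (inA && inB) both++;
    if (inA || inB) either++;
  }
  // Assumption: two vectors with no detections at all are treated as identical.
  return either === 0 ? 0 : 1 - both / either;
}

// Rows taken from the example dataset in the main prompt.
const mykiss = [0, 0, 0, 1, 1, 0, 0, 0];     // Oncorhynchus_mykiss
const trutta = [0, 0, 0, 1, 1, 0, 0, 0];     // Salmo_trutta
const fontinalis = [1, 0, 0, 1, 1, 0, 0, 0]; // Salvelinus_fontinalis
const esox = [1, 1, 0, 0, 0, 1, 0, 0];       // Esox_lucius

console.log(jaccardDistance(mykiss, trutta));     // identical distributions
console.log(jaccardDistance(mykiss, fontinalis)); // 2 shared of 3 total sites
console.log(jaccardDistance(mykiss, esox));       // no shared sites
```

The three calls return 0, about 0.33, and 1, so a clustering step would place Oncorhynchus mykiss and Salmo trutta next to each other, with Salvelinus fontinalis nearby and Esox lucius far away.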

ℹThe customization loop

Start with the working heatmap. Add one feature at a time. The clustering extension turns a simple presence/absence table into a proper ecological analysis figure. But the base version is already useful for 90% of presentations and reports.

Try it yourself

Open your CLI tool in an empty folder.
Paste the main prompt from above.
Open the generated species-heatmap.html in your browser.
Click Load Example and verify the heatmap renders correctly.
Try sorting by detection frequency. The species found at only two sites (Oncorhynchus mykiss, Salmo trutta, and Notemigonus crysoleucas) should drop to the bottom.
Click Lithobates catesbeianus to see it highlighted across all 8 sites.
If you have real eDNA detection data (or output from the contamination checker in the previous lesson), paste it in and see your own heatmap.

Key takeaways

Presence/absence heatmaps are the standard visualization for eDNA survey data — and building your own gives you full control over formatting and interactivity.
Sorting reveals ecological patterns: sorting by detection frequency separates ubiquitous species from rare ones; sorting sites by richness highlights biodiversity hotspots.
Click-to-highlight makes the heatmap interactive in a way that static R or Python plots cannot match — ideal for presentations and exploratory analysis.
The heatmap connects directly to the contamination QC checker from the previous lesson: clean your data first, then visualize the results.
Export to PNG makes the heatmap immediately usable in papers, reports, and slide decks without screenshots.
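
The PNG export in the last takeaway relies on html2canvas. If the export clips a scrollable heatmap, passing the element's full scroll dimensions is a common remedy. The sketch below assumes html2canvas is loaded from the CDN and that the heatmap container has the hypothetical id `heatmap`; whether these options alone fix clipping depends on the page's CSS.

```javascript
// Build html2canvas options that cover an element's full scrollable area
// rather than only the part visible in the viewport.
function fullCaptureOptions(el) {
  return {
    width: el.scrollWidth,
    height: el.scrollHeight,
    windowWidth: el.scrollWidth,
    windowHeight: el.scrollHeight,
  };
}

// Browser-only: capture the heatmap and trigger a download.
// "heatmap" is a hypothetical element id; adjust it to match the generated file.
async function exportPng() {
  const el = document.getElementById("heatmap");
  const canvas = await html2canvas(el, fullCaptureOptions(el));
  const link = document.createElement("a");
  link.download = "species-heatmap.png";
  link.href = canvas.toDataURL("image/png");
  link.click();
}
```

`exportPng` would be wired to the Export PNG button's click handler. `fullCaptureOptions` is kept separate so the dimensions can be logged and checked when an export looks wrong.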

KNOWLEDGE CHECK

Your heatmap shows that Site A has 12 species detected and Site B has 3 species detected. What does this difference in species richness suggest?

KNOWLEDGE CHECK

You notice that Oncorhynchus mykiss (rainbow trout) and Salmo trutta (brown trout) are detected at exactly the same two sites and nowhere else. What is the most likely explanation?

What’s next

In the next lesson, you will build a Single-Cell Expression Explorer — a UMAP/t-SNE viewer for pre-computed coordinate CSVs with cluster coloring. Same single-file HTML pattern, but applied to single-cell genomics data. If eDNA tells you which species are where, single-cell tells you which genes are active in which cells — a different scale of biodiversity.
