Applied Module 12 · AI-Powered Bioinformatics Tools

Single-Cell Expression Explorer

What you'll learn

~25 min
  • Build a browser-based single-cell expression explorer from pre-computed UMAP coordinates
  • Visualize cell clusters with color-coded scatter plots and overlay gene expression gradients
  • Troubleshoot Canvas rendering and CSV parsing issues common with large coordinate files
  • Extend the explorer with custom cluster annotations, differential expression views, and export features

What you’re building

You just ran a 10x Genomics Chromium single-cell RNA-seq experiment. Cell Ranger counted the transcripts. Seurat or Scanpy reduced the dimensions, clustered the cells, and spit out a CSV of UMAP coordinates with cluster assignments. Now you need to actually look at the data, but the PI standing behind you does not have R installed, the postdoc across the hall uses Python, and the rotation student has neither.

You are going to build a tool that lets anyone open a single HTML file, load that CSV, and explore the results interactively. No R. No Python. No Jupyter. Just a browser.

This is your facility's show-and-tell tool

The Gene Expression Center generates these coordinate files for every single-cell experiment. Right now, viewing them requires computational expertise. This tool turns every PI meeting, every lab meeting, every grant figure session into a drag-and-drop experience. That is the kind of utility that makes a core facility indispensable.

By the end of this lesson you will have a single-cell expression explorer that renders thousands of cells as colored dots on a canvas, lets you toggle gene expression overlays, and shows cluster summary statistics. You will build it with a single prompt.

Software pattern: High-performance scatter plot viewer

Upload CSV coordinates, render on HTML5 Canvas, color by category or continuous value. This pattern works for any dataset with X/Y coordinates and grouping variables: survey response embeddings, customer segmentation, geographic clustering. The biology is specific; the visualization technique is universal.

Domain Primer: Key single-cell terms you'll encounter

New to single-cell analysis? Here are the terms that appear in this lesson:

  • Single-cell RNA-seq (scRNA-seq): A technique that measures gene expression in individual cells rather than averaging across millions. Each cell gets its own barcode, producing a gene-by-cell expression matrix. Think of it as giving every cell in your sample its own microphone.
  • UMAP (Uniform Manifold Approximation and Projection): A dimensionality reduction algorithm that compresses thousands of gene expression measurements into two coordinates (UMAP_1 and UMAP_2) while preserving the relationships between cells. Cells that are transcriptionally similar land near each other. It is the current standard for visualizing single-cell data.
  • t-SNE (t-distributed Stochastic Neighbor Embedding): An older dimensionality reduction method with the same goal as UMAP. Still widely used, but typically slower on large datasets and less consistent across runs. t-SNE tends to preserve less global structure than UMAP, so distances between clusters on a t-SNE plot are not meaningful (and should be read with caution on a UMAP plot as well).
  • Cell cluster: A group of cells with similar gene expression profiles, identified by algorithms like Louvain or Leiden community detection. Clusters typically correspond to cell types (T cells, B cells, monocytes) or cell states (activated, resting).
  • Marker gene: A gene whose expression is distinctly high in one cluster compared to others. CD3E marks T cells, CD14 marks monocytes, MS4A1 marks B cells. Marker genes are how you assign biological identity to computational clusters.
  • 10x Genomics Chromium: The dominant commercial platform for single-cell RNA-seq. It partitions cells into droplets with barcoded beads, enabling high-throughput profiling of thousands of cells per run.
  • Cell Ranger: 10x Genomics’ analysis pipeline that processes raw sequencing data into gene expression matrices.
  • Seurat / Scanpy: The two most widely used software packages for downstream single-cell analysis (Seurat in R, Scanpy in Python). They handle normalization, clustering, and dimensionality reduction.
  • Dimensionality reduction: Compressing high-dimensional data (20,000+ genes per cell) into 2-3 dimensions for visualization. Information is lost, but the overall structure (which cells are similar) is preserved.
  • Log-normalization: A standard transformation applied to raw gene counts to reduce the effect of sequencing depth differences between cells. Values you see in expression columns are typically log-normalized.

You do not need to run any of these tools: the CSV you are loading already contains the finished output. You just need to understand what the columns mean.
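
If you are curious what the log-normalized values in the expression columns represent, here is a minimal sketch. It assumes the widely used recipe of scaling each cell to 10,000 total counts and taking the natural log of one plus the result; your pipeline's settings may differ. The function name and the example counts are made up for illustration.

```javascript
// Sketch of log-normalization for one gene in one cell.
// Assumed recipe: (count / total counts in the cell) * scaleFactor, then ln(1 + x).
function logNormalize(count, cellTotal, scaleFactor = 10000) {
  if (cellTotal <= 0) return 0; // empty cell: avoid dividing by zero
  return Math.log1p((count / cellTotal) * scaleFactor);
}

// Hypothetical example: the same gene captured at two sequencing depths.
// 5 of 2,500 counts and 20 of 10,000 counts are the same fraction of the cell,
// so both normalize to the same value.
console.log(logNormalize(5, 2500).toFixed(2));
console.log(logNormalize(20, 10000).toFixed(2));
console.log(logNormalize(0, 2500)); // a gene with no counts stays at 0
```

This is why zero is the floor of every gene column and why cells sequenced at different depths remain comparable on the overlay.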

Who this is for

  • PIs and staff scientists who want to explore single-cell results without opening R or Python.
  • Core facility staff who need a quick way to share results with collaborators across departments.
  • Grad students preparing figures for lab meeting or a paper and wanting to interactively explore cluster assignments before committing to a static plot.

Gene Expression Center Context

The GEC runs 10x Chromium experiments for researchers across campus. After Cell Ranger processing and Seurat/Scanpy analysis, the deliverable is usually a set of files: a coordinates CSV, a metadata table, and sometimes an RDS or h5ad object. The coordinates CSV is the most universally useful output: it contains everything needed for visualization. This tool makes that CSV immediately explorable by anyone, regardless of their computational background.

Exporting coordinates from Seurat or Scanpy

If your data is in an RDS or h5ad file, export the coordinates first.

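Seurat (R):

coords <- Embeddings(obj, "umap")  # cells x 2 matrix of UMAP coordinates
df <- data.frame(cell_id = rownames(coords), UMAP_1 = coords[, 1], UMAP_2 = coords[, 2], cluster = obj@meta.data$seurat_clusters)
write.csv(df, "coordinates.csv", row.names = FALSE)

Scanpy (Python):

import pandas as pd
df = pd.concat([pd.DataFrame(adata.obsm["X_umap"], columns=["UMAP_1","UMAP_2"], index=adata.obs_names), adata.obs[["leiden"]].rename(columns={"leiden": "cluster"})], axis=1)
df.to_csv("coordinates.csv", index_label="cell_id")

Both snippets write the columns this explorer expects: cell_id, UMAP_1, UMAP_2, and cluster. Swap seurat_clusters or leiden for whichever metadata column holds your cluster assignments, and add gene expression columns for any markers you want to overlay.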


The showcase

Here is what the finished explorer looks like once you open the HTML file in a browser:

  • Header with a file upload zone for CSV files and a “Load Sample Data” button.
  • Main canvas (800x600) rendering every cell as a colored dot, positioned by UMAP_1 and UMAP_2 coordinates.
  • Cluster legend on the right side listing each cluster with its color, name, and cell count.
  • Gene selector dropdown that lets you pick a gene column (CD3E, CD14, MS4A1, NKG7, FCGR3A) and re-color all points by expression level using a blue-to-red gradient.
  • Sidebar panel with a Chart.js bar chart showing cells per cluster.
  • Hover tooltip showing cell ID, cluster, and expression values when you mouse over a point.
  • Cluster-vs-cluster comparison: select two clusters to see a bar chart of mean expression for each marker gene, highlighting which genes differentiate them.

The sample dataset contains 500 cells across 7 clusters representing common PBMC (peripheral blood mononuclear cell) types. Everything runs client-side. No data leaves the browser.


The prompt

Open your terminal, navigate to a project folder, start your AI CLI tool (e.g., by typing claude), and paste this prompt:

Build a single self-contained HTML file called single-cell-explorer.html that visualizes
pre-computed single-cell RNA-seq dimensionality reduction coordinates. Requirements:
1. DATA FORMAT
- Accepts CSV files with columns: cell_id, UMAP_1, UMAP_2, cluster, and optional
gene expression columns (any additional numeric columns are treated as genes)
- Parse CSV on upload using vanilla JS (no Papa Parse)
- Include a "Load Sample Data" button that populates with 500 embedded sample cells:
7 clusters (0-6) with these approximate cell type compositions and marker profiles:
Cluster 0: CD4 T cells (120 cells) - CD3E high (~2.5-4.0), others low (<0.5)
Cluster 1: CD14 Monocytes (100 cells) - CD14 high (~3.0-5.0), FCGR3A moderate (~1.0-2.0)
Cluster 2: B cells (80 cells) - MS4A1 high (~2.5-4.5), others low
Cluster 3: CD8 T cells (70 cells) - CD3E high (~2.0-3.5), NKG7 moderate (~1.0-2.0)
Cluster 4: NK cells (50 cells) - NKG7 high (~3.0-5.0), FCGR3A moderate (~1.0-2.5)
Cluster 5: FCGR3A Monocytes (45 cells) - FCGR3A high (~3.0-4.5), CD14 moderate (~1.0-2.0)
Cluster 6: Dendritic cells (35 cells) - all markers low-moderate (~0.5-1.5)
Generate UMAP_1 and UMAP_2 as Gaussian clusters centered at well-separated points
(spread ~1.0-1.5 per cluster) so clusters are visually distinct on the plot.
Gene expression values should be log-normalized (non-negative floats, mostly 0-5 range).
Gene columns: CD3E, CD14, MS4A1, NKG7, FCGR3A
2. MAIN SCATTER PLOT (HTML5 Canvas, NOT Chart.js; Canvas handles 5000+ points better)
- Render each cell as a small circle (radius 3px) at its UMAP_1/UMAP_2 position
- Auto-scale axes to fit all data points with 10% padding
- Color by cluster assignment using 7 distinct colors from a colorblind-friendly palette
- On hover, show a tooltip with cell_id, cluster, and all gene expression values
- Axis labels: "UMAP 1" and "UMAP 2"
3. GENE EXPRESSION OVERLAY
- A dropdown listing all gene columns detected in the CSV
- When a gene is selected, re-color all points using a gradient: dark blue (low/zero)
to bright red (high expression), scaled to the min-max of that gene
- Show a color gradient legend bar below the dropdown
- A "Reset to Clusters" button to go back to cluster coloring
4. CLUSTER SIDEBAR
- Right-side panel with a Chart.js bar chart showing cell count per cluster
- Below the bar chart, a summary table: cluster ID, cell count, percentage of total
- Use Chart.js loaded from CDN (https://cdn.jsdelivr.net/npm/chart.js)
5. CLUSTER COMPARISON
- Two dropdowns to select "Cluster A" and "Cluster B"
- A grouped Chart.js bar chart showing mean expression of each gene in Cluster A
vs Cluster B, with error bars (standard deviation) drawn by a small inline plugin,
since Chart.js has no built-in error bars
- Highlight bars where the difference in means exceeds 1.0 (potential marker genes)
6. DESIGN
- Dark theme: background #0f172a, panels #1e293b, text #e2e8f0, accent #38bdf8
- Clean sans-serif font (Inter from Google Fonts CDN)
- Layout: canvas on the left (~65% width), sidebar on the right (~35% width)
- Comparison section below the main canvas, full width
- Drag-and-drop zone for CSV upload with visual feedback
- Responsive: stack vertically on narrow screens
7. TECHNICAL
- Pure HTML/CSS/JS in one file, no build step
- Canvas for the scatter plot (performance with thousands of points)
- Chart.js from CDN for bar charts only
- Generate the 500-cell sample dataset in JavaScript with seeded random values
so the sample data is identical every time

Copy-paste ready

That entire block is the prompt. Paste it as-is. The prompt specifies Canvas for the main scatter plot (handles thousands of points smoothly) and Chart.js only for the bar charts in the sidebar and comparison panel. This hybrid approach gives you both performance and polished charts.
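
The last requirement in the prompt, seeded random values, is what makes the sample data identical on every load. The generated file will take its own approach; the sketch below shows one way it could work, assuming a small seeded generator (mulberry32) and the Box-Muller transform for the Gaussian spread. The cluster centers, the default seed, and all function names are illustrative assumptions. Only the cluster sizes come from the prompt.

```javascript
// Small seeded pseudo-random generator (mulberry32): same seed, same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Box-Muller transform: turns two uniform draws into one Gaussian draw.
function gaussian(rand, mean, sd) {
  const u1 = 1 - rand(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rand();
  return mean + sd * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Sizes match the prompt (they sum to 500); the centers are made-up, well-separated points.
const SAMPLE_CLUSTERS = [
  { id: 0, n: 120, cx: -8, cy: 6 },
  { id: 1, n: 100, cx: 8, cy: 6 },
  { id: 2, n: 80, cx: 0, cy: -9 },
  { id: 3, n: 70, cx: -8, cy: -3 },
  { id: 4, n: 50, cx: 8, cy: -3 },
  { id: 5, n: 45, cx: 0, cy: 10 },
  { id: 6, n: 35, cx: 0, cy: 1 },
];

function makeSampleCells(seed = 42) {
  const rand = mulberry32(seed);
  const cells = [];
  for (const c of SAMPLE_CLUSTERS) {
    for (let i = 0; i < c.n; i++) {
      cells.push({
        cell_id: `cell_${cells.length + 1}`,
        UMAP_1: gaussian(rand, c.cx, 1.2),
        UMAP_2: gaussian(rand, c.cy, 1.2),
        cluster: c.id,
      });
    }
  }
  return cells;
}
```

Math.random() cannot be seeded in the browser, which is why a sketch like this carries its own generator.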


What you get

After the LLM finishes (typically 60-90 seconds), you will have a single file: single-cell-explorer.html. Open it in any browser.

Expected output structure

single-cell-explorer.html (~800-1200 lines)

Click Load Sample Data and you should see:

  1. A scatter plot with 500 colored dots forming 7 visually distinct clusters.
  2. A cluster legend with colors matching the plot. Cluster 0 (CD4 T cells) should be the largest group.
  3. The sidebar bar chart showing cell counts: Cluster 0 at 120, declining to Cluster 6 at 35.
  4. Selecting “CD3E” from the gene dropdown should re-color the plot: Clusters 0 and 3 (T cells) turn red/warm while other clusters stay blue/cool.
  5. The cluster comparison chart should show clear differences: comparing Cluster 0 vs Cluster 1 should reveal CD3E high in Cluster 0 and CD14 high in Cluster 1.
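
The comparison chart in item 5 comes down to a per-cluster mean and standard deviation for each gene. Here is a minimal sketch of that calculation, assuming cells are plain objects with a cluster field and one numeric field per gene. The function names and the three-gene toy data are hypothetical.

```javascript
// Mean and sample standard deviation of a list of numbers.
function summarize(values) {
  const n = values.length;
  if (n === 0) return { mean: NaN, sd: NaN }; // no cells in this cluster
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    n > 1 ? values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1) : 0;
  return { mean, sd: Math.sqrt(variance) };
}

// For each gene, summarize both clusters and flag large differences in means.
function compareClusters(cells, genes, clusterA, clusterB, threshold = 1.0) {
  const inA = cells.filter((c) => c.cluster === clusterA);
  const inB = cells.filter((c) => c.cluster === clusterB);
  return genes.map((gene) => {
    const a = summarize(inA.map((c) => c[gene]));
    const b = summarize(inB.map((c) => c[gene]));
    return { gene, a, b, highlight: Math.abs(a.mean - b.mean) > threshold };
  });
}

// Toy data: two T-cell-like cells (cluster 0) and two monocyte-like cells (cluster 1).
const toyCells = [
  { cluster: 0, CD3E: 3.0, CD14: 0.2, NKG7: 0.5 },
  { cluster: 0, CD3E: 3.4, CD14: 0.4, NKG7: 0.7 },
  { cluster: 1, CD3E: 0.1, CD14: 4.0, NKG7: 0.4 },
  { cluster: 1, CD3E: 0.3, CD14: 4.4, NKG7: 0.6 },
];
const comparison = compareClusters(toyCells, ["CD3E", "CD14", "NKG7"], 0, 1);
console.log(comparison.filter((r) => r.highlight).map((r) => r.gene));
```

In the toy data CD3E and CD14 differ by far more than 1.0 between the two clusters and NKG7 does not, which mirrors what you should see in the real chart for Cluster 0 vs Cluster 1.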

If something is off

Problem: Canvas is blank but sidebar works
Follow-up prompt: The Canvas scatter plot is not rendering. Make sure the canvas element has explicit width and height attributes and that the drawing code runs after the data is loaded. Check that the coordinate scaling maps UMAP values to canvas pixel coordinates correctly.

Problem: All points are the same color
Follow-up prompt: The cluster coloring is not working. Each point should get a color based on its cluster value (0-6). Make sure the color lookup uses the cluster column as an integer index.

Problem: Gene overlay gradient is not visible
Follow-up prompt: The expression overlay is not re-coloring points. Make sure the gradient interpolation maps the min-max expression range to a blue-to-red color scale, and that the canvas redraws after the dropdown changes.

Problem: Tooltip does not appear on hover
Follow-up prompt: The hover tooltip is not showing. Add a mousemove event listener on the canvas that checks the distance from the cursor to each point and shows a positioned div with cell info when within 5px of a point.

When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom
Uploading a large CSV (10,000+ cells) makes the browser freeze
Evidence
After dropping a 15MB CSV, the page becomes unresponsive for 30+ seconds. The scatter plot eventually renders but hover is laggy.
What to ask the AI
"My CSV has 12,000 cells and the browser freezes during rendering. Can you optimize the canvas drawing by: (1) reducing point radius to 2px for datasets over 5,000 cells, (2) using requestAnimationFrame for initial render, and (3) implementing spatial indexing (a simple grid) for the hover detection so it does not check every point on every mouse move?"
Symptom
CSV with t-SNE columns instead of UMAP columns does not display
Evidence
My Scanpy output uses columns named tSNE_1 and tSNE_2 instead of UMAP_1 and UMAP_2. The upload completes but no points appear on the canvas.
What to ask the AI
"My CSV uses tSNE_1 and tSNE_2 column names instead of UMAP_1 and UMAP_2. Can you make the coordinate column detection flexible? Look for columns containing 'UMAP', 'umap', 'tSNE', 'tsne', 'PC_1', or 'X_' prefix and let the user confirm which two columns to use as X and Y axes."
Symptom
Cluster column contains text labels instead of numbers
Evidence
Seurat exported my cluster column as cell type names like 'CD4_T_cell' instead of numbers 0-6. The legend shows nothing and all points are gray.
What to ask the AI
"My cluster column has text labels (like 'CD4_T_cell', 'Monocyte') instead of numeric IDs. Can you handle both formats? If the cluster column contains strings, assign a unique color to each unique value and use the text labels directly in the legend."
Symptom
Gene expression overlay shows all points the same dark blue
Evidence
Selecting CD3E from the dropdown turns all points dark blue. The gradient legend shows a range of 0.00 to 4.32 but only one color appears on the plot.
What to ask the AI
"The expression gradient is not scaling correctly. It looks like all values are being mapped to the low end. Make sure the normalization uses (value - min) / (max - min) for each gene, and that zero-expression cells get the minimum color while the highest-expressing cell gets bright red. Also check that the color interpolation produces visible mid-range colors."
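
The fix requested in the last example comes down to one small function. Here is a sketch of min-max scaling followed by linear color interpolation. The RGB endpoints for dark blue and bright red are assumptions (the generated file will choose its own), and the function name is hypothetical.

```javascript
// Map an expression value to a color between dark blue (low) and bright red (high).
// The two endpoint colors are assumed values chosen for illustration.
const LOW_RGB = [20, 40, 140];
const HIGH_RGB = [240, 60, 60];

function expressionToColor(value, min, max) {
  const range = max - min;
  // Normalize to 0-1; if every cell has the same value, treat all as "low".
  const t = range > 0 ? Math.min(1, Math.max(0, (value - min) / range)) : 0;
  const [r, g, b] = LOW_RGB.map((low, i) =>
    Math.round(low + t * (HIGH_RGB[i] - low))
  );
  return `rgb(${r}, ${g}, ${b})`;
}

console.log(expressionToColor(0, 0, 4)); // lowest value: the low-end color
console.log(expressionToColor(2, 0, 4)); // halfway: a visible mid-range color
console.log(expressionToColor(4, 0, 4)); // highest value: the high-end color
```

If every point comes out in the low-end color while the legend shows a sensible range, a common culprit is that min and max were computed correctly but the per-cell values are still strings, or that the normalization divides by max instead of (max - min).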

How it works (the 2-minute explanation)

You do not need to understand every line of the generated code, but here is the mental model:

  1. CSV parsing splits on newlines and commas, uses the first row as headers, and converts numeric columns to floats. Any column that is not cell_id or cluster and contains numeric values is treated as a gene expression column.
  2. Canvas rendering maps UMAP_1 and UMAP_2 values to pixel coordinates by finding the data range and scaling to the canvas dimensions with padding. Each cell is drawn as a filled circle using arc().
  3. Cluster coloring assigns each cluster ID a color from a palette. The same palette drives the legend and the sidebar chart.
  4. Gene overlay replaces cluster colors with a gradient. For each cell, the expression value is normalized to 0-1 across the gene’s range, then mapped to an RGB interpolation from blue (low) to red (high). Cells with zero expression stay blue; highest-expressing cells turn red.
  5. Chart.js bar charts handle the sidebar cell count and the cluster comparison. These are standard bar chart configurations, the same new Chart(canvas, config) pattern you have seen in earlier lessons.
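
To make steps 1 and 2 concrete, here is a minimal sketch of the parsing and scaling logic. The function names are illustrative and your generated file will differ. It assumes a simple CSV with no commas inside fields, and it strips the double quotes that R's write.csv puts around text values by default.

```javascript
// Parse a simple CSV: first row is headers, numeric-looking fields become numbers.
function parseCsv(text) {
  const unquote = (s) => s.trim().replace(/^"(.*)"$/, "$1");
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const headers = lines[0].split(",").map(unquote);
  const rows = lines.slice(1).map((line) => {
    const fields = line.split(",").map(unquote);
    const row = {};
    headers.forEach((h, i) => {
      const raw = fields[i] ?? "";
      const num = Number(raw);
      row[h] = raw !== "" && Number.isFinite(num) ? num : raw;
    });
    return row;
  });
  // Any all-numeric column that is not an ID, coordinate, or cluster is a gene.
  const reserved = new Set(["cell_id", "UMAP_1", "UMAP_2", "cluster"]);
  const geneColumns = headers.filter(
    (h) =>
      !reserved.has(h) &&
      rows.length > 0 &&
      rows.every((r) => typeof r[h] === "number")
  );
  return { headers, rows, geneColumns };
}

// Build a function that maps data values to pixels, with padding on both sides.
// Pass pixelStart > pixelEnd for the Y axis, because canvas Y grows downward.
function makeScale(values, pixelStart, pixelEnd, padFraction = 0.1) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const pad = (max - min) * padFraction || 1; // fall back to 1 if all values are equal
  const lo = min - pad;
  const hi = max + pad;
  return (v) => pixelStart + ((v - lo) / (hi - lo)) * (pixelEnd - pixelStart);
}
```

On an 800x600 canvas you would build one scale from the UMAP_1 values with pixels 0 to 800 and one from the UMAP_2 values with pixels 600 to 0, then pass the scaled pair to arc() for each cell.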

For Researchers: Why Canvas instead of SVG or Chart.js for scatter plots

SVG creates a DOM element for every data point. At 500 cells that is fine; at 10,000 it becomes sluggish; at 50,000 the browser may become unresponsive. HTML5 Canvas draws pixels directly, so 50,000 points typically render in a fraction of a second. The tradeoff is that Canvas points are not individually addressable DOM elements, so hover detection requires manual distance calculations. For single-cell data, where datasets routinely contain 5,000 to 100,000 cells, Canvas is the most practical browser-based approach short of WebGL.
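
The manual distance calculation mentioned above can be sketched as follows. This version assumes each point already carries its pixel position in px and py (hypothetical field names) and buckets points into a coarse grid, so a mouse move only checks nearby cells instead of the whole dataset. The grid cell size must be at least the hover radius for the 3x3 neighborhood search to be complete.

```javascript
// Bucket point indices by grid cell so lookups only touch nearby points.
function buildGrid(points, cellSize) {
  const grid = new Map();
  points.forEach((p, index) => {
    const key = `${Math.floor(p.px / cellSize)},${Math.floor(p.py / cellSize)}`;
    if (!grid.has(key)) grid.set(key, []);
    grid.get(key).push(index);
  });
  return grid;
}

// Return the index of the closest point within maxDist pixels, or -1 if none.
// Requires cellSize >= maxDist so the 3x3 block of cells covers the radius.
function findPointNear(points, grid, cellSize, mouseX, mouseY, maxDist = 5) {
  const gx = Math.floor(mouseX / cellSize);
  const gy = Math.floor(mouseY / cellSize);
  let best = -1;
  let bestDistSq = maxDist * maxDist;
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      for (const index of grid.get(`${gx + dx},${gy + dy}`) ?? []) {
        const distSq =
          (points[index].px - mouseX) ** 2 + (points[index].py - mouseY) ** 2;
        if (distSq <= bestDistSq) {
          best = index;
          bestDistSq = distSq;
        }
      }
    }
  }
  return best;
}

// Three hypothetical points in canvas pixel space.
const hoverPoints = [
  { px: 100, py: 100 },
  { px: 103, py: 100 },
  { px: 400, py: 300 },
];
const hoverGrid = buildGrid(hoverPoints, 10);
console.log(findPointNear(hoverPoints, hoverGrid, 10, 102, 100)); // nearest of two close points
console.log(findPointNear(hoverPoints, hoverGrid, 10, 200, 200)); // nothing nearby
```

In the real tool the grid would be rebuilt whenever the data or the canvas size changes, and the lookup would run inside the canvas mousemove handler.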


Customize it

The base explorer handles the most common use case: loading coordinates and looking at clusters. Here are three extensions, each a single follow-up prompt:

Add cluster labels directly on the plot

Add text labels on the scatter plot showing the cluster name at the centroid (mean X, mean Y)
of each cluster. Use white text with a dark background pill/badge so the labels are readable
against the colored points. Make the labels toggleable with a checkbox.

Add a lasso selection tool

Add a lasso selection tool: when I hold Shift and drag on the canvas, draw a freeform
selection outline. After releasing, highlight the selected cells, show their count, and
display a bar chart of gene expression means for just the selected subset vs the rest of
the dataset. This lets me explore sub-populations within a cluster.

Export the current view as a publication figure

Add an "Export PNG" button that saves the current canvas view as a high-resolution PNG
(2x scale for retina). Include the axis labels, cluster legend, and a title bar in the
exported image. Also add an "Export SVG" option that recreates the scatter plot as an SVG
element (for vector editing in Illustrator) using the same colors and positions.

The customization loop

Same pattern as every lesson in this track: start with a working tool, then add features one prompt at a time. The lasso selection and export features would take hours to code from scratch. As a follow-up prompt to an existing tool, each takes about 60 seconds.
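
As an illustration of what the lasso prompt asks the AI to write, the core of that feature is a point-in-polygon test. Here is a minimal sketch using the standard ray-casting method. It assumes the lasso outline is an array of [x, y] pixel pairs and that each point carries px and py fields; both are assumptions for the example, not the generated file's actual data layout.

```javascript
// Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses.
// An odd number of crossings means the point is inside.
function pointInPolygon(x, y, polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    const crosses =
      yi > y !== yj > y && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Keep only the points that fall inside the lasso outline.
function selectCells(points, lasso) {
  return points.filter((p) => pointInPolygon(p.px, p.py, lasso));
}

// Hypothetical lasso (a 10x10 square) and three points.
const lasso = [[0, 0], [10, 0], [10, 10], [0, 10]];
const lassoPoints = [
  { id: "a", px: 5, py: 5 },
  { id: "b", px: 15, py: 5 },
  { id: "c", px: 2, py: 9 },
];
console.log(selectCells(lassoPoints, lasso).map((p) => p.id));
```

The selected subset can then feed the same mean-expression summary used for the cluster comparison chart.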


Try it yourself

  1. Open your CLI tool in an empty folder.
  2. Paste the main prompt from above.
  3. Open the generated single-cell-explorer.html in your browser.
  4. Click “Load Sample Data” and explore the clusters.
  5. Try the gene expression overlay: select each marker gene and see which clusters light up.
  6. If you have a real coordinate CSV from a Seurat or Scanpy analysis, drag it onto the upload zone.
  7. Pick one customization from the list above and add it.

Key takeaways

  • Canvas handles thousands of points where Chart.js and SVG would choke, which makes it the right tool for single-cell visualization in the browser.
  • Pre-computed coordinates are the universal export format for single-cell analysis. Whether your lab uses Seurat, Scanpy, or any other tool, the output is a table of coordinates and clusters that this explorer can display.
  • Gene expression overlays turn a static cluster plot into an exploratory tool: seeing which clusters express which markers is how you assign biological identity to computational clusters.
  • A single HTML file eliminates the “I don’t have R/Python installed” problem that plagues every core facility. Email the file along with the CSV and anyone can explore the data.
  • The hybrid Canvas + Chart.js approach gives you performance for the scatter plot and polish for the bar charts. Know when to use each.

KNOWLEDGE CHECK

Why does this tool use HTML5 Canvas for the scatter plot instead of Chart.js?

KNOWLEDGE CHECK

What does a cluster represent in a single-cell UMAP plot?

KNOWLEDGE CHECK

You select CD3E from the gene dropdown and clusters 0 and 3 turn red while all other clusters stay dark blue. What does this tell you biologically?


What’s next

You have completed all 17 tools in the AI-Powered Bioinformatics Tools track. From sequence analysis dashboards to CRISPR guide design, from colony breeding planners to isotope QC dashboards, from forensic chain-of-custody tracking to eDNA contamination checking, and from RNA-seq pipelines to this single-cell explorer, you now have a portfolio of browser-based and CLI tools that cover the breadth of a modern core facility operation.

Here is what to do from here:

  • Revisit and customize. Go back to the tools most relevant to your specific facility or research. The customization prompts in each lesson are starting points; the real value comes when you adapt these tools to your actual data and workflows.
  • Share with your lab. Every single-file HTML tool you built can be emailed to a collaborator or dropped into a shared drive. Start with the one that solves the most immediate pain point for your group.
  • Continue building skills. The shared modules cover deeper topics, including planning and orchestration (Module 10), debugging strategies (Module 10, Lesson 4), and deployment (Module 11), that will make your tool-building faster and more reliable. You can also return to the Biotech Track to revisit any lesson.
  • Build something new. You have the pattern: describe the data format, specify the visualization, list the features, paste the prompt. The next tool you build will be one that does not exist in any lesson, because it solves a problem unique to your work.

You are the expert

The AI handles the JavaScript. You handle the biology. That combination of domain expertise and AI-augmented tooling is what makes these tools useful rather than just technically impressive. Keep building.