Mass Spectrometry Data Viewer

What you’re building

Mass spectrometry is the backbone of proteomics in modern biology and chemistry labs. Whether you are running LC-MS/MS on a Q Exactive, doing MALDI-TOF for protein ID, or analyzing metabolites on a triple quad, you eventually need to look at spectra. Vendor software is powerful but locked to specific instruments and operating systems. Open-source viewers exist but often require complex installations.

In this lesson you will build an interactive mass spectrometry data viewer as a Python web application using Dash and Plotly. You will upload CSV spectral data, explore it with interactive zoom and pan, run peak detection, overlay multiple spectra, and export annotated, publication-quality figures. The viewer runs in any browser and works with data from any instrument that can export a two-column numeric CSV (m/z, intensity), although some vendor exports need cleanup or conversion first. It takes about 20 minutes to build.

This is the most advanced tool in the module. It combines file I/O, signal processing (peak detection), interactive visualization, and multi-file comparison. The LLM handles the implementation. You handle the science.

ℹSoftware pattern: Signal processing and visualization

Load signal data → detect features → annotate → compare. This pattern works for audio waveforms, financial time series, IoT sensor data — any domain with signal-like data.

💡Running on HPC?

If you are working on a remote server or HPC cluster, use a conda environment instead of venv for easier dependency management. For the Dash web interface, use SSH port forwarding (ssh -L 8050:localhost:8050 user@server) to view the dashboard in your local browser.

The showcase

The finished application will provide:

File upload panel: drag-and-drop or click to upload one or more CSV files. Each file should have two columns: m/z (or retention time) and intensity.
Interactive spectrum plot: Plotly chart with full zoom, pan, hover tooltips showing exact m/z and intensity at every point.
Peak detection: automatic peak finding using a local maxima algorithm with configurable parameters (minimum height, minimum distance between peaks, prominence threshold).
Peak annotation: detected peaks labeled on the plot with m/z values. Click a peak to see detailed info.
Multi-spectrum overlay: load multiple files and overlay them with different colors. Toggle individual spectra on/off. Normalize intensities for comparison.
Difference view: subtract one spectrum from another to highlight changes between conditions (e.g., treated vs. control).
Export: download the current view as a PNG or SVG image with annotations, suitable for publication figures.
Peak list export: download detected peaks as a CSV with m/z, intensity, and resolution.

The prompt

Open your AI CLI tool (such as Claude Code, Gemini CLI, or your preferred tool) in an empty directory and paste:

Create a Python web application for mass spectrometry data visualization using
Dash (by Plotly) and Python. Call it mass-spec-viewer.

PROJECT STRUCTURE:
mass-spec-viewer/
├── app.py                    # main Dash application
├── peak_detection.py         # peak finding algorithms
├── data_processing.py        # CSV parsing, normalization, smoothing
├── export_utils.py           # figure export helpers
├── assets/
│   └── style.css             # dark theme CSS
├── sample_data/
│   ├── sample_spectrum_1.csv # example MS1 spectrum (m/z, intensity)
│   └── sample_spectrum_2.csv # second example for overlay demo
├── requirements.txt          # dash, plotly, pandas, scipy, numpy, kaleido
└── README.md

SAMPLE DATA:
Generate two realistic sample CSV files:
- sample_spectrum_1.csv: simulate a protein digest MS1 spectrum, m/z range
  400-2000, with ~20 distinct peaks at realistic m/z values for tryptic peptides
  (e.g., doubly and triply charged peptides in the 400-1200 range), Gaussian peak
  shapes with realistic widths, baseline noise
- sample_spectrum_2.csv: similar spectrum but with 3 peaks shifted (simulating
  a post-translational modification like phosphorylation: +80 Da on some peptides,
  which moves each ion by +80/z on the m/z axis)
  and 2 peaks absent (simulating a protein that is downregulated)

APP LAYOUT (app.py):
Use Dash with a dark theme. Layout sections:

1. UPLOAD SECTION (top)
   - Drag-and-drop area accepting .csv files
   - Support multiple file upload
   - Show list of loaded files with color swatches and remove buttons
   - "Load Sample Data" button

2. CONTROL PANEL (left sidebar, 250px wide)
   Peak Detection Controls:
   - Algorithm selector: "Local Maxima" or "Continuous Wavelet Transform"
   - Min peak height: slider (0 to max_intensity, default 5% of max)
   - Min peak distance: numeric input in m/z units (default 0.5)
   - Prominence threshold: slider (0 to max_intensity/2, default 2% of max)
   - "Detect Peaks" button
   - Show/hide peak labels checkbox

   Display Controls:
   - Normalize intensities toggle (to 100% base peak)
   - Smoothing: none / Savitzky-Golay / moving average, with window size
   - Y-axis: linear / log scale
   - X-axis range: min and max m/z inputs
   - Show baseline checkbox

   Overlay Controls:
   - Checkboxes to show/hide each loaded spectrum
   - "Mirror Plot" toggle (shows second spectrum inverted below x-axis,
     common for library matching)
   - "Difference" toggle (subtract spectrum 2 from spectrum 1)
   - Offset slider for stacked view (vertical offset between spectra)

3. MAIN PLOT (center, fills remaining width)
   - Plotly scatter/line chart with m/z on x-axis, intensity on y-axis
   - Interactive: zoom (scroll; enable scrollZoom in the dcc.Graph config,
     since Plotly turns it off by default), pan (drag), box zoom, reset
   - Hover tooltip: m/z (4 decimal places), intensity (scientific notation),
     file name
   - Peak annotations: vertical line from peak to label, m/z value as text
   - Color-coded by file (auto-assign from a qualitative palette)
   - Responsive, fills available space

4. PEAK TABLE (below plot)
   - DataTable with columns: Peak #, m/z, Intensity, Relative Intensity (%),
     Resolution (m/z / FWHM), S/N Ratio, File
   - Sortable and filterable
   - Click a row to zoom the plot to that peak (+/- 5 m/z units)
   - "Download Peak List" button (CSV export)

5. EXPORT PANEL (bottom)
   - "Export PNG" button (high-res, 300 DPI, white background option for papers)
   - "Export SVG" button (vector, editable in Illustrator/Inkscape)
   - "Export Interactive HTML" button (saves standalone Plotly HTML file)
   - Width/height inputs for export dimensions
   - Option: include or exclude annotations in export

PEAK DETECTION (peak_detection.py):
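- Local maxima method: use scipy.signal.find_peaks with height, distance,
  and prominence parameters (find_peaks measures distance in samples, so
  convert the m/z distance from the control panel using the median m/z spacing)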
- CWT method: use scipy.signal.find_peaks_cwt with configurable widths
- For each detected peak, calculate:
  - Refined m/z (parabolic interpolation around the discrete maximum)
  - FWHM (full width at half maximum) by finding half-height crossings
  - Resolution: m/z / FWHM
  - Signal-to-noise: peak height above the local baseline divided by the local
    noise (baseline = median of the surrounding 50 points, noise = their median
    absolute deviation)

DATA PROCESSING (data_processing.py):
- CSV parser: auto-detect delimiter (comma, tab, space), skip comment lines
  starting with #, handle different column names (m/z, mz, mass, Mass/Charge
  for x-axis; intensity, Intensity, int, abundance for y-axis)
- Normalization: scale to base peak (100%) or TIC (total ion current)
- Smoothing: Savitzky-Golay filter (scipy.signal.savgol_filter) with
  configurable window and polynomial order
- Baseline estimation: rolling minimum with large window
- Interpolation for difference spectra: align two spectra to common m/z grid

DESIGN:
- Dark theme: #0a0a0f background, #1a1a2e panels, #e0e0e0 text, #00d4ff accent
- Plotly dark template for charts
- Clean, professional layout suitable for a core facility

Generate all files with complete implementations. Include the sample data CSV files.
The app should work end-to-end: python app.py opens a browser with the viewer ready.
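Before you paste the prompt, it can help to know what "realistic" sample data means here. The sketch below shows one way to simulate such a spectrum with NumPy: Gaussian peaks on a noisy, non-negative baseline. The function name `simulate_spectrum`, the default width, sampling step, and noise level are all illustrative assumptions, not what the AI will necessarily generate.

```python
import numpy as np


def simulate_spectrum(peak_mz, peak_heights, fwhm=0.05, mz_min=400.0,
                      mz_max=2000.0, step=0.01, noise_level=50.0, seed=0):
    """Return (mz, intensity) for Gaussian peaks on a noisy baseline.

    All defaults are illustrative assumptions, not instrument-derived values.
    """
    rng = np.random.default_rng(seed)
    mz = np.arange(mz_min, mz_max, step)
    # Convert FWHM to the Gaussian standard deviation: FWHM = 2*sqrt(2*ln 2)*sigma.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    intensity = np.zeros_like(mz)
    for center, height in zip(peak_mz, peak_heights):
        intensity += height * np.exp(-0.5 * ((mz - center) / sigma) ** 2)
    # Taking the absolute value of Gaussian noise keeps the baseline non-negative.
    intensity += np.abs(rng.normal(0.0, noise_level, mz.size))
    return mz, intensity
```

To write the result in the two-column layout the viewer expects, `np.savetxt("sample_spectrum_1.csv", np.column_stack([mz, intensity]), delimiter=",", header="mz,intensity", comments="")` is one option.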

⚠Dependencies

This tool uses scipy for peak detection, pandas for data handling, and Dash for the web application. You need to install these via pip. If you cannot install scipy and pandas, ask the LLM to replace them with pure Python equivalents. However, Dash itself is required for the web interface — if you cannot install any Python packages at all, ask the LLM for a static HTML + Plotly.js CDN version that uses pure Python for CSV preprocessing.

What you get

After generation, set up the project:

cd mass-spec-viewer
python -m venv .venv
source .venv/bin/activate    # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
python app.py

Once the server starts, open http://localhost:8050 in your browser. Depending on the generated code, the app may or may not open a browser tab for you.

Expected project structure

mass-spec-viewer/
├── app.py                      (~400-500 lines)
├── peak_detection.py           (~100-150 lines)
├── data_processing.py          (~150-200 lines)
├── export_utils.py             (~80-100 lines)
├── assets/
│   └── style.css
├── sample_data/
│   ├── sample_spectrum_1.csv
│   └── sample_spectrum_2.csv
├── requirements.txt
└── README.md

First run walkthrough

Click Load Sample Data. Two spectra appear on the plot in different colors.
Click Detect Peaks with default settings. Peak labels appear at the major peaks.
Toggle Normalize to see both spectra on the same scale.
Toggle Mirror Plot to see spectrum 2 inverted below the x-axis — this is the standard view for comparing an experimental spectrum to a library spectrum.
Toggle Difference to see what changed between the two samples (Spectrum 1 minus Spectrum 2). You should see:
- Positive peaks where spectrum 1 has unique or stronger signals.
- Negative peaks where spectrum 2 has unique or stronger signals.
- For the phosphorylation-shifted peaks: a positive peak at the original m/z (present in spectrum 1, absent in spectrum 2) and a negative peak at the shifted position, m/z + 80/z (for example, +40 for a doubly charged ion), which is absent in spectrum 1 and present in spectrum 2.
Zoom into the 500-700 m/z range by dragging a box over it (or by scroll-zooming, if the generated app enables it).
Click a row in the peak table to zoom to that peak.
Click Export PNG with “white background” checked to get a publication-ready figure.
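The mirror view is simpler than it looks: it is an ordinary overlay in which the second trace's intensities are negated and the y-range is made symmetric around zero. A minimal sketch that builds the figure as a plain dict (a form `dcc.Graph` accepts directly); the function name and the 5% headroom are assumptions.

```python
def mirror_figure(mz1, int1, mz2, int2, name1="Spectrum 1", name2="Spectrum 2"):
    """Plotly figure (plain dict) with spectrum 2 drawn upside down below the x-axis."""
    top = max(max(int1), max(int2))
    return {
        "data": [
            {"type": "scatter", "mode": "lines", "name": name1,
             "x": list(mz1), "y": list(int1)},
            # Negating the intensities is all that "mirror" means.
            {"type": "scatter", "mode": "lines", "name": name2,
             "x": list(mz2), "y": [-v for v in int2]},
        ],
        "layout": {
            "xaxis": {"title": {"text": "m/z"}},
            # Symmetric range so both halves share one intensity scale.
            "yaxis": {"title": {"text": "Intensity"},
                      "range": [-1.05 * top, 1.05 * top]},
        },
    }
```

Normalizing both spectra before building the figure makes the upper and lower halves directly comparable.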

Common issues and fixes

Problem	Follow-up prompt
CSV upload fails	`The CSV parser is failing. Add more robust column detection: try reading the first 5 lines to detect the header, handle both comma and tab delimiters, and skip any lines that don't parse as numbers.`
Peak detection finds too many peaks	`Peak detection is finding noise peaks. Increase the default minimum height to 10% of max intensity and minimum prominence to 5% of max. Also add a smoothing pass before peak detection.`
Mirror plot not rendering	`The mirror plot should show spectrum 2 with negated intensities on the same axes. Make sure the y-axis range is symmetric around 0 when mirror mode is on.`
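Export PNG is low resolution	`The PNG export is blurry. Use plotly.io.write_image with scale=3 or higher (scale multiplies the pixel dimensions, so a 1000 px wide figure at scale=3 becomes 3000 px, enough for 10 inches at 300 DPI). Make sure kaleido is installed (add it to requirements.txt).`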

Worked example: Comparing treated vs. control samples

Here is a practical scenario for a proteomics researcher studying phosphorylation.

Step 1. You have two LC-MS/MS runs: a control sample and a sample treated with a kinase inhibitor. Export the MS1 scans at the retention time of your peptide of interest from Xcalibur (File > Export > Spectrum List as CSV).

Step 2. Load both CSV files into the viewer. Each should show the expected peptide peaks in the 400-1200 m/z range.

Step 3. Turn on Normalize so both spectra are on the same scale (base peak = 100%).
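Both normalization options named in the prompt (base peak and TIC) come down to a single division. A minimal NumPy sketch; the function name and the choice to scale TIC-normalized spectra so they sum to 1 are assumptions (some tools scale to 100 or 1e6 instead).

```python
import numpy as np


def normalize(intensity, method="base_peak"):
    """Scale intensities so spectra can be compared.

    base_peak: the tallest point becomes 100 (%).
    tic: divide by the total ion current (sum of all intensities).
    """
    intensity = np.asarray(intensity, dtype=float)
    if method == "base_peak":
        return 100.0 * intensity / intensity.max()
    if method == "tic":
        return intensity / intensity.sum()
    raise ValueError(f"unknown normalization method: {method!r}")
```

Base peak normalization is sensitive to a single dominant peak, while TIC normalization uses the whole spectrum; which one is more appropriate depends on whether your two samples share a common dominant ion.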

Step 4. Turn on Difference view. Look for:

Positive peaks (with control minus treated): signals stronger in control than in treated. Define the subtraction order before interpreting results, because reversing it flips every sign.
Negative peaks (with control minus treated): signals stronger in treated than in control.
Shifted peaks (+80 Da on the peptide): a phosphopeptide appears 79.97/z above its unmodified form on the m/z axis (79.97 Da is the monoisotopic mass of HPO3), so +79.97 for a singly charged ion, +39.98 for 2+, and +26.66 for 3+.
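To turn the 79.97/z rule into target m/z values to look for, a tiny helper is enough. A sketch; the function name is hypothetical, and 79.96633 Da is the monoisotopic mass of HPO3.

```python
HPO3_MONO = 79.96633  # monoisotopic mass of HPO3 in Da


def phospho_mz(unmodified_mz, charge, n_sites=1):
    """Expected m/z after adding n phosphate groups to an ion of the given charge.

    The neutral mass grows by n * 79.96633 Da, so m/z grows by that amount / z.
    """
    return unmodified_mz + n_sites * HPO3_MONO / charge
```

For example, a doubly charged peptide at m/z 600.0 should reappear near m/z 639.98 when singly phosphorylated.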

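Step 5. Detect peaks on the difference spectrum. The peak list now concentrates on the m/z values that changed between conditions, a quick way to shortlist candidate phosphopeptides before running a full database search. Local-maximum peak finding only picks up positive peaks, so also run it on the inverted difference (ask the AI to add this if the generated app does not) to catch signals that are stronger in the other sample.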
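Two details matter when subtracting spectra: the two files rarely share the same m/z sampling points, and a local-maximum peak finder only sees positive peaks. A minimal sketch under those assumptions, using linear interpolation (`np.interp`) onto a shared 0.01 m/z grid; the function names and grid step are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def difference_spectrum(mz1, int1, mz2, int2, step=0.01):
    """Resample both spectra onto one m/z grid over their overlapping range
    (linear interpolation) and return (grid, int1 - int2).
    np.interp requires each m/z array to be sorted in increasing order."""
    mz1, mz2 = np.asarray(mz1, dtype=float), np.asarray(mz2, dtype=float)
    grid = np.arange(max(mz1[0], mz2[0]), min(mz1[-1], mz2[-1]), step)
    diff = np.interp(grid, mz1, int1) - np.interp(grid, mz2, int2)
    return grid, diff


def signed_peaks(grid, diff, min_height):
    """find_peaks only returns maxima, so search diff and -diff separately.
    Returns (m/z of positive peaks, m/z of negative peaks)."""
    up, _ = find_peaks(diff, height=min_height)
    down, _ = find_peaks(-diff, height=min_height)
    return grid[up], grid[down]
```

With spectrum 1 minus spectrum 2, the first array holds signals stronger in spectrum 1 and the second holds signals stronger in spectrum 2.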

ℹGetting CSV data from vendor software

Every instrument vendor has a different export process:

Thermo Xcalibur: Open raw file > click on spectrum > File > Export > Spectrum List (CSV)
Bruker FlexAnalysis (MALDI): File > Export > ASCII (tab-delimited, rename to .csv)
Waters MassLynx: Right-click spectrum > Export > Combine to ASCII
Agilent MassHunter: File > Export > CSV

If your vendor software does not export CSV directly, export to mzML (open format) first using MSConvert from ProteoWizard, then ask the AI to add mzML support to the viewer.

Worked example: MALDI-TOF protein identification

For MALDI-TOF peptide mass fingerprinting, the workflow is slightly different:

Step 1. After running your MALDI-TOF, export the peak list or full spectrum from FlexAnalysis as a tab-delimited text file.

Step 2. Load it into the viewer. The m/z range is typically 800-4000 for tryptic digests.

Step 3. Run peak detection with these adjusted settings:

Min peak height: 5% of maximum (MALDI spectra tend to have higher baseline noise)
Min peak distance: 1.0 m/z (MALDI resolution is lower than ESI)
Prominence: 3% of maximum

Step 4. Export the peak list as CSV. You now have a list of m/z values that you can paste directly into the Mascot peptide mass fingerprint search (matrixscience.com) or other PMF tools such as MS-Fit in ProteinProspector.

To turn these settings into one-click presets, ask the AI:

The peak detection parameters I need for MALDI-TOF data are different from ESI.
Add a "Preset" dropdown with options: "ESI (default)", "MALDI-TOF", "MALDI-TOF/TOF".
The MALDI-TOF preset should set min height to 5%, distance to 1.0, and prominence
to 3%. MALDI-TOF/TOF preset should use distance 0.3 and prominence 2%.

🔧When Things Go Wrong

Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.

Symptom

CSV upload shows 'No valid data columns found' error

Evidence

I exported a spectrum from Xcalibur and the CSV file has columns labeled 'Mass/Charge' and 'Counts'. The upload gives: 'Error: No valid data columns found in file.'

What to ask the AI

"The CSV parser does not recognize my column names. My file uses 'Mass/Charge' for the m/z column and 'Counts' for intensity. Can you add these as recognized column names in data_processing.py? Also add 'Abundance', 'Signal', and 'Rel. Intensity' as aliases for the intensity column."
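For reference, here is a minimal sketch of alias-based column detection, including the base64 decoding step that Dash's `dcc.Upload` component requires (it delivers file contents as a data URL). It uses only the standard library; the alias sets, function names, and the rule of detecting the delimiter from the first data row are assumptions you can adapt. One known limitation: with whitespace-delimited files, a header such as "Rel. Intensity" splits into two fields.

```python
import base64

# Header names are compared in lowercase, so "Intensity" matches "intensity".
X_ALIASES = {"m/z", "mz", "mass", "mass/charge"}
Y_ALIASES = {"intensity", "int", "abundance", "counts", "signal", "rel. intensity"}


def decode_upload(contents):
    """dcc.Upload passes each file as a data URL: 'data:<mime>;base64,<payload>'."""
    _, payload = contents.split(",", 1)
    return base64.b64decode(payload).decode("utf-8", errors="replace")


def _detect_delimiter(line):
    """Return ',', tab, or ';' if present in the line; None means split on whitespace."""
    for delim in (",", "\t", ";"):
        if delim in line:
            return delim
    return None


def _fields(line, delim):
    parts = line.split(delim) if delim else line.split()
    return [p.strip().strip('"') for p in parts]


def parse_spectrum_text(text):
    """Return (mz, intensity) lists from delimited text with one header row."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]  # drop blanks and comments
    if len(lines) < 2:
        raise ValueError("Expected a header row followed by data rows.")
    delim = _detect_delimiter(lines[1])  # use a data row; headers can contain odd characters
    header = [h.lower() for h in _fields(lines[0], delim)]
    xi = next((i for i, h in enumerate(header) if h in X_ALIASES), None)
    yi = next((i for i, h in enumerate(header) if h in Y_ALIASES), None)
    if xi is None or yi is None:
        raise ValueError(f"No valid data columns found in header: {lines[0]!r}")
    mz, intensity = [], []
    for line in lines[1:]:
        fields = _fields(line, delim)
        try:
            x, y = float(fields[xi]), float(fields[yi])
        except (ValueError, IndexError):
            continue  # skip rows that do not parse as numbers
        mz.append(x)
        intensity.append(y)
    return mz, intensity
```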

Symptom

Peak detection finds hundreds of noise peaks even with high threshold

Evidence

I loaded a MALDI spectrum with visible baseline noise. Even with min height at 20% of max, peak detection reports 347 peaks. Most are clearly noise between m/z 1000-1200.

What to ask the AI

"The baseline noise is being counted as peaks because the baseline is not flat -- it has a broad hump. Can you add baseline subtraction before peak detection? Estimate the baseline using a rolling minimum with a large window (e.g., 200 points), subtract it, then run peak finding on the baseline-corrected spectrum."
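The request above maps onto two SciPy filters. A minimal sketch, assuming roughly even point spacing; the 200-point window comes from the request, while the extra smoothing pass over the rolling minimum and the clipping at zero are assumptions.

```python
import numpy as np
from scipy.ndimage import minimum_filter1d, uniform_filter1d


def subtract_baseline(intensity, window=200):
    """Return (corrected, baseline) for a spectrum sampled at roughly even spacing.

    A rolling minimum tracks the bottom of the signal; a rolling mean of that
    minimum smooths its stair-step shape. Values pushed below zero are clipped.
    """
    y = np.asarray(intensity, dtype=float)
    baseline = minimum_filter1d(y, size=window, mode="nearest")
    baseline = uniform_filter1d(baseline, size=window, mode="nearest")
    return np.clip(y - baseline, 0.0, None), baseline
```

A rolling minimum lags behind the signal on steep slopes, so the window should be several times wider than your widest peak but not so wide that it cuts across the curve of the hump. Plotting the returned baseline over the raw spectrum is the quickest check.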

Symptom

Difference spectrum shows artifacts at the edges of peaks

Evidence

The difference plot shows small positive/negative spikes at the edges of every peak, even peaks that should be identical between the two spectra. The centers of shared peaks correctly show zero difference.

What to ask the AI

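"The two spectra have slightly different m/z sampling points, so subtraction may produce artifacts at peak edges due to interpolation. Can you improve the interpolation for difference spectra? Use cubic spline interpolation instead of linear, and resample both spectra to a common m/z grid with 0.01 Da spacing before subtracting. Paired positive/negative spikes on either side of every shared peak are also the classic signature of a small m/z calibration offset between runs, so please add an optional alignment step that estimates the offset by cross-correlation and shifts spectrum 2 before subtracting."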
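If you suspect a calibration offset rather than an interpolation problem, one simple check is to cross-correlate the two resampled spectra over a small range of shifts. A sketch, assuming both spectra are already on the same evenly spaced grid; the function name and the ±0.05 m/z search range are hypothetical.

```python
import numpy as np


def estimate_mz_offset(grid, int1, int2, max_shift=0.05):
    """Return the m/z shift of spectrum 2 relative to spectrum 1 (positive means
    spectrum 2 sits at higher m/z), found as the lag that maximizes the
    cross-correlation. Both arrays must be sampled on the same evenly spaced grid."""
    step = grid[1] - grid[0]
    max_lag = int(round(max_shift / step))
    a = np.asarray(int1, dtype=float) - np.mean(int1)
    b = np.asarray(int2, dtype=float) - np.mean(int2)
    n = len(a)
    best_lag, best_score = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        # Compare a[i] with b[i + k] over the overlapping samples.
        if k >= 0:
            score = np.dot(a[:n - k], b[k:])
        else:
            score = np.dot(a[-k:], b[:n + k])
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag * step
```

A result of about +0.01 means spectrum 2 sits roughly 0.01 m/z higher; subtracting that value from its m/z array before differencing should remove most of the paired spikes if an offset is the cause.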

Symptom

SVG export produces a file that looks wrong in Inkscape

Evidence

The SVG file opens in Inkscape but the text labels are positioned incorrectly and the dark background is missing. It looks like the annotations shifted to the upper-left corner.

What to ask the AI

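"The SVG export from Plotly sometimes has positioning issues in external editors. Can you add an option for 'publication-ready SVG' that: (1) uses a white background instead of dark, (2) uses a widely installed font family (e.g., Arial) so text renders consistently in Inkscape, and (3) passes explicit width and height to the export? Apply the white template with fig.update_layout(template='plotly_white'), then call plotly.io.write_image(fig, path, format='svg', width=..., height=...). If I need text converted to paths, I will do that afterward in Inkscape."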

Symptom

App crashes with MemoryError when loading a large CSV file

Evidence

I exported a full LC-MS run as CSV (1.2 million data points, 45 MB file). The app crashes: MemoryError in pandas.read_csv. Smaller files work fine.

What to ask the AI

"The app is trying to load the entire file at once. Can you add chunked loading for large files? If the file has more than 100,000 rows, downsample it for display by keeping the most intense point in each block of N points (plain every-Nth-point decimation can skip narrow peaks), so the plot stays under 100K points. Show a warning that the data has been downsampled and offer a 'Full Resolution' toggle that loads the complete data for a selected m/z range."
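Stride decimation (keeping every Nth point) can step right over a narrow peak apex. A common alternative for display is to keep the most intense point in each block of points. A minimal NumPy sketch; the function name is hypothetical and the 100,000-point limit is taken from the request.

```python
import numpy as np


def downsample_max(mz, intensity, max_points=100_000):
    """Keep at most max_points by splitting the spectrum into equal blocks and
    keeping the most intense point of each block, so narrow peak apexes survive."""
    mz = np.asarray(mz)
    intensity = np.asarray(intensity, dtype=float)
    n = intensity.size
    if n <= max_points:
        return mz, intensity
    block = int(np.ceil(n / max_points))
    n_blocks = int(np.ceil(n / block))
    # Pad the last block with -inf so padding is never chosen as a maximum.
    padded = np.full(n_blocks * block, -np.inf)
    padded[:n] = intensity
    idx = np.arange(n_blocks) * block + padded.reshape(n_blocks, block).argmax(axis=1)
    return mz[idx], intensity[idx]
```

This keeps peak heights intact but drops the low points between them, so it suits display; peak detection and FWHM should still run on the full-resolution data.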

Understanding peak detection

The peak detection algorithm is the core of this tool. Here is what it does:

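Local maxima method (default): every point higher than its immediate neighbors is a candidate peak. scipy.signal.find_peaks, the workhorse of peak detection in Python, then keeps only the candidates that pass the height threshold (which filters out low-intensity noise), whose prominence (roughly, how far the peak rises above the higher of the two valleys separating it from taller peaks) exceeds the prominence threshold, and that lie at least distance samples away from a taller peak. Note that find_peaks measures distance in samples, not m/z units, so the app has to convert the m/z value you enter using the spacing between data points.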
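The three controls map directly onto `scipy.signal.find_peaks`. A minimal sketch that converts the m/z distance to samples using the median point spacing (an assumption that holds for roughly evenly sampled spectra; TOF spacing grows with m/z, so the conversion is only approximate there). The function name is hypothetical and the default fractions mirror the prompt's defaults.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_peaks(mz, intensity, min_height_frac=0.05, min_distance_mz=0.5,
                 prominence_frac=0.02):
    """Local-maximum peak picking. Height and prominence are fractions of the
    base peak; the minimum spacing is given in m/z units."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(intensity, dtype=float)
    # find_peaks counts distance in samples: convert m/z via the median spacing.
    spacing = np.median(np.diff(mz))
    distance = max(1, int(round(min_distance_mz / spacing)))
    peaks, props = find_peaks(y,
                              height=min_height_frac * y.max(),
                              distance=distance,
                              prominence=prominence_frac * y.max())
    return peaks, props
```

`props` includes `peak_heights` and `prominences`, which can populate the peak table directly.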

Parabolic interpolation: The discrete maximum from the raw data is only as precise as the sampling. Fitting a parabola through the peak apex and its two neighbors can improve sub-sample peak localization when signal-to-noise and sampling are sufficient. This is a standard centroiding approach; vendor software typically uses its own, often proprietary, variants.
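The three-point parabolic fit has a closed form, so it costs almost nothing per peak. A sketch, assuming roughly even sampling around the apex; the function name is hypothetical.

```python
import numpy as np


def parabolic_apex(mz, intensity, i):
    """Refine the apex of the local maximum at index i by fitting a parabola
    through points i-1, i, i+1. Returns (apex_mz, apex_intensity)."""
    y0, y1, y2 = intensity[i - 1], intensity[i], intensity[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0:  # flat top: no curvature to fit
        return float(mz[i]), float(y1)
    delta = 0.5 * (y0 - y2) / denom        # apex offset in samples, within [-0.5, 0.5]
    step = 0.5 * (mz[i + 1] - mz[i - 1])   # local spacing, assumes near-even sampling
    return float(mz[i] + delta * step), float(y1 - 0.25 * (y0 - y2) * delta)
```

Applying the same fit to the logarithm of the intensities is exact for a Gaussian peak, which is why some tools prefer that variant.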

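FWHM and resolution: Full Width at Half Maximum measures peak sharpness, and resolution (m/z divided by FWHM) quantifies the instrument's ability to separate nearby peaks. A Q Exactive at resolution 70,000 at m/z 200 has an FWHM of about 0.003 Da (200 / 70,000 ≈ 0.0029). For Orbitrap instruments, resolution falls roughly with the square root of m/z, so the same setting gives about 35,000 at m/z 800. Your calculated resolution should be in the right ballpark for your instrument. FWHM can only be measured on profile-mode data; centroided exports reduce each peak to a single point.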
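The FWHM calculation from the prompt is a short walk outward from the apex. A minimal sketch, assuming profile-mode data and a peak on a near-zero baseline (half height is taken from the raw apex value, so subtract the baseline first if yours is elevated); the function name is hypothetical.

```python
import numpy as np


def fwhm(mz, intensity, i):
    """Full width at half maximum of the peak at index i, using linear
    interpolation to find where the signal crosses half height on each side.
    Returns None if a crossing is not found inside the array."""
    x = np.asarray(mz, dtype=float)
    y = np.asarray(intensity, dtype=float)
    half = y[i] / 2.0
    left = i
    while left > 0 and y[left] > half:
        left -= 1
    right = i
    while right < len(y) - 1 and y[right] > half:
        right += 1
    if y[left] > half or y[right] > half:
        return None
    # Interpolate between the samples that bracket half height on each side.
    x_left = np.interp(half, [y[left], y[left + 1]], [x[left], x[left + 1]])
    x_right = np.interp(half, [y[right], y[right - 1]], [x[right], x[right - 1]])
    return float(x_right - x_left)
```

Resolution is then `mz[i] / fwhm(mz, intensity, i)`. For a Gaussian peak with σ = 0.01 at m/z 800, FWHM is about 0.0235 and resolution comes out near 34,000.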

🔍For Researchers: When this tool replaces vendor software (and when it doesn't)

Use this tool for:

Quick visualization of exported spectra without opening vendor software
Comparing spectra from different instruments or experiments (vendor software usually cannot overlay spectra from different platforms)
Generating publication figures with consistent styling across your paper
Teaching mass spectrometry concepts to students (the interactive controls make it a great demo tool)
Initial peak picking before running identification through a search engine

Do not use this tool for:

Raw file reading (.raw, .d, .wiff): you still need vendor software or ProteoWizard's MSConvert to convert to CSV/mzML first (Python libraries such as pyteomics read the converted mzML, not the vendor formats)
Deconvolution of isotope envelopes (charge state determination) — use tools like UniDec or Xtract
Database searching for peptide identification — use MaxQuant, MSFragger, or Proteome Discoverer
Quantitative analysis (TMT/iTRAQ, label-free quant) — use specialized tools

The sweet spot is visualization, comparison, and figure generation.

Customize it

Add mzML support

Add support for reading mzML files in addition to CSV. Use the pyteomics library
(add to requirements.txt). When an mzML file is uploaded, let the user select
which scan to display from a dropdown. Show scan metadata (retention time, MS
level, precursor m/z for MS2 scans) in an info panel. This lets users work
directly with open-format mass spec data.

Add isotope pattern overlay

Add an isotope pattern calculator. The user enters a molecular formula
(e.g., C254H377N65O75S6 for bovine insulin) and a charge state. Calculate the
theoretical isotope pattern from the formula (fall back to the averagine model
when only a mass is known) and overlay it on the experimental spectrum at the
specified m/z. This helps with charge state confirmation and peak assignment.
Use a simple binomial approximation for the isotope distribution.
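When the full formula is known, the pattern can be computed directly from natural isotope abundances. The minimal sketch below makes a few simplifying assumptions: it handles only C, H, N, O, and S, and it uses a single 1.00335 Da spacing for every isotope step, which is fine for low-resolution overlays but not for resolving isotopic fine structure. Instead of the binomial approximation named in the prompt, it convolves per-atom distributions. This is a swapped-in, more general technique that reduces to the binomial for carbon alone. Function names are hypothetical.

```python
import re

import numpy as np

# Natural abundances indexed by nominal mass offset (0, +1, +2, ...);
# rounded IUPAC representative values.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}
ISOTOPE_SPACING = 1.00335  # approx. 13C-12C difference, used for every isotope step


def parse_formula(formula):
    """'C6H12O6' -> {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def isotope_pattern(formula, n_peaks=8):
    """Relative abundances of M, M+1, M+2, ... (tallest = 1.0)."""
    dist = np.array([1.0])
    for element, count in parse_formula(formula).items():
        if element not in ABUNDANCE:
            raise ValueError(f"no abundance data for element {element}")
        atom = np.array(ABUNDANCE[element])
        for _ in range(count):
            # Heavier offsets never feed lighter ones, so truncating is safe.
            dist = np.convolve(dist, atom)[:n_peaks]
    return dist / dist.max()


def isotope_mz(mono_mz, charge, n_peaks):
    """m/z positions of the isotope peaks for an ion of the given charge."""
    return mono_mz + ISOTOPE_SPACING * np.arange(n_peaks) / charge
```

To overlay a pattern, pair `isotope_mz(mono_mz, z, len(pattern))` with `pattern` scaled to the experimental peak height.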

Add spectral library matching

Add a "Library Match" tab. The user uploads a library CSV file (m/z and intensity
columns, one spectrum per file or multiple spectra separated by blank lines with
ID headers). When the user selects an experimental spectrum and clicks "Match",
compute the cosine similarity score against every library entry and display the
top 10 matches in a table. Clicking a match shows the mirror plot comparison.
This is the basic workflow for spectral library searching in metabolomics and
lipidomics.
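Cosine similarity needs both spectra on shared coordinates, which usually means binning. A minimal sketch, assuming 1 m/z bins and raw (unscaled) intensities; published library-search tools often use finer bins, tolerance-based peak matching, or square-root intensity scaling instead. The function names and the library dictionary layout are hypothetical.

```python
import numpy as np


def binned_vector(mz, intensity, mz_min, mz_max, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins so two spectra share coordinates."""
    edges = np.arange(mz_min, mz_max + bin_width, bin_width)
    vec, _ = np.histogram(mz, bins=edges, weights=intensity)
    return vec


def cosine_similarity(spec_a, spec_b, mz_min, mz_max, bin_width=1.0):
    """Cosine of the angle between two binned spectra: 1 = same shape, 0 = no overlap."""
    a = binned_vector(*spec_a, mz_min, mz_max, bin_width)
    b = binned_vector(*spec_b, mz_min, mz_max, bin_width)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0


def top_matches(query, library, mz_min, mz_max, k=10, bin_width=1.0):
    """Rank library entries ({id: (mz, intensity)}) by cosine score against query."""
    scores = [(lib_id, cosine_similarity(query, spec, mz_min, mz_max, bin_width))
              for lib_id, spec in library.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

Because cosine similarity ignores overall scale, a library spectrum that is a scaled copy of the query still scores 1.0.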

Add batch peak comparison

Add a batch comparison mode. Upload 10+ spectra and the tool creates a presence/
absence matrix: rows are detected peaks (binned to 0.01 Da), columns are samples.
Display as a heatmap with hierarchical clustering (use scipy). This gives a quick
overview of which peaks are shared across samples and which are unique -- similar
to what you'd see in MetaboAnalyst but directly in your browser. Export the matrix
as a CSV for downstream stats in R.
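The binning and clustering steps are short with NumPy and SciPy. A sketch, assuming peaks are binned by rounding to the nearest 0.01 Da and samples are clustered with average linkage on Jaccard distance (a common choice for presence/absence data, but only one option). The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def presence_matrix(peak_lists, bin_width=0.01):
    """Rows = m/z bins holding a peak in any sample, columns = samples.
    peak_lists maps sample name -> iterable of peak m/z values."""
    names = list(peak_lists)
    binned = {n: set(np.round(np.asarray(mz) / bin_width).astype(int))
              for n, mz in peak_lists.items()}
    bins = sorted(set().union(*binned.values()))
    matrix = np.array([[b in binned[n] for n in names] for b in bins], dtype=bool)
    return np.array(bins) * bin_width, names, matrix


def cluster_order(matrix):
    """Column (sample) order from average-linkage clustering on Jaccard distance."""
    link = linkage(matrix.T, method="average", metric="jaccard")
    return leaves_list(link)
```

Reordering the matrix columns by `cluster_order(matrix)` before drawing the heatmap places similar samples next to each other, and the same matrix can be written out as the CSV for R.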

Add retention time viewer for LC-MS

Add a chromatogram mode for LC-MS data. Accept a CSV with three columns: retention
time, m/z, intensity. Display a total ion chromatogram (TIC) as a line plot.
Clicking on any point in the TIC shows the mass spectrum at that retention time.
Add extracted ion chromatogram (XIC) functionality: enter a target m/z and tolerance,
and show the intensity over time for just that ion. This turns the tool into a basic
LC-MS browser.
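Both chromatograms are group-by-sum operations over retention time. A minimal sketch, assuming the three-column layout from the prompt is loaded into flat arrays and that every point in one scan shares an identical retention-time value. The function names and the 0.01 default tolerance are hypothetical.

```python
import numpy as np


def tic(rt, intensity):
    """Total ion chromatogram: sum of all intensities at each retention time."""
    times, inverse = np.unique(rt, return_inverse=True)
    return times, np.bincount(inverse, weights=intensity)


def xic(rt, mz, intensity, target_mz, tol=0.01):
    """Extracted ion chromatogram: per-RT sum of intensities within target_mz +/- tol."""
    times, inverse = np.unique(rt, return_inverse=True)
    in_window = np.abs(np.asarray(mz) - target_mz) <= tol
    weights = np.where(in_window, intensity, 0.0)
    return times, np.bincount(inverse, weights=weights, minlength=times.size)


def spectrum_at(rt, mz, intensity, time):
    """All (mz, intensity) points from the scan closest to `time`."""
    rt = np.asarray(rt)
    nearest = rt[np.argmin(np.abs(rt - time))]
    sel = rt == nearest
    return np.asarray(mz)[sel], np.asarray(intensity)[sel]
```

Clicking a TIC point then amounts to calling `spectrum_at` with the clicked retention time.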

💡A practical workflow for publication figures

Many journals require vector figures. Here is the workflow:

Load your spectrum, detect peaks, zoom to the region of interest.
Turn off unnecessary annotations to keep the figure clean.
Export as SVG.
Open the SVG in Inkscape (free) or Adobe Illustrator.
Adjust fonts, line weights, and annotation positions to match journal style.
Export as PDF or EPS for submission.

This gives you publication-quality figures from any instrument’s data in about 5 minutes.

Connecting to Core Facility Workflows

This tool is directly relevant to common core facility services:

Mass Spectrometry — Cores offering protein identification, targeted and untargeted proteomics, lipidomics, and small molecule quantitation via LC/MS/MS generate data you can visualize here. After running samples on an Orbitrap or Q Exactive, export spectra as CSV from Xcalibur and load them into this viewer. Useful for checking calibrant peaks, comparing digestion conditions, or making figures for lab meeting. MALDI spectra can be exported from FlexAnalysis as CSV for side-by-side comparisons across experiments without being tied to a vendor workstation.

Isotope Ratio Analysis — Stable isotope labs measure isotopes of hydrogen, carbon, nitrogen, and oxygen across varied sample types. This connects directly to provenance analysis — determining the geographic origin of biological samples based on isotope signatures.

Metabolomics — Small molecule identification often involves comparing experimental spectra to standards. The spectral library matching customization makes this tool directly useful for metabolomics confirmations.

If you are taking a course covering mass spectrometry data analysis, this viewer gives you an interactive tool to explore the spectra you encounter in coursework — and to build publication figures from your own data.

🔍Extension: Forensic isotope analysis for provenance

Stable isotope analysis is used in forensic identification projects. When skeletal remains are recovered from historical sites, isotope ratios in bone and tooth enamel help determine where a person grew up. Delta-13C and delta-15N, which express the carbon-13/carbon-12 and nitrogen-15/nitrogen-14 ratios relative to international reference standards, reflect diet, while delta-18O (oxygen-18/oxygen-16) reflects drinking water sources, which vary by geography.

This means isotope data can narrow identification to a geographic region and dietary pattern — a powerful complement to DNA analysis when DNA is too degraded to yield results.
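The delta values are relative measures. Each one compares a sample's heavy-to-light isotope ratio with the same ratio in an international reference standard, expressed in parts per thousand (‰). A minimal sketch of the formula; the function name is hypothetical, and the reference ratio must come from the standard your lab reports against.

```python
def delta_per_mil(ratio_sample, ratio_standard):
    """Delta value in per mil: deviation of a sample's heavy/light isotope
    ratio from the reference standard's ratio, times 1000."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0
```

A sample whose ratio is 1% below the standard's gives a delta value of −10‰, which is why the reference zones in the prompt below are negative for delta-13C.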

Here is a prompt to build an isotope ratio visualization tool for this kind of work:

Create a Python + Plotly web application called isotope-viewer for stable isotope
provenance analysis. Requirements:

1. Upload CSV files with columns: sample_id, delta_13C, delta_15N, delta_18O,
   sample_type (bone/tooth/hair), and optional notes
2. Main scatter plot: delta_13C (x-axis) vs delta_15N (y-axis) with point size
   mapped to delta_18O. Color-code by sample_type.
3. Overlay geographic reference zones as shaded rectangles on the plot:
   - Upper Midwest USA: delta_13C -18 to -14, delta_15N 8 to 12
   - Southeast USA: delta_13C -20 to -16, delta_15N 6 to 10
   - Western Europe: delta_13C -22 to -19, delta_15N 9 to 13
   - Pacific Islands: delta_13C -16 to -12, delta_15N 10 to 15
   Label each zone. Make zones toggleable.
4. Click a point to see sample details. Highlight which reference zones the
   sample falls within.
5. Add a "Compare to Reference" panel where the user can paste known isotope
   values for a candidate individual and see where they plot relative to the
   evidence sample.
6. Export the plot as a publication-quality PNG with white background.
7. Dark theme matching the other tools in this module.

This tool would give forensic teams an interactive way to visualize isotope data from recovered remains alongside geographic reference datasets — something that currently requires manual plotting in Excel or R.

Key takeaways

Vendor-neutral visualization is a superpower: the ability to overlay spectra from different instruments (Thermo, Bruker, Waters, Agilent) in one viewer is something vendor software generally cannot do. This tool makes cross-platform comparison straightforward.
Peak detection parameters matter: the same algorithm with different threshold settings can find 5 peaks or 500. Understanding what min height, distance, and prominence do lets you tune for your specific instrument and sample type.
Baseline subtraction is often helpful: real mass spectra have non-flat baselines from chemical noise. For noisy spectra, baseline correction before peak detection reduces false peaks. Validate visually and compare results with and without correction, as poorly tuned baseline subtraction can distort certain analyses.
Export quality determines publication readiness: 300 DPI PNG for raster figures, SVG for vector figures. Always include the kaleido package for Plotly image export.
This tool handles visualization and comparison, not identification or quantitation. Know what to use it for and what to hand off to dedicated tools like MaxQuant, MSFragger, or MetaboAnalyst.

Portfolio suggestion

The mass spec viewer is the most impressive demo in this module because it has visible, interactive output. For your portfolio:

Save the project and deploy it (even locally) so you can demo it live.
Include screenshots of the mirror plot view and the difference spectrum — these are visually striking and immediately understandable to anyone in mass spec.
If you have real data, load a spectrum from your instrument and include an annotated figure in your portfolio. A side-by-side comparison of “vendor software figure” vs. “my custom viewer figure” demonstrates the value of the tool.
Write a one-paragraph description of how this tool fits into your lab’s workflow. For example: “Our lab runs 50 samples per week on the Q Exactive. This viewer lets us quickly compare spectra across batches without opening Xcalibur, saving approximately 2 hours per week in QC visualization.”

🔍Advanced: Connecting to open-access spectral databases

You can extend the viewer to search public spectral databases directly:

Add a "Search MassBank" button. When clicked, take the currently selected peak
m/z value and query the MassBank REST API (https://massbank.eu/MassBank/API/)
for matching spectra within 0.01 Da tolerance. Display the top 5 matches with
compound name, molecular formula, and a cosine similarity score. Clicking a
match loads the reference spectrum as an overlay for visual comparison.

This turns your viewer into a basic compound identification tool for metabolomics. MassBank is a free, open-access database with over 90,000 reference spectra. For more comprehensive searches, you can export your peak list and use it with GNPS (Global Natural Products Social Molecular Networking) or METLIN.

For proteomics, a similar extension can search the PRIDE/ProteomeXchange spectral libraries or the NIST Peptide Mass Spectral Library.

The story so far

Across four lessons, you have built:

A sequence analysis dashboard (single HTML file, zero dependencies)
A CRISPR guide RNA designer (React + Vite app with scoring and visualization)
A genomics QC pipeline (Python CLI tool with config, logging, and reports)
A mass spec data viewer (Dash web app with peak detection and multi-spectrum overlay)

Each tool followed the same pattern: see what is possible, get the exact prompt, run it, customize it. Next up: RNA-seq differential expression analysis and a reproducible workflow orchestrator that ties it all together.

KNOWLEDGE CHECK

You load two spectra into the viewer and enable the difference view (spectrum 1 minus spectrum 2). You see a negative peak at m/z 723.4 and a positive peak at m/z 803.4 (a shift of +80 m/z, consistent with +80 Da on a singly charged ion). What is the most likely biological explanation?

Try it yourself

Generate the mass spec viewer with the prompt above.
Load the sample data and verify the overlay works.
Run peak detection and examine the peak table. Do the m/z values and S/N ratios look reasonable?
Try the mirror plot view. Can you visually identify the +80 Da shifted peaks (phosphorylation)?
Export a publication-quality PNG with white background.
If you have real mass spec data, export a spectrum from your vendor software as CSV and load it into the viewer.
Pick one customization and add it. The isotope pattern overlay is particularly useful for proteomics work.
