Mass Spectrometry Data Viewer
What you'll learn
~30 min
- Build an interactive mass spectrometry data viewer with Dash and Plotly
- Understand peak detection algorithms, FWHM calculation, and signal-to-noise estimation
- Troubleshoot common issues with CSV parsing, peak detection sensitivity, and figure export
- Extend the viewer with mzML support, isotope pattern overlays, and spectral library matching
What you’re building
Mass spectrometry is the backbone of proteomics in modern biology and chemistry labs. Whether you are running LC-MS/MS on a Q Exactive, doing MALDI-TOF for protein ID, or analyzing metabolites on a triple quad, you eventually need to look at spectra. Vendor software is powerful but locked to specific instruments and operating systems. Open-source viewers exist but often require complex installations.
In this lesson you will build an interactive mass spectrometry data viewer as a Python + Plotly web application. You upload CSV-formatted spectral data, visualize it with interactive zoom and pan, run peak detection, overlay multiple spectra, and export publication-quality annotated figures. It runs in any browser and takes about 20 minutes to build. It works with any instrument whose data can be exported as two-column numeric CSV (m/z, intensity), though some vendor exports may require cleanup or conversion first.
This is the most advanced tool in the module. It combines file I/O, signal processing (peak detection), interactive visualization, and multi-file comparison. The LLM handles the implementation. You handle the science.
Load signal data → detect features → annotate → compare. This pattern works for audio waveforms, financial time series, IoT sensor data — any domain with signal-like data.
If you are working on a remote server or HPC cluster, use a conda environment instead of venv for easier dependency management. For the Dash web interface, use SSH port forwarding (ssh -L 8050:localhost:8050 user@server) to view the dashboard in your local browser.
The showcase
The finished application will provide:
- File upload panel: drag-and-drop or click to upload one or more CSV files. Each file should have two columns: m/z (or retention time) and intensity.
- Interactive spectrum plot: Plotly chart with full zoom, pan, hover tooltips showing exact m/z and intensity at every point.
- Peak detection: automatic peak finding using a local maxima algorithm with configurable parameters (minimum height, minimum distance between peaks, prominence threshold).
- Peak annotation: detected peaks labeled on the plot with m/z values. Click a peak to see detailed info.
- Multi-spectrum overlay: load multiple files and overlay them with different colors. Toggle individual spectra on/off. Normalize intensities for comparison.
- Difference view: subtract one spectrum from another to highlight changes between conditions (e.g., treated vs. control).
- Export: download the current view as a PNG or SVG image with annotations, suitable for publication figures.
- Peak list export: download detected peaks as a CSV with m/z, intensity, and resolution.
The prompt
Open your AI CLI tool (such as Claude Code, Gemini CLI, or your preferred tool) in an empty directory and paste:
Create a Python web application for mass spectrometry data visualization using
Dash (by Plotly) and Python. Call it mass-spec-viewer.
PROJECT STRUCTURE:
mass-spec-viewer/
├── app.py                      # main Dash application
├── peak_detection.py           # peak finding algorithms
├── data_processing.py          # CSV parsing, normalization, smoothing
├── export_utils.py             # figure export helpers
├── assets/
│   └── style.css               # dark theme CSS
├── sample_data/
│   ├── sample_spectrum_1.csv   # example MS1 spectrum (m/z, intensity)
│   └── sample_spectrum_2.csv   # second example for overlay demo
├── requirements.txt            # dash, plotly, pandas, scipy, numpy, kaleido
└── README.md
SAMPLE DATA:
Generate two realistic sample CSV files:
- sample_spectrum_1.csv: simulate a protein digest MS1 spectrum, m/z range 400-2000, with ~20 distinct peaks at realistic m/z values for tryptic peptides (e.g., doubly and triply charged peptides in the 400-1200 range), Gaussian peak shapes with realistic widths, baseline noise
- sample_spectrum_2.csv: similar spectrum but with 3 peaks shifted (simulating a post-translational modification like phosphorylation: +80 Da on some peptides) and 2 peaks absent (simulating a protein that is downregulated)
APP LAYOUT (app.py):
Use Dash with a dark theme. Layout sections:
1. UPLOAD SECTION (top)
   - Drag-and-drop area accepting .csv files
   - Support multiple file upload
   - Show list of loaded files with color swatches and remove buttons
   - "Load Sample Data" button
2. CONTROL PANEL (left sidebar, 250px wide)
   Peak Detection Controls:
   - Algorithm selector: "Local Maxima" or "Continuous Wavelet Transform"
   - Min peak height: slider (0 to max_intensity, default 5% of max)
   - Min peak distance: numeric input in m/z units (default 0.5)
   - Prominence threshold: slider (0 to max_intensity/2, default 2% of max)
   - "Detect Peaks" button
   - Show/hide peak labels checkbox
   Display Controls:
   - Normalize intensities toggle (to 100% base peak)
   - Smoothing: none / Savitzky-Golay / moving average, with window size
   - Y-axis: linear / log scale
   - X-axis range: min and max m/z inputs
   - Show baseline checkbox
   Overlay Controls:
   - Checkboxes to show/hide each loaded spectrum
   - "Mirror Plot" toggle (shows second spectrum inverted below x-axis, common for library matching)
   - "Difference" toggle (subtract spectrum 2 from spectrum 1)
   - Offset slider for stacked view (vertical offset between spectra)
3. MAIN PLOT (center, fills remaining width)
   - Plotly scatter/line chart with m/z on x-axis, intensity on y-axis
   - Interactive: zoom (scroll), pan (drag), box zoom, reset
   - Hover tooltip: m/z (4 decimal places), intensity (scientific notation), file name
   - Peak annotations: vertical line from peak to label, m/z value as text
   - Color-coded by file (auto-assign from a qualitative palette)
   - Responsive, fills available space
4. PEAK TABLE (below plot)
   - DataTable with columns: Peak #, m/z, Intensity, Relative Intensity (%), Resolution (m/z / FWHM), S/N Ratio, File
   - Sortable and filterable
   - Click a row to zoom the plot to that peak (+/- 5 m/z units)
   - "Download Peak List" button (CSV export)
5. EXPORT PANEL (bottom)
   - "Export PNG" button (high-res, 300 DPI, white background option for papers)
   - "Export SVG" button (vector, editable in Illustrator/Inkscape)
   - "Export Interactive HTML" button (saves standalone Plotly HTML file)
   - Width/height inputs for export dimensions
   - Option: include or exclude annotations in export
PEAK DETECTION (peak_detection.py):
- Local maxima method: use scipy.signal.find_peaks with height, distance, and prominence parameters
- CWT method: use scipy.signal.find_peaks_cwt with configurable widths
- For each detected peak, calculate:
  - Exact m/z (parabolic interpolation around the discrete maximum)
  - FWHM (full width at half maximum) by finding half-height crossings
  - Resolution: m/z / FWHM
  - Signal-to-noise: peak height / local baseline noise (estimated from median of surrounding 50 points)
DATA PROCESSING (data_processing.py):
- CSV parser: auto-detect delimiter (comma, tab, space), skip comment lines starting with #, handle different column names (m/z, mz, mass, Mass/Charge for x-axis; intensity, Intensity, int, abundance for y-axis)
- Normalization: scale to base peak (100%) or TIC (total ion current)
- Smoothing: Savitzky-Golay filter (scipy.signal.savgol_filter) with configurable window and polynomial order
- Baseline estimation: rolling minimum with large window
- Interpolation for difference spectra: align two spectra to common m/z grid
DESIGN:
- Dark theme: #0a0a0f background, #1a1a2e panels, #e0e0e0 text, #00d4ff accent
- Plotly dark template for charts
- Clean, professional layout suitable for a core facility
Generate all files with complete implementations. Include the sample data CSV files.
The app should work end-to-end: python app.py opens a browser with the viewer ready.
This tool uses scipy for peak detection, pandas for data handling, and Dash for the web application. You need to install these via pip. If you cannot install scipy and pandas, ask the LLM to replace them with pure Python equivalents. However, Dash itself is required for the web interface. If you cannot install any Python packages at all, ask the LLM for a static HTML + Plotly.js CDN version that uses pure Python for CSV preprocessing.
What you get
After generation, set up the project:
cd mass-spec-viewer
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
python app.py
Start the server and open http://localhost:8050 in your browser manually (the server may or may not auto-open the browser depending on the generated code).
Expected project structure
mass-spec-viewer/
├── app.py                  (~400-500 lines)
├── peak_detection.py       (~100-150 lines)
├── data_processing.py      (~150-200 lines)
├── export_utils.py         (~80-100 lines)
├── assets/
│   └── style.css
├── sample_data/
│   ├── sample_spectrum_1.csv
│   └── sample_spectrum_2.csv
├── requirements.txt
└── README.md
First run walkthrough
- Click Load Sample Data. Two spectra appear on the plot in different colors.
- Click Detect Peaks with default settings. Peak labels appear at the major peaks.
- Toggle Normalize to see both spectra on the same scale.
- Toggle Mirror Plot to see spectrum 2 inverted below the x-axis — this is the standard view for comparing an experimental spectrum to a library spectrum.
- Toggle Difference to see what changed between the two samples (Spectrum 1 minus Spectrum 2). You should see:
- Positive peaks where spectrum 1 has unique or stronger signals.
- Negative peaks where spectrum 2 has unique or stronger signals.
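- For the phosphorylation-shifted peaks: a positive peak at the original m/z (present in spectrum 1, absent in spectrum 2) and a negative peak at the shifted m/z (absent in spectrum 1, present in spectrum 2). The shift is about +80/z in m/z, so roughly +80 for singly charged ions and +40 for doubly charged ions.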
- Zoom into the 500-700 m/z range by scroll-zooming or box-selecting on the plot.
- Click a row in the peak table to zoom to that peak.
- Click Export PNG with “white background” checked to get a publication-ready figure.
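The Normalize and Difference steps in this walkthrough can be sketched as follows. This is a minimal illustration, not the code the LLM will generate: the function names, the 0.01 m/z grid step, and the synthetic peaks are assumptions made for the example. It scales each spectrum to its base peak, interpolates both onto a common m/z grid, and subtracts spectrum 2 from spectrum 1.

```python
import numpy as np

def normalize_base_peak(intensity):
    """Scale intensities so the tallest point (base peak) equals 100."""
    return 100.0 * intensity / intensity.max()

def difference_spectrum(mz1, int1, mz2, int2, step=0.01):
    """Return (grid, int1 - int2) on a shared m/z grid.

    Both m/z arrays must be sorted in increasing order (np.interp requires it).
    Only the overlapping m/z range is used.
    """
    lo = max(mz1.min(), mz2.min())
    hi = min(mz1.max(), mz2.max())
    grid = np.arange(lo, hi + step / 2, step)
    a = np.interp(grid, mz1, int1)
    b = np.interp(grid, mz2, int2)
    return grid, a - b

def gaussian(mz, center, height, sigma=0.1):
    return height * np.exp(-0.5 * ((mz - center) / sigma) ** 2)

# Two synthetic spectra sampled on slightly different m/z axes.
mz1 = np.arange(490.0, 550.0, 0.02)
mz2 = np.arange(490.005, 550.0, 0.02)
int1 = gaussian(mz1, 500.0, 1000.0)   # peak only in spectrum 1
int2 = gaussian(mz2, 540.0, 500.0)    # peak only in spectrum 2, different scale

grid, diff = difference_spectrum(
    mz1, normalize_base_peak(int1),
    mz2, normalize_base_peak(int2),
)
positive_mz = grid[np.argmax(diff)]   # near 500: stronger in spectrum 1
negative_mz = grid[np.argmin(diff)]   # near 540: stronger in spectrum 2
```

Normalizing before subtracting matters here because the two synthetic spectra differ in absolute intensity by a factor of two. Without it, the difference would mostly reflect loading or ionization differences rather than changes in peak pattern.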
Common issues and fixes
| Problem | Follow-up prompt |
|---|---|
| CSV upload fails | The CSV parser is failing. Add more robust column detection: try reading the first 5 lines to detect the header, handle both comma and tab delimiters, and skip any lines that don't parse as numbers. |
| Peak detection finds too many peaks | Peak detection is finding noise peaks. Increase the default minimum height to 10% of max intensity and minimum prominence to 5% of max. Also add a smoothing pass before peak detection. |
| Mirror plot not rendering | The mirror plot should show spectrum 2 with negated intensities on the same axes. Make sure the y-axis range is symmetric around 0 when mirror mode is on. |
| Export PNG is low resolution | The PNG export is blurry. Use plotly.io.write_image with scale=3 for 300 DPI equivalent. Make sure kaleido is installed (add it to requirements.txt). |
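The "scale=3 for 300 DPI" advice in the last row is arithmetic on pixel counts. A minimal sketch, assuming the figure is laid out at 100 pixels per inch on screen (an illustrative convention, not a Plotly constant) and using hypothetical helper names: a 7 x 4 inch figure becomes a 700 x 400 px layout that is rendered at 3x, giving 2100 x 1200 px.

```python
def png_export_args(width_in, height_in, dpi=300, base_px_per_in=100):
    """Translate a physical figure size into width/height/scale arguments.

    base_px_per_in is an assumed layout density used for this example.
    """
    return {
        "width": int(round(width_in * base_px_per_in)),
        "height": int(round(height_in * base_px_per_in)),
        "scale": dpi / base_px_per_in,
    }

def export_png(fig, path, width_in, height_in, dpi=300):
    """Write a Plotly figure to PNG at the requested print size.

    fig is expected to be a plotly.graph_objects.Figure; its write_image
    method needs the kaleido package to be installed.
    """
    args = png_export_args(width_in, height_in, dpi)
    fig.write_image(path, format="png", **args)

args = png_export_args(7, 4)
final_pixels = (int(args["width"] * args["scale"]),
                int(args["height"] * args["scale"]))
```

Keeping the layout size fixed and raising only the scale keeps fonts and line weights in proportion, which is usually what you want for a print figure.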
Worked example: Comparing treated vs. control samples
Here is a practical scenario for a proteomics researcher studying phosphorylation.
Step 1. You have two LC-MS/MS runs: a control sample and a sample treated with a kinase inhibitor. Export the MS1 scans at the retention time of your peptide of interest from Xcalibur (File > Export > Spectrum List as CSV).
Step 2. Load both CSV files into the viewer. Each should show the expected peptide peaks in the 400-1200 m/z range.
Step 3. Turn on Normalize so both spectra are on the same scale (base peak = 100%).
Step 4. Turn on Difference view. Look for:
- Positive peaks (stronger in control than treated): note the subtraction direction carefully. The sign depends on whether the difference is computed as control minus treated or treated minus control. Define your subtraction order before interpreting results.
- Negative peaks (stronger in treated than control): the opposite direction in the subtraction.
- Shifted peaks (+80 Da shift): phosphopeptides appear at m/z + 79.97 Da (monoisotopic mass of HPO3) for singly charged ions, or +79.97/z for multiply charged ions, relative to the unmodified form.
Step 5. Detect peaks on the difference spectrum. The peak list now shows only the m/z values that changed between conditions — a quick way to identify candidate phosphopeptides without running a full database search.
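The charge-dependent shift mentioned in Step 4 is easy to check by hand. The sketch below is illustrative only: the helper names and the 0.02 m/z tolerance are assumptions, and the mass of HPO3 is computed from standard monoisotopic atomic masses.

```python
PHOSPHO_MASS = 79.96633  # monoisotopic mass of HPO3 in Da

def phospho_mz_shift(charge):
    """Expected m/z shift for one phosphorylation at a given charge state."""
    return PHOSPHO_MASS / charge

def is_phospho_pair(mz_unmodified, mz_modified, charge, tol=0.02):
    """True if two peaks are separated by one phospho shift, within tol (m/z)."""
    observed = mz_modified - mz_unmodified
    return abs(observed - phospho_mz_shift(charge)) <= tol

shift_1 = phospho_mz_shift(1)   # about 79.97 m/z
shift_2 = phospho_mz_shift(2)   # about 39.98 m/z
shift_3 = phospho_mz_shift(3)   # about 26.66 m/z

pair_ok = is_phospho_pair(600.30, 640.28, charge=2)    # 39.98 apart
pair_bad = is_phospho_pair(600.30, 680.27, charge=2)   # 79.97 apart, wrong for z=2
```

For the doubly and triply charged tryptic peptides that dominate ESI spectra, look for pairs about 40 or 26.7 m/z apart rather than 80.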
Every instrument vendor has a different export process:
- Thermo Xcalibur: Open raw file > click on spectrum > File > Export > Spectrum List (CSV)
- Bruker FlexAnalysis (MALDI): File > Export > ASCII (tab-delimited, rename to .csv)
- Waters MassLynx: Right-click spectrum > Export > Combine to ASCII
- Agilent MassHunter: File > Export > CSV
If your vendor software does not export CSV directly, export to mzML (open format) first using MSConvert from ProteoWizard, then ask the AI to add mzML support to the viewer.
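Because the exports listed above differ in delimiter, header rows, and comment lines, a tolerant parser helps. The following is a minimal standard-library sketch, not the parser the LLM will generate: the function name is hypothetical, and it assumes the first two numeric fields on each line are m/z and intensity and that decimals use a point rather than a comma.

```python
import re

def parse_spectrum_text(text):
    """Parse two-column (m/z, intensity) text into two lists of floats.

    Splits on commas, tabs, semicolons, or spaces. Skips blank lines,
    lines starting with '#', and any line whose first two fields are
    not numbers (header rows, vendor metadata).
    """
    mz, intensity = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f for f in re.split(r"[,\t; ]+", line) if f]
        if len(fields) < 2:
            continue
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric row, e.g. a column header
        mz.append(x)
        intensity.append(y)
    return mz, intensity

tab_export = "# exported spectrum\nm/z\tIntensity\n400.1234\t1.5e4\n400.2234\t2000\n"
csv_export = "mz,int\n500.5, 10\n"

tab_result = parse_spectrum_text(tab_export)
csv_result = parse_spectrum_text(csv_export)
```

Silently skipping unparseable lines is convenient for messy exports, but it can hide a wrong file. In a real tool it is worth reporting how many lines were skipped.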
Worked example: MALDI-TOF protein identification
For MALDI-TOF peptide mass fingerprinting, the workflow is slightly different:
Step 1. After running your MALDI-TOF, export the peak list or full spectrum from FlexAnalysis as a tab-delimited text file.
Step 2. Load it into the viewer. The m/z range is typically 800-4000 for tryptic digests.
Step 3. Run peak detection with these adjusted settings:
- Min peak height: 5% of maximum (MALDI spectra tend to have higher baseline noise)
- Min peak distance: 1.0 m/z (MALDI resolution is lower than ESI)
- Prominence: 3% of maximum
Step 4. Export the peak list as CSV. You now have a list of m/z values that you can paste directly into the Mascot PMF search (matrixscience.com) or other identification tools such as ProteinProspector or PEAKS.
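One way to keep the ESI and MALDI settings side by side is a small preset table. This is a hedged sketch: the dictionary layout and function name are assumptions, and the values are the ESI defaults from the main prompt and the MALDI-TOF settings from Step 3. The helper converts a preset into keyword arguments for scipy.signal.find_peaks, which expects `distance` in samples rather than m/z units.

```python
# Fractions are relative to the maximum intensity; distance is in m/z units.
PRESETS = {
    "ESI (default)": {"min_height_frac": 0.05, "min_distance_mz": 0.5, "prominence_frac": 0.02},
    "MALDI-TOF": {"min_height_frac": 0.05, "min_distance_mz": 1.0, "prominence_frac": 0.03},
}

def find_peaks_kwargs(preset_name, max_intensity, mz_step):
    """Convert a preset into keyword arguments for scipy.signal.find_peaks.

    mz_step is the spacing of the m/z axis; it is needed because
    find_peaks measures `distance` in samples.
    """
    preset = PRESETS[preset_name]
    return {
        "height": preset["min_height_frac"] * max_intensity,
        "distance": max(1, int(round(preset["min_distance_mz"] / mz_step))),
        "prominence": preset["prominence_frac"] * max_intensity,
    }

maldi_kwargs = find_peaks_kwargs("MALDI-TOF", max_intensity=2000.0, mz_step=0.05)
esi_kwargs = find_peaks_kwargs("ESI (default)", max_intensity=2000.0, mz_step=0.05)
```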
The peak detection parameters I need for MALDI-TOF data are different from ESI.
Add a "Preset" dropdown with options: "ESI (default)", "MALDI-TOF", "MALDI-TOF/TOF".
The MALDI-TOF preset should set min height to 5%, distance to 1.0, and prominence
to 3%. MALDI-TOF/TOF preset should use distance 0.3 and prominence 2%.
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
Understanding peak detection
The peak detection algorithm is the core of this tool. Here is what it does:
Local maxima method (default): A point is a peak if it is higher than its neighbors by at least the prominence threshold and is separated from other peaks by at least distance m/z units. The height threshold filters out low-intensity noise. This is implemented by scipy.signal.find_peaks, which is the workhorse of peak detection in Python.
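A minimal sketch of the local maxima method on synthetic data. The peak positions, widths, and noise level are made up for illustration, and the thresholds mirror the defaults in the prompt (5% height, 0.5 m/z distance, 2% prominence). Note that scipy.signal.find_peaks takes `distance` in samples, so the m/z value has to be converted using the axis spacing.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic spectrum: three Gaussian peaks plus noise on a uniform m/z grid.
mz = np.arange(400.0, 420.0, 0.01)
rng = np.random.default_rng(0)
intensity = rng.normal(0.0, 1.0, mz.size)
for center, height in [(405.0, 100.0), (410.0, 60.0), (415.0, 30.0)]:
    intensity += height * np.exp(-0.5 * ((mz - center) / 0.05) ** 2)

# Convert the minimum peak distance from m/z units to samples.
step = float(np.median(np.diff(mz)))
min_distance_mz = 0.5
distance_samples = max(1, int(round(min_distance_mz / step)))

max_intensity = intensity.max()
peak_idx, properties = find_peaks(
    intensity,
    height=0.05 * max_intensity,       # ignore anything below 5% of base peak
    distance=distance_samples,         # at least 0.5 m/z between peaks
    prominence=0.02 * max_intensity,   # must stand out from its surroundings
)
peak_mz = mz[peak_idx]
peak_heights = properties["peak_heights"]
```

On a non-uniform m/z axis, which is common for TOF and Orbitrap exports, the median spacing is only an approximation. Resampling onto a uniform grid first makes the distance parameter behave predictably.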
Parabolic interpolation: The discrete maximum from the raw data is approximate, because the true apex usually falls between two sampled points. Fitting a parabola through the peak apex and its two neighbors can improve sub-sample peak localization when signal-to-noise and sampling are sufficient. Vendor software applies comparable apex-refinement (centroiding) steps, although the exact algorithms are usually proprietary.
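The three-point parabola fit can be sketched as below. The function name is hypothetical, and the formula assumes the three points around the apex are evenly spaced in m/z.

```python
def parabolic_apex(x, y, i):
    """Refine the apex at index i with a parabola through points i-1, i, i+1.

    Returns (refined_x, refined_y). Assumes even spacing around the apex
    and that i is not the first or last index.
    """
    y0, y1, y2 = y[i - 1], y[i], y[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0:
        return x[i], y1                      # flat top: nothing to refine
    offset = 0.5 * (y0 - y2) / denom         # apex offset in samples
    step = x[i + 1] - x[i]
    refined_x = x[i] + offset * step
    refined_y = y1 - 0.25 * (y0 - y2) * offset
    return refined_x, refined_y

# An exact parabola with its apex at m/z 500.013, sampled every 0.01 m/z.
true_apex = 500.013
xs = [500.00, 500.01, 500.02]
ys = [10.0 - (v - true_apex) ** 2 for v in xs]

refined_mz, refined_height = parabolic_apex(xs, ys, 1)
```

The discrete maximum here is 500.01, while the fit recovers 500.013. Real peaks are closer to Gaussian than parabolic, so the improvement is smaller when only a few points sample the peak.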
FWHM and resolution: Full Width at Half Maximum measures peak sharpness. Resolution (m/z divided by FWHM) quantifies the instrument’s ability to separate nearby peaks. A Q Exactive at resolution 70,000 at m/z 200 has FWHM of about 0.003 Da. Your calculated resolution should be in the right ballpark for your instrument.
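The Q Exactive figure follows from FWHM = m/z / resolution = 200 / 70,000, which is about 0.0029 Da. The sketch below runs the calculation in the other direction on a synthetic Gaussian peak. It assumes a baseline at zero and a peak that drops below half height inside the array, and the function name is illustrative.

```python
import numpy as np

def fwhm_at(mz, intensity, i):
    """FWHM of the peak at index i, using linear interpolation at half height."""
    half = intensity[i] / 2.0
    left = i
    while left > 0 and intensity[left] > half:
        left -= 1
    right = i
    while right < len(mz) - 1 and intensity[right] > half:
        right += 1
    # Interpolate the m/z where each flank crosses half height.
    mz_left = np.interp(half, [intensity[left], intensity[left + 1]],
                        [mz[left], mz[left + 1]])
    mz_right = np.interp(half, [intensity[right], intensity[right - 1]],
                         [mz[right], mz[right - 1]])
    return mz_right - mz_left

# Synthetic peak at m/z 200 with the width expected at resolution 70,000.
expected_fwhm = 200.0 / 70000.0
sigma = expected_fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
mz = np.arange(199.99, 200.01, 0.0002)
intensity = 1e6 * np.exp(-0.5 * ((mz - 200.0) / sigma) ** 2)

apex = int(np.argmax(intensity))
fwhm = fwhm_at(mz, intensity, apex)
resolution = mz[apex] / fwhm
```

If the computed resolution is far below the instrument specification, the usual causes are overlapping peaks, heavy smoothing, or an export with too few points across each peak.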
Use this tool for:
- Quick visualization of exported spectra without opening vendor software
- Comparing spectra from different instruments or experiments (vendor software usually cannot overlay spectra from different platforms)
- Generating publication figures with consistent styling across your paper
- Teaching mass spectrometry concepts to students (the interactive controls make it a great demo tool)
- Initial peak picking before running identification through a search engine
Do not use this tool for:
- Raw file reading (.raw, .d, .wiff): you still need vendor software or a converter such as MSConvert (ProteoWizard) to produce CSV or mzML first. Libraries like pyteomics can then read the mzML.
- Deconvolution of isotope envelopes (charge state determination): use tools like UniDec or Xtract
- Database searching for peptide identification: use MaxQuant, MSFragger, or Proteome Discoverer
- Quantitative analysis (TMT/iTRAQ, label-free quant): use specialized tools
The sweet spot is visualization, comparison, and figure generation.
Customize it
Add mzML support
Add support for reading mzML files in addition to CSV. Use the pyteomics library
(add to requirements.txt). When an mzML file is uploaded, let the user select
which scan to display from a dropdown. Show scan metadata (retention time, MS
level, precursor m/z for MS2 scans) in an info panel. This lets users work
directly with open-format mass spec data.
Add isotope pattern overlay
Add an isotope pattern calculator. The user enters a molecular formula
(e.g., C254H377N65O75S6 for insulin) and a charge state. Calculate the
theoretical isotope pattern from the formula (fall back to the averagine
model when only a mass is known) and overlay it on the
experimental spectrum at the specified m/z. This helps with charge state
confirmation and peak assignment. Use a simple binomial approximation for
the isotope distribution.
Add spectral library matching
Add a "Library Match" tab. The user uploads a library CSV file (m/z and intensity
columns, one spectrum per file or multiple spectra separated by blank lines with
ID headers). When the user selects an experimental spectrum and clicks "Match",
compute the cosine similarity score against every library entry and display the
top 10 matches in a table. Clicking a match shows the mirror plot comparison.
This is the basic workflow for spectral library searching in metabolomics and
lipidomics.
Add batch peak comparison
Add a batch comparison mode. Upload 10+ spectra and the tool creates a presence/
absence matrix: rows are detected peaks (binned to 0.01 Da), columns are samples.
Display as a heatmap with hierarchical clustering (use scipy). This gives a quick
overview of which peaks are shared across samples and which are unique -- similar
to what you'd see in MetaboAnalyst but directly in your browser. Export the matrix
as a CSV for downstream stats in R.
Add retention time viewer for LC-MS
Add a chromatogram mode for LC-MS data. Accept a CSV with three columns: retention
time, m/z, intensity. Display a total ion chromatogram (TIC) as a line plot.
Clicking on any point in the TIC shows the mass spectrum at that retention time.
Add extracted ion chromatogram (XIC) functionality: enter a target m/z and tolerance,
and show the intensity over time for just that ion. This turns the tool into a basic
LC-MS browser.
Many journals require vector figures. Here is the workflow:
- Load your spectrum, detect peaks, zoom to the region of interest.
- Turn off unnecessary annotations to keep the figure clean.
- Export as SVG.
- Open the SVG in Inkscape (free) or Adobe Illustrator.
- Adjust fonts, line weights, and annotation positions to match journal style.
- Export as PDF or EPS for submission.
This gives you publication-quality figures from any instrument’s data in about 5 minutes.
Connecting to Core Facility Workflows
This tool is directly relevant to common core facility services:
Mass Spectrometry — Cores offering protein identification, targeted and untargeted proteomics, lipidomics, and small molecule quantitation via LC/MS/MS generate data you can visualize here. After running samples on an Orbitrap or Q Exactive, export spectra as CSV from Xcalibur and load them into this viewer. Useful for checking calibrant peaks, comparing digestion conditions, or making figures for lab meeting. MALDI spectra can be exported from FlexAnalysis as CSV for side-by-side comparisons across experiments without being tied to a vendor workstation.
Isotope Ratio Analysis — Stable isotope labs measure isotopes of hydrogen, carbon, nitrogen, and oxygen across varied sample types. This connects directly to provenance analysis — determining the geographic origin of biological samples based on isotope signatures.
Metabolomics — Small molecule identification often involves comparing experimental spectra to standards. The spectral library matching customization makes this tool directly useful for metabolomics confirmations.
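The cosine similarity score behind spectral library matching can be sketched with the standard library alone. This is an illustration under stated assumptions: the function names are hypothetical, spectra are lists of (m/z, intensity) peaks, and peaks are matched by rounding m/z into 0.01-wide bins, which is simpler than the tolerance-based matching used by dedicated library search tools.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into integer m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned

def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = same shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

experimental = [(100.05, 10.0), (200.10, 50.0), (300.15, 5.0)]
library_hit = [(100.05, 20.0), (200.10, 100.0), (300.15, 10.0)]   # same pattern, 2x scale
unrelated = [(150.00, 1.0)]

score_hit = cosine_similarity(experimental, library_hit)
score_miss = cosine_similarity(experimental, unrelated)
```

The score ignores absolute intensity, so a library spectrum acquired at a different concentration still scores near 1. Peaks that fall on opposite sides of a bin edge will not match, so widen the bin if your mass accuracy is lower.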
If you are taking a course covering mass spectrometry data analysis, this viewer gives you an interactive tool to explore the spectra you encounter in coursework — and to build publication figures from your own data.
Stable isotope analysis is used in forensic identification projects. When skeletal remains are recovered from historical sites, isotope ratios in bone and tooth enamel help determine where a person grew up. The ratios of carbon-13 to carbon-12 (delta-13C) and nitrogen-15 to nitrogen-14 (delta-15N) reflect diet, while oxygen-18 to oxygen-16 (delta-18O) reflects drinking water sources — which vary by geography.
This means isotope data can narrow identification to a geographic region and dietary pattern — a powerful complement to DNA analysis when DNA is too degraded to yield results.
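The delta values used here follow the standard definition: the sample's heavy-to-light isotope ratio relative to a reference standard, expressed in parts per thousand (per mil). A minimal sketch, with a hypothetical function name and made-up ratios that are not real reference values:

```python
def delta_permil(r_sample, r_standard):
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0

# Illustrative ratios only. Real work uses the certified ratio of the
# reference standard for the element being measured.
r_standard = 0.0112
r_sample = 0.0110

d13c = delta_permil(r_sample, r_standard)
```

A negative value means the sample is depleted in the heavy isotope relative to the standard, which is the typical case for carbon in biological tissue.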
Here is a prompt to build an isotope ratio visualization tool for this kind of work:
Create a Python + Plotly web application called isotope-viewer for stable isotope
provenance analysis. Requirements:
1. Upload CSV files with columns: sample_id, delta_13C, delta_15N, delta_18O, sample_type (bone/tooth/hair), and optional notes
2. Main scatter plot: delta_13C (x-axis) vs delta_15N (y-axis) with point size mapped to delta_18O. Color-code by sample_type.
3. Overlay geographic reference zones as shaded rectangles on the plot:
   - Upper Midwest USA: delta_13C -18 to -14, delta_15N 8 to 12
   - Southeast USA: delta_13C -20 to -16, delta_15N 6 to 10
   - Western Europe: delta_13C -22 to -19, delta_15N 9 to 13
   - Pacific Islands: delta_13C -16 to -12, delta_15N 10 to 15
   Label each zone. Make zones toggleable.
4. Click a point to see sample details. Highlight which reference zones the sample falls within.
5. Add a "Compare to Reference" panel where the user can paste known isotope values for a candidate individual and see where they plot relative to the evidence sample.
6. Export the plot as a publication-quality PNG with white background.
7. Dark theme matching the other tools in this module.
This tool would give forensic teams an interactive way to visualize isotope data from recovered remains alongside geographic reference datasets, something that currently requires manual plotting in Excel or R.
Key takeaways
- Vendor-neutral visualization is a superpower: overlaying spectra from different instruments (Thermo, Bruker, Waters, Agilent) in one viewer is something vendor software usually cannot do. Once each spectrum is exported as CSV, this tool makes cross-platform comparison straightforward.
- Peak detection parameters matter: the same algorithm with different threshold settings can find 5 peaks or 500. Understanding what min height, distance, and prominence do lets you tune for your specific instrument and sample type.
- Baseline subtraction is often helpful: real mass spectra have non-flat baselines from chemical noise. For noisy spectra, baseline correction before peak detection reduces false peaks. Validate visually and compare results with and without correction, as poorly tuned baseline subtraction can distort certain analyses.
- Export quality determines publication readiness: 300 DPI PNG for raster figures, SVG for vector figures. Always include the kaleido package for Plotly image export.
- This tool handles visualization and comparison, not identification or quantitation. Know what to use it for and what to hand off to dedicated tools like MaxQuant, MSFragger, or MetaboAnalyst.
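The baseline takeaway can be made concrete with the rolling-minimum approach named in the prompt. This is a hedged sketch rather than the generated implementation: the function name, the 201-sample window, and the extra smoothing pass are assumptions, and the spectrum is synthetic with no noise so the result is easy to inspect.

```python
import numpy as np
from scipy.ndimage import minimum_filter1d, uniform_filter1d

def rolling_min_baseline(intensity, window=201):
    """Estimate a baseline as a smoothed rolling minimum.

    The window (in samples) must be much wider than any peak, otherwise
    the estimate climbs into the peaks and removes real signal.
    """
    rolling_min = minimum_filter1d(intensity, size=window, mode="nearest")
    # Average the stepwise rolling minimum so the baseline is smooth.
    return uniform_filter1d(rolling_min, size=window, mode="nearest")

# Synthetic spectrum: sloped baseline plus one peak of height 100.
idx = np.arange(2000)
true_baseline = 50.0 + 0.01 * idx
peak = 100.0 * np.exp(-0.5 * ((idx - 1000) / 5.0) ** 2)
intensity = true_baseline + peak

baseline = rolling_min_baseline(intensity)
corrected = intensity - baseline
```

Plot `baseline` over the raw spectrum before trusting the corrected data. The estimate sits slightly below a rising baseline, and a window that is too narrow will flatten broad peaks.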
Portfolio suggestion
The mass spec viewer is the most impressive demo in this module because it has visible, interactive output. For your portfolio:
- Save the project and deploy it (even locally) so you can demo it live.
- Include screenshots of the mirror plot view and the difference spectrum — these are visually striking and immediately understandable to anyone in mass spec.
- If you have real data, load a spectrum from your instrument and include an annotated figure in your portfolio. A side-by-side comparison of “vendor software figure” vs. “my custom viewer figure” demonstrates the value of the tool.
- Write a one-paragraph description of how this tool fits into your lab’s workflow. For example: “Our lab runs 50 samples per week on the Q Exactive. This viewer lets us quickly compare spectra across batches without opening Xcalibur, saving approximately 2 hours per week in QC visualization.”
Advanced: Connecting to open-access spectral databases
You can extend the viewer to search public spectral databases directly:
Add a "Search MassBank" button. When clicked, take the currently selected peak
m/z value and query the MassBank REST API (https://massbank.eu/MassBank/API/)
for matching spectra within 0.01 Da tolerance. Display the top 5 matches with
compound name, molecular formula, and a cosine similarity score. Clicking a
match loads the reference spectrum as an overlay for visual comparison.
This turns your viewer into a basic compound identification tool for metabolomics. MassBank is a free, open-access database with over 90,000 reference spectra. For more comprehensive searches, you can export your peak list and use it with GNPS (Global Natural Products Social Molecular Networking) or METLIN.
For proteomics, a similar extension can search the PRIDE/ProteomeXchange spectral libraries or the NIST Peptide Mass Spectral Library.
The story so far
Across four lessons, you have built:
- A sequence analysis dashboard (single HTML file, zero dependencies)
- A CRISPR guide RNA designer (React + Vite app with scoring and visualization)
- A genomics QC pipeline (Python CLI tool with config, logging, and reports)
- A mass spec data viewer (Dash web app with peak detection and multi-spectrum overlay)
Each tool followed the same pattern: see what is possible, get the exact prompt, run it, customize it. Next up: RNA-seq differential expression analysis and a reproducible workflow orchestrator that ties it all together.
You load two spectra into the viewer and enable the difference view. You see a negative peak at m/z 723.4 and a positive peak at m/z 803.4 (a shift of +80 m/z, consistent with +80 Da on a singly charged ion). What is the most likely biological explanation?
Try it yourself
- Generate the mass spec viewer with the prompt above.
- Load the sample data and verify the overlay works.
- Run peak detection and examine the peak table. Do the m/z values and S/N ratios look reasonable?
- Try the mirror plot view. Can you visually identify the +80 Da shifted peaks (phosphorylation)?
- Export a publication-quality PNG with white background.
- If you have real mass spec data, export a spectrum from your vendor software as CSV and load it into the viewer.
- Pick one customization and add it. The isotope pattern overlay is particularly useful for proteomics work.
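If you pick the isotope pattern overlay, the binomial approximation it relies on can be sketched in a few lines. This version is deliberately simplified and is an assumption-laden illustration: it counts only carbon-13 (ignoring the smaller contributions from H, N, O, and S), uses an approximate natural abundance of 1.07%, and the function names are hypothetical.

```python
from math import comb

C13_ABUNDANCE = 0.0107      # approximate natural abundance of carbon-13
C13_C12_SPACING = 1.00335   # mass difference between 13C and 12C in Da

def carbon_isotope_pattern(n_carbons, n_peaks=5):
    """Relative intensities (tallest = 100) of the first isotope peaks.

    Binomial probability of k carbon-13 atoms among n_carbons carbons.
    """
    p = C13_ABUNDANCE
    probs = [comb(n_carbons, k) * p ** k * (1.0 - p) ** (n_carbons - k)
             for k in range(n_peaks)]
    top = max(probs)
    return [100.0 * (value / top) for value in probs]

def isotope_mz(mono_mz, charge, n_peaks=5):
    """m/z positions of the isotope peaks for a given charge state."""
    return [mono_mz + k * C13_C12_SPACING / charge for k in range(n_peaks)]

pattern_small = carbon_isotope_pattern(50)      # monoisotopic peak is tallest
pattern_large = carbon_isotope_pattern(254)     # carbon count from the insulin formula
positions = isotope_mz(500.0, charge=2, n_peaks=3)
```

The spacing between isotope peaks is about 1/z in m/z, which is why the overlay helps confirm charge state: peaks 0.5 apart indicate a doubly charged ion. For large molecules the monoisotopic peak is no longer the tallest, as the 254-carbon example shows.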