Core Service Intake Validator
What you'll learn (~25 min)
- Build a drag-and-drop sample sheet validator with a single AI prompt
- Validate CSV submissions against configurable rules for required fields, billing codes, and ID formats
- Troubleshoot common issues with CSV parsing, drag-and-drop events, and validation logic
- Customize the validator with additional rules, export options, or integration with your facility's code list
What you’re building
Every core facility has the same bottleneck: a researcher emails a sample submission sheet, and someone on staff has to open it, scan every row for missing PI names, invalid billing codes, duplicate sample IDs, and dates in three different formats. It takes 10-15 minutes per sheet, and mistakes slip through anyway.
You are going to build a tool that does this check in under one second.
Core facility managers spend hours every week chasing down incomplete submissions. A validator that catches errors before samples reach the bench saves staff time, reduces re-runs, and keeps researchers happy because they fix problems once instead of getting an email three days later asking for corrections.
By the end of this lesson you will have a standalone sample sheet validator that runs entirely in the browser. Drag a CSV onto the page (or click to upload), and it instantly flags every row with a missing field, bad billing code, duplicate sample ID, or malformed date. No server, no database, no installation — just one HTML file you can bookmark on the intake workstation.
Upload → parse → validate against rules → display errors. This pattern works for any structured data intake: purchase orders, timesheets, expense reports, inventory logs. The techniques here transfer directly to non-lab contexts.
🔍 Domain Primer: Key terms you'll see in this lesson
New to core facility operations? Here are the terms you’ll encounter:
- Sample sheet / Sample manifest — A spreadsheet (usually CSV or Excel) that researchers submit to a core facility listing every sample they want processed. Each row is one sample with metadata like species, quantity, and billing information.
- PI (Principal Investigator) — The lead researcher on a project. Every sample submission must be tied to a PI for billing and accountability.
- Billing code — An alphanumeric code that maps a service to a price. Core facilities use these to charge grants for instrument time and consumables. Example: `SEQ-001` for standard Illumina sequencing.
- Grant number — The funding source that pays for the work. Federal grants follow formats like `R01-GM123456`. Every billable service must be tied to a valid grant.
- Requisition form — The formal request document for core facility services. The sample sheet is often attached to or part of the requisition.
- Sample ID — A unique identifier for each sample, often following a facility-specific format like `CF-2026-0001`. Duplicates cause tracking nightmares downstream.
You don’t need to memorize these — the tool handles the validation logic. You just need to know what the fields represent.
Who this is for
- Core facility managers who review incoming sample sheets daily and want to catch errors before processing begins.
- Lab coordinators who need a quick sanity check before forwarding submissions to the sequencing or proteomics queue.
- Researchers who want to self-validate their submission sheets before sending them to the core, avoiding the back-and-forth email cycle.
UW-Madison operates dozens of core facilities — from the Biotechnology Center’s DNA Sequencing Facility to the Mass Spectrometry Facility and the Genome Center. Each has its own submission format, but the validation problems are universal: missing fields, bad codes, duplicates. A configurable validator handles all of them.
The showcase
Here is what the finished validator looks like once you open the HTML file in a browser:
- Drag-and-drop zone at the top where you drop a CSV file (or click to browse). Visual feedback on dragover.
- Validation rules panel showing the active rules: required fields, valid billing codes, sample ID format, date format.
- Summary bar showing total samples, errors found, warnings, and clean rows.
- Color-coded error report with a table where:
- Clean rows have a green left border.
- Rows with errors have a red left border and the offending cells are highlighted.
- Rows with warnings (non-critical issues) have a yellow left border.
- Error detail panel listing every issue by row number, column, and a human-readable description.
- Export button that downloads a copy of the report as a printable HTML page.
Everything runs client-side. The CSV data never leaves the browser. You can use this on an air-gapped workstation.
The prompt
Open your terminal (Mac: Cmd+Space, type "Terminal"; Windows: open WSL/Ubuntu from the Start menu), navigate to a project folder (create one with `mkdir my-project && cd my-project`), start your AI CLI tool (Claude Code, Gemini CLI, or Codex CLI, e.g., by typing `claude`), and paste this prompt:
Build a single self-contained HTML file called intake-validator.html that validates core facility sample submission sheets. Requirements:

1. FILE INPUT
- A drag-and-drop zone (dashed border, changes color on dragover) for CSV files
- Also a click-to-browse fallback button
- Parse the CSV client-side (handle quoted fields, commas inside quotes)
- Show the filename and row count after upload
2. SAMPLE DATA (embed as a "Load Example" button)
Include this sample CSV data with deliberate errors for testing:

Sample_ID,PI_Name,Email,Grant_Number,Species,Sample_Type,Billing_Code,Date_Submitted,Quantity,Notes
CF-2026-0001,Dr. Sarah Chen,chen.lab@wisc.edu,R01-GM134522,Mus musculus,gDNA,SEQ-001,2026-03-15,12,Rush processing requested
CF-2026-0002,,johnson.k@wisc.edu,R01-GM134522,Homo sapiens,RNA,SEQ-002,2026-03-15,8,
CF-2026-0003,Dr. James Rivera,rivera.j@wisc.edu,R01-HG009876,Drosophila melanogaster,Protein,PROT-001,2026/03/16,5,Need results by Friday
CF-2026-0001,Dr. Sarah Chen,chen.lab@wisc.edu,R01-GM134522,Mus musculus,gDNA,SEQ-001,2026-03-15,12,Duplicate of row 1
CF-2026-0005,Dr. Anika Patel,patel.anika@wisc.edu,P30-CA014520,Arabidopsis thaliana,gDNA,INVALID-99,2026-03-17,3,
CF-2026-0006,Dr. James Rivera,rivera.j@wisc.edu,,Danio rerio,Total RNA,SEQ-003,March 18 2026,20,New collaboration
CF-2026-0007,Dr. Lisa Yamamoto,yamamoto.l@wisc.edu,U54-AI170856,Saccharomyces cerevisiae,Plasmid,SEQ-001,2026-03-18,0,
WRONG_FORMAT,Dr. Marcus Brown,brown.marcus,R21-NS112340,Mus musculus,FFPE,HIST-001,2026-03-19,4,Paraffin blocks
CF-2026-0009,Dr. Sarah Chen,chen.lab@wisc.edu,R01-GM134522,Caenorhabditis elegans,smRNA,SEQ-004,2026-03-19,15,Small RNA library prep
CF-2026-0010,Dr. Emily Foster,foster.e@wisc.edu,T32-GM008349,Xenopus laevis,mRNA,,2026-03-20,6,Training grant samples
CF-2026-0011,Dr. Raj Krishnan,krishnan.r@wisc.edu,R01-EB029234,Mus musculus,Crosslinked chromatin,SEQ-005,2026-03-20,24,Two conditions x 3 reps x 4 antibodies
CF-2026-0012,Dr. Anika Patel,patel.anika@wisc.edu,P30-CA014520,Homo sapiens,Isolated nuclei,SEQ-006,20260321,10,Sorted cell populations
CF-2026-0013,Dr. Lisa Yamamoto,yamamoto.l@wisc.edu,U54-AI170856,Saccharomyces cerevisiae,gDNA,SEQ-001,2026-03-21,2,Whole genome sequencing
CF-2026-0014,Dr. Marcus Brown,mbrown@medicine.wisc.edu,R21-NS112340,Rattus norvegicus,cDNA,SEQ-002,2026-03-22,-3,Negative quantity
CF-2026-0015,Dr. Emily Foster,,T32-GM008349,Drosophila melanogaster,Total RNA,PROT-002,2026-03-22,8,Wrong billing code for RNA
3. VALIDATION RULES (apply all of these)
- Required fields: Sample_ID, PI_Name, Email, Grant_Number, Billing_Code, Date_Submitted
- Email: must contain an @ sign and a valid domain format (e.g., user@wisc.edu)
- Sample_ID format: must match regex /^CF-\d{4}-\d{4}$/
- Billing_Code: must be one of SEQ-001 through SEQ-006, PROT-001, PROT-002, HIST-001
- Date_Submitted: must be in YYYY-MM-DD format (flag other formats as warnings)
- Quantity: must be a positive integer (flag zero or negative as errors)
- Duplicate detection: flag rows with the same Sample_ID
- Show which specific validation rule failed for each flagged cell

4. ERROR REPORT
- Summary bar at top: total rows, errors, warnings, clean rows (with color-coded badges)
- Full table showing all rows with color-coded left borders (green=clean, red=error, yellow=warning)
- Highlight individual cells that failed validation in red or yellow
- Below the table, a detailed error list: "Row 4: Sample_ID 'CF-2026-0001' is a duplicate of Row 1"
- Clicking an error in the list scrolls to and briefly highlights that row in the table

5. EXPORT
- "Export Report" button that opens a new window with a print-friendly version of the validation report (white background, no drop zone, includes filename and timestamp)

6. DESIGN
- Dark theme: background #0f172a, cards #1e293b, text #e2e8f0, accent #10b981
- Clean sans-serif font (Inter from Google Fonts CDN)
- Responsive layout, single column
- Drag zone should be prominent with a file icon and "Drop CSV here" text
- Green/red/yellow color coding consistent throughout
7. TECHNICAL
- Pure HTML/CSS/JS in one file, no build step, no dependencies beyond Google Fonts
- No Chart.js needed for this tool (it's a validation tool, not a charting tool)
- CSV parser must handle quoted fields correctly

That entire block is the prompt. Paste it as-is. The embedded sample data has deliberate errors in rows 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, and 15 — so you can immediately verify the validator is catching them all.
What you get
After the LLM finishes (typically 60-90 seconds), you will have a single file: intake-validator.html. Open it in any browser.
Expected output structure
intake-validator.html (~500-700 lines)

Click Load Example and you should see:
- A summary bar showing 15 total rows, approximately 8-10 errors, 2-3 warnings, and the remaining clean rows.
- Row 2 flagged red: missing PI_Name (required field).
- Row 3 flagged yellow: date in `YYYY/MM/DD` format instead of `YYYY-MM-DD`.
- Row 4 flagged red: duplicate Sample_ID (same as Row 1).
- Row 5 flagged red: `INVALID-99` is not a recognized billing code.
- Row 6 flagged red: missing Grant_Number; flagged yellow: date in non-standard format.
- Row 7 flagged red: quantity is 0 (must be positive).
- Row 8 flagged red: Sample_ID `WRONG_FORMAT` does not match the `CF-YYYY-NNNN` pattern; Email `brown.marcus` is missing the @ sign and domain.
- Row 10 flagged red: missing Billing_Code.
- Row 12 flagged yellow: date format `20260321` is non-standard.
- Row 14 flagged red: negative quantity.
- Row 15 flagged red: missing Email.
Most researchers submit Excel files (.xlsx), not CSVs. Ask them to “Save As → CSV (Comma delimited)” before uploading, or add Excel support as a customization (see the SheetJS extension prompt below in the Customize section).
If something is off
LLMs occasionally produce code with small bugs. Here are the most common issues and one-line fix prompts:
| Problem | Follow-up prompt |
|---|---|
| Drag-and-drop doesn’t work | The drag-and-drop zone isn't responding to file drops. Make sure you're calling e.preventDefault() on both dragover and drop events, and reading the file from e.dataTransfer.files[0]. |
| CSV with quoted fields breaks | My CSV has fields with commas inside quotes, like "Chen, Sarah" and the parser is splitting on those commas. Fix the CSV parser to handle quoted fields correctly. |
| All rows show as errors | Every row is flagged as an error even though some are valid. Check that the validation is comparing against the actual cell values and not the header row. |
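If the quoted-field bug bites, it helps to know what a correct fix looks like. Here is a minimal sketch of a quote-aware CSV line parser you can compare against the generated code — the function name is illustrative, not taken from the generated file:

```javascript
// Parse one CSV line, keeping commas inside quoted fields intact.
// Illustrative sketch; the LLM's generated parser may differ in structure.
function parseCSVLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') {
        current += '"';       // doubled quote = escaped quote inside a field
        i++;
      } else {
        inQuotes = !inQuotes; // toggle quoted state
      }
    } else if (ch === ',' && !inQuotes) {
      fields.push(current);   // field boundary only outside quotes
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);       // last field has no trailing comma
  return fields;
}

// A field like "Chen, Sarah" stays one value:
// parseCSVLine('CF-2026-0001,"Chen, Sarah",12')
//   → ['CF-2026-0001', 'Chen, Sarah', '12']
```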
When Things Go Wrong
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works (the 2-minute explanation)
You do not need to read every line of the generated code, but here is the mental model:
- CSV parsing splits each line by commas, but respects quoted fields (a field like "Chen, Sarah" stays as one value). The first row becomes the header, and every subsequent row becomes a data object with named properties.
- Validation rules are a list of functions, each checking one condition. Required-field checks look for empty strings. Regex checks test the pattern. Billing code checks compare against a whitelist array. Duplicate checks use a `Set` to track seen IDs.
- Color coding assigns a severity to each row based on the worst issue found: red for errors (blocks processing), yellow for warnings (can proceed but should review), green for clean.
- Export clones the report HTML into a new window with print-friendly styles. The data never goes to a server — it stays in your browser.
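The rules-as-a-list pattern can be sketched like this. This is an assumption about how such code is typically structured, not a copy of the generated file; the field names, regex, and billing codes match the prompt above:

```javascript
// Billing codes from the prompt's whitelist.
const VALID_CODES = ['SEQ-001', 'SEQ-002', 'SEQ-003', 'SEQ-004', 'SEQ-005',
                     'SEQ-006', 'PROT-001', 'PROT-002', 'HIST-001'];
const seenIds = new Set();  // reset (clear) before each validation run

// Each rule targets one field and returns true (pass) or a message (fail);
// severity decides whether the row turns red or yellow.
const rules = [
  { field: 'PI_Name', severity: 'error',
    check: (v) => v.trim() !== '' || 'PI_Name is required' },
  { field: 'Sample_ID', severity: 'error',
    check: (v) => /^CF-\d{4}-\d{4}$/.test(v) || 'Sample_ID must match CF-YYYY-NNNN' },
  { field: 'Billing_Code', severity: 'error',
    check: (v) => VALID_CODES.includes(v) || `Unknown billing code "${v}"` },
  { field: 'Date_Submitted', severity: 'warning',
    check: (v) => /^\d{4}-\d{2}-\d{2}$/.test(v) || 'Date should be YYYY-MM-DD' },
  { field: 'Sample_ID', severity: 'error',
    check: (v) => {
      if (seenIds.has(v)) return `Duplicate Sample_ID "${v}"`;
      seenIds.add(v);   // Set membership makes duplicate detection O(1)
      return true;
    } },
];

// Run every rule against one row object and collect the failures.
function validateRow(row) {
  const issues = [];
  for (const rule of rules) {
    const result = rule.check(row[rule.field] ?? '');
    if (result !== true) {
      issues.push({ field: rule.field, severity: rule.severity, message: result });
    }
  }
  return issues;
}
```

Adding a facility-specific rule is then just one more entry in the `rules` array, which is why the customization prompts below stay so short.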
Client-side validation is the first line of defense, not the only one. Your LIMS or billing system should still enforce rules on its end. But catching errors before data enters the system saves everyone time. A researcher who gets instant feedback at submission fixes their sheet in two minutes. A researcher who gets an email three days later has to context-switch back to a project they have already mentally moved on from. Front-loading validation is a service improvement that costs nothing to deploy.
Customize it
The base validator is useful as-is, but every facility has unique requirements. Each of these is a single follow-up prompt:
Add facility-specific billing codes
Update the billing code validation to use this complete list from our rate schedule: SEQ-001 (Standard Illumina), SEQ-002 (Low-input), SEQ-003 (Single-cell), SEQ-004 (Small RNA), SEQ-005 (ChIP-seq), SEQ-006 (ATAC-seq), PROT-001 (LC-MS/MS), PROT-002 (TMT labeling), HIST-001 (H&E staining), HIST-002 (IHC), FLOW-001 (Cell sorting), FLOW-002 (Analysis only). Show the full service name next to each billing code in the validation output.

Add species whitelist

Add a species validation rule. Valid species are: Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Danio rerio, Xenopus laevis, Arabidopsis thaliana. Flag any other species as a warning (not an error) with the message "Uncommon species — verify with facility staff."

Add Excel (.xlsx) support

Add support for uploading Excel files (.xlsx) in addition to CSV. Use SheetJS from CDN (https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js) to parse the workbook. Read the first sheet, convert it to an array of arrays, and feed it into the same validation pipeline. If the file has multiple sheets, show a dropdown to select which sheet to validate. Keep CSV support working as before.

Add batch summary email draft

Add a "Generate Email" button that creates a pre-formatted email summary of the validation results. Include: filename, date, total samples, error count, and a bulleted list of all errors. Format it so I can copy-paste it into Outlook as a reply to the researcher who submitted the sheet. Keep the tone professional and helpful.

Start with the working validator, then add your facility's specific rules one prompt at a time. Each prompt builds on what exists. You never need to plan the entire tool upfront — iterate from a solid foundation.
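To give a sense of what the email-draft customization produces, here is a sketch of the text-building part. The function name, wording, and result shape are hypothetical, shown only to illustrate the idea:

```javascript
// Build a plain-text email body from validation results.
// `results` is assumed to be an array of {row, severity, message} objects;
// this shape is an illustration, not the generated tool's actual API.
function draftEmail(filename, results) {
  const errors = results.filter((r) => r.severity === 'error');
  const lines = [
    `Subject: Sample sheet validation: ${filename}`,
    '',
    'Hi,',
    '',
    `Thanks for your submission. Our validator found ${errors.length} issue(s)`,
    `in ${filename}. Please correct the rows below and resubmit:`,
    '',
    ...errors.map((e) => `  - Row ${e.row}: ${e.message}`),
    '',
    'Best regards,',
    'Core Facility Staff',
  ];
  return lines.join('\n');  // ready to copy-paste into Outlook
}
```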
Try it yourself
- Open your CLI tool in an empty folder.
- Paste the main prompt from above.
- Open the generated intake-validator.html in your browser.
- Click Load Example to see the validation in action on the embedded test data.
- Export a real sample sheet from your facility’s LIMS or shared drive (as CSV) and drop it on the validator.
- Pick one customization from the list above and add it.
If you manage a core facility, put this HTML file on the shared drive next to the submission template. Link to it in your intake instructions. You just eliminated the most tedious part of the intake process.
Key takeaways
- One prompt, one tool: a detailed prompt with embedded sample data produces a working intake validator in under 2 minutes.
- Client-side validation catches errors before they enter your pipeline — researchers get instant feedback instead of a correction email days later.
- Embedding test data with deliberate errors in the prompt guarantees you can verify the tool works immediately, without needing a separate test file.
- Configurable rules (billing codes, ID formats, required fields) mean one validator pattern serves every core facility — just update the whitelists.
- Drag-and-drop file handling has browser quirks (you must preventDefault on both dragover and drop) — specifying this in the prompt prevents the most common bug.
A researcher submits a sample sheet with 50 rows and your validator flags 3 errors. Why should you validate before processing rather than fixing errors as you encounter them during the run?
Row 4 in the test data has the same Sample_ID as Row 1. What downstream problem does a duplicate Sample_ID cause?
What’s next
In the next lesson, you will build a Usage & Billing Summary Dashboard that takes instrument usage logs and automatically groups them by PI and grant number, producing bar charts and billing totals — the other half of core facility administration.