# Provenance & Data Trust
## What you'll learn

~20 min

- Add the `data_source` column to every table and enforce provenance values
- Explain why empty state is correct and fabricated data is never acceptable
- Register a module's tables in the provenance map
## Every row has a birth certificate
On the DS platform, every row in every table carries a `data_source` column. This column is not nullable. It tells you exactly how that row came into existence:
| Value | Meaning | Example |
|---|---|---|
| `'manual'` | A human entered it through the application UI | An operator adds a new vehicle assignment |
| `'seed'` | Loaded during initial setup or migration | Reference data like department codes, role definitions |
| `'sync'` | Pulled from an external system via integration | Vehicle records synced from the state fleet management API |
| `'sample'` | Demonstration data for testing or training | Example records used in a dev environment |
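In TypeScript code that touches these rows, the four values can be modeled as a union type with a narrowing guard. This is a sketch, not a platform API — the names `DataSource` and `isDataSource` are illustrative:

```typescript
// The four allowed provenance values, mirroring the table above.
type DataSource = 'manual' | 'seed' | 'sync' | 'sample';

const DATA_SOURCES: readonly DataSource[] = ['manual', 'seed', 'sync', 'sample'];

// Narrowing guard: useful when validating a value read from an
// untyped payload before it reaches a typed row object.
function isDataSource(value: string): value is DataSource {
  return (DATA_SOURCES as readonly string[]).includes(value);
}
```

A union type like this keeps the compiler, the CHECK constraint, and the provenance table in agreement about the legal values.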
This is not metadata for developers. This is audit infrastructure. When a state auditor asks “how did this record get into the system?”, the data_source column answers the question at the row level without anyone checking git history or interviewing the team.
Commercial SaaS can backfill demo data and nobody blinks. Government systems cannot. If an auditor finds a vehicle assignment record with no clear origin, that is a finding. If the record says data_source: 'sample' and it is sitting in a production table, that is a different kind of finding — but at least it is traceable. The column turns “where did this come from?” from an investigation into a column filter.
## The four provenance rules
### Rule 1: Every table gets the column
No exceptions. Even lookup tables (departments, statuses, categories). Even junction tables (`user_role_assignments`). The migration always includes:

```sql
data_source NVARCHAR(20) NOT NULL DEFAULT 'manual'
```

If the AI generates a migration without this column, add it. The platform build process does not enforce this automatically (it is a convention, not a constraint check), but every code review will flag it.
### Rule 2: The API route sets it
When a user creates a record through the UI, the API route handler sets `data_source = 'manual'`. When a sync job pulls data from an external system, it sets `data_source = 'sync'`. The application code controls the value — users never set it directly.
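One way to keep that mapping in a single place is a helper that derives the provenance value from the operation context, so no handler hardcodes a string twice. A sketch — `Operation` and `dataSourceFor` are hypothetical names, not part of the platform:

```typescript
type DataSource = 'manual' | 'seed' | 'sync' | 'sample';

// Hypothetical contexts an API route or background job might run in.
type Operation = 'ui-create' | 'external-sync' | 'setup-migration' | 'dev-seed';

// Central mapping: callers say what they are doing,
// never what the provenance value should be.
function dataSourceFor(op: Operation): DataSource {
  switch (op) {
    case 'ui-create':       return 'manual';
    case 'external-sync':   return 'sync';
    case 'setup-migration': return 'seed';
    case 'dev-seed':        return 'sample';
  }
}
```

The switch is exhaustive over the union, so adding a new operation context without deciding its provenance is a compile error.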
```typescript
// In the POST handler
const result = await pool.request()
  .input('vehicleId', sql.NVarChar, parsed.data.vehicleId)
  .input('dataSource', sql.NVarChar, 'manual')
  .input('createdBy', sql.NVarChar, session.user.email)
  .query(`
    INSERT INTO vehicle_fleet (vehicle_id, data_source, created_by)
    VALUES (@vehicleId, @dataSource, @createdBy)
  `);
```

### Rule 3: Never fabricate operational data
This is the hardest rule for developers coming from product companies. When you build a new module, the instinct is to seed the database with realistic-looking records so the UI looks populated during demos. On the DS platform, this is not allowed for operational tables.
- A new `vehicle_fleet` table with zero rows is not broken. It is empty. That is the correct state.
- A new `vehicle_fleet` table with 50 fabricated records that look real is a liability. Someone screenshots it. Someone exports it. Someone references “the 50 vehicles in the system” in a meeting. Fabricated data in a government system creates confusion that is expensive to unwind.
Seed data is fine for reference tables (department codes, status values, role definitions) because those are system configuration, not operational records. And data_source: 'sample' records are fine in dev and staging environments for testing. The rule is: no fabricated operational data in production tables. If you need to demo the module, use the dev environment.
### Rule 4: The provenance map tracks ownership
Every module registers its tables in the provenance map (`src/config/provenance-map.ts`). This creates a graph of which module owns which tables:
```typescript
export const provenanceMap = {
  'vehicle-fleet': {
    tables: ['vehicle_fleet'],
    dataSourceDefault: 'manual',
  },
  'case-tracker': {
    tables: ['cases', 'case_notes', 'case_attachments'],
    dataSourceDefault: 'manual',
  },
  'hr-sync': {
    tables: ['employees'],
    dataSourceDefault: 'sync',
  },
};
```

If a table is not claimed by any module, the admin dashboard flags it as orphaned. Orphaned tables are a data governance risk — they contain data that no module maintains, no code updates, and no one is accountable for.
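The orphan check can be expressed as a pure function over the map. A sketch, assuming the list of database table names has already been fetched elsewhere — `findOrphanTables` and `ModuleEntry` are illustrative names, not the dashboard's actual code:

```typescript
interface ModuleEntry {
  tables: string[];
  dataSourceDefault: string;
}

// Returns tables that exist in the database but are claimed by no module.
function findOrphanTables(
  provenanceMap: Record<string, ModuleEntry>,
  dbTables: string[],
): string[] {
  const claimed = new Set(
    Object.values(provenanceMap).flatMap((entry) => entry.tables),
  );
  return dbTables.filter((table) => !claimed.has(table));
}
```

Because the check is a pure function of the map plus a table listing, it is cheap to run on every dashboard load and trivial to unit test.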
## The prompt
This prompt adds provenance to a new table and registers it in the provenance map:
Add data provenance support to the vehicle-fleet module:
1. MIGRATION UPDATE (`migrations/20260320_create_vehicle_fleet.sql`):
   - Confirm the `data_source` column exists: `data_source NVARCHAR(20) NOT NULL DEFAULT 'manual'`
   - Add a CHECK constraint: `data_source IN ('manual', 'seed', 'sync', 'sample')`
   - If the column already exists, do not duplicate it

2. API ROUTE UPDATE (`src/app/api/vehicle-fleet/route.ts`):
   - POST handler: always set `data_source = 'manual'` (hardcoded, not from request body)
   - The Zod schema for create should NOT include `data_source` as an accepted field
   - The Zod schema for update should NOT allow changing `data_source`
   - `data_source` is system-controlled, never user-controlled

3. PROVENANCE MAP (`src/config/provenance-map.ts`):
   - Add entry: `'vehicle-fleet': { tables: ['vehicle_fleet'], dataSourceDefault: 'manual' }`
   - Do not modify existing entries

4. TYPES UPDATE (`src/types/vehicle-fleet.ts`):
   - `VehicleFleetRow` should include `data_source` as a read-only field
   - `VehicleFleetCreateSchema` should NOT include `data_source`
   - `VehicleFleetUpdateSchema` should NOT include `data_source`

IMPORTANT: `data_source` is never set by the user and never accepted from the request body. It is set by the API route handler based on the context of the operation.

## Watch it work
### Empty state is correct
When your module first deploys, the `vehicle_fleet` table has zero rows. The MUI DataGrid shows “No rows.” This is not a bug.
The temptation is strong: seed 10 sample vehicles so the demo looks polished. But here is what happens:
- You seed 10 vehicles with `data_source: 'sample'`
- The dev environment looks great in the demo
- Someone forgets to exclude sample data from the production migration
- Production has 10 phantom vehicles that do not exist in the real fleet
- An operator exports the data for a report and includes the phantoms
- The report goes to the director’s office with 10 extra vehicles
- Someone spends a day figuring out where they came from
The platform’s approach: design your UI to handle empty state gracefully. Show a clear “No vehicles registered yet” message with an “Add Vehicle” button. An empty table with a clear call-to-action is better than a populated table with fake data.
Test your module with 0 rows, 1 row, and 500 rows. The empty state should not look broken — it should guide the user to add data. The single-row state should not look lonely — the DataGrid should render cleanly. The 500-row state should paginate smoothly. If any of these look wrong, fix the UI before seeding sample data.
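The three row-count states can be pinned down in a tiny helper that the UI and its tests share. A sketch under stated assumptions — `gridState` and the page size of 100 are hypothetical, not platform defaults:

```typescript
// The three rendering states the checklist above exercises.
type GridState = 'empty' | 'single-page' | 'paginated';

// Decide what the grid region should render for a given row count.
function gridState(rowCount: number, pageSize = 100): GridState {
  if (rowCount === 0) return 'empty';        // show the call-to-action, not a bare grid
  if (rowCount <= pageSize) return 'single-page'; // grid renders without pagination controls
  return 'paginated';                        // pagination controls must appear
}
```

A unit test asserting `gridState(0)`, `gridState(1)`, and `gridState(500)` keeps the empty-state path from silently disappearing in a refactor.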
### When sample data is appropriate
Sample data is legitimate in exactly two contexts:
- Dev and staging environments — seed records with `data_source: 'sample'` so developers can test pagination, filtering, and edge cases. The provenance column makes it obvious these are test records.
- Reference data — department codes, status values, role definitions. These are configuration, not operational data. They get `data_source: 'seed'` and are loaded via migration.
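For the dev and staging case, a seed helper can tag every generated row explicitly so the provenance column does the labeling for you. A sketch — `makeSampleVehicles` and the `SAMPLE-` ID prefix are illustrative choices, not platform conventions:

```typescript
interface SampleVehicle {
  vehicle_id: string;
  data_source: 'sample'; // hardcoded: these rows must always be identifiable as test data
}

// Generate n clearly-synthetic rows for a dev or staging database.
function makeSampleVehicles(n: number): SampleVehicle[] {
  return Array.from({ length: n }, (_, i) => ({
    vehicle_id: `SAMPLE-${String(i + 1).padStart(3, '0')}`,
    data_source: 'sample',
  }));
}
```

Making the IDs visibly synthetic is a second line of defense: even if a row escapes into an export, nobody mistakes `SAMPLE-001` for a real fleet vehicle.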
The rule of thumb: if deleting the record would cause a real-world consequence (a vehicle is untracked, a case is lost, a person is unassigned), it is operational data and must not be fabricated. If deleting it only affects the application’s configuration (a dropdown option disappears), it is reference data and can be seeded.
You are building a new module for tracking facility maintenance requests. During a demo to the director, the table shows 'No requests found.' Your project manager says 'Can you seed some realistic requests so the demo looks better?' What is the correct response?
## What’s next
Your module has clean data provenance. Every record is traceable. Now it is time to present that data to three different audiences. The next lesson covers the DensityGate component — executive summaries, operational views, and technical deep-dives — all from the same data, all in the same page.