# Provenance & Data Trust
## What you'll learn

~20 min

- Add the `data_source` column to every table and enforce provenance values
- Explain why empty state is correct and fabricated data is never acceptable
- Register a module's tables in the provenance map
## Every row has a birth certificate
On the DS platform, every row in every table carries a `data_source` column. This column is not nullable. It tells you exactly how that row came into existence:
| Value | Meaning | Example |
|---|---|---|
| `'manual'` | A human entered it through the application UI | An operator adds a new vehicle assignment |
| `'seed'` | Loaded during initial setup or migration | Reference data like department codes, role definitions |
| `'sync'` | Pulled from an external system via integration | Vehicle records synced from the state fleet management API |
| `'sample'` | Demonstration data for testing or training | Example records used in a dev environment |
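In TypeScript code that touches these rows, the four values can be modeled as a union type with a narrowing guard. This is a sketch, not a platform API — the names `DataSource` and `isDataSource` are illustrative:

```typescript
// The four allowed provenance values, mirroring the table above.
type DataSource = 'manual' | 'seed' | 'sync' | 'sample';

const DATA_SOURCES: readonly DataSource[] = ['manual', 'seed', 'sync', 'sample'];

// Narrowing guard: useful when validating a value read from an
// untyped payload before it reaches a typed row object.
function isDataSource(value: string): value is DataSource {
  return (DATA_SOURCES as readonly string[]).includes(value);
}
```

A union type like this keeps the compiler, the CHECK constraint, and the provenance table in agreement about the legal values.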
This is not metadata for developers. This is audit infrastructure. When a state auditor asks “how did this record get into the system?”, the data_source column answers the question at the row level without anyone checking git history or interviewing the team.
Commercial SaaS can backfill demo data and nobody blinks. Government systems cannot. If an auditor finds a vehicle assignment record with no clear origin, that is a finding. If the record says data_source: 'sample' and it is sitting in a production table, that is a different kind of finding — but at least it is traceable. The column turns “where did this come from?” from an investigation into a column filter.
## The four provenance rules
### Rule 1: Every table gets the column
No exceptions. Even lookup tables (departments, statuses, categories). Even junction tables (`user_role_assignments`). The migration always includes:

```sql
data_source NVARCHAR(20) NOT NULL DEFAULT 'manual'
```

If the AI generates a migration without this column, add it. The platform build process does not enforce this automatically (it is a convention, not a constraint check), but every code review will flag it.
### Rule 2: The API route sets it
When a user creates a record through the UI, the API route handler sets `data_source = 'manual'`. When a sync job pulls data from an external system, it sets `data_source = 'sync'`. The application code controls the value — users never set it directly.
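One way to keep that mapping in a single place is a helper that derives the provenance value from the operation context, so no handler hardcodes a string twice. A sketch — `Operation` and `dataSourceFor` are hypothetical names, not part of the platform:

```typescript
type DataSource = 'manual' | 'seed' | 'sync' | 'sample';

// Hypothetical contexts an API route or background job might run in.
type Operation = 'ui-create' | 'external-sync' | 'setup-migration' | 'dev-seed';

// Central mapping: callers say what they are doing,
// never what the provenance value should be.
function dataSourceFor(op: Operation): DataSource {
  switch (op) {
    case 'ui-create':       return 'manual';
    case 'external-sync':   return 'sync';
    case 'setup-migration': return 'seed';
    case 'dev-seed':        return 'sample';
  }
}
```

The switch is exhaustive over the union, so adding a new operation context without deciding its provenance is a compile error.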
```typescript
// In the POST handler
const result = await pool.request()
  .input('vehicleId', sql.NVarChar, parsed.data.vehicleId)
  .input('dataSource', sql.NVarChar, 'manual')
  .input('createdBy', sql.NVarChar, session.user.email)
  .query(`
    INSERT INTO vehicle_fleet (vehicle_id, data_source, created_by)
    VALUES (@vehicleId, @dataSource, @createdBy)
  `);
```

### Rule 3: Never fabricate operational data
This is the hardest rule for developers coming from product companies. When you build a new module, the instinct is to seed the database with realistic-looking records so the UI looks populated during demos. On the DS platform, this is not allowed for operational tables.
- A new `vehicle_fleet` table with zero rows is not broken. It is empty. That is the correct state.
- A new `vehicle_fleet` table with 50 fabricated records that look real is a liability. Someone screenshots it. Someone exports it. Someone references “the 50 vehicles in the system” in a meeting. Fabricated data in a government system creates confusion that is expensive to unwind.
Seed data is fine for reference tables (department codes, status values, role definitions) because those are system configuration, not operational records. And data_source: 'sample' records are fine in dev and staging environments for testing. The rule is: no fabricated operational data in production tables. If you need to demo the module, use the dev environment.
### Rule 4: The provenance map tracks ownership
Every module registers its tables in the provenance map (`src/config/provenance-map.ts`). This creates a graph of which module owns which tables:
```typescript
export const provenanceMap = {
  'vehicle-fleet': {
    tables: ['vehicle_fleet'],
    dataSourceDefault: 'manual',
  },
  'case-tracker': {
    tables: ['cases', 'case_notes', 'case_attachments'],
    dataSourceDefault: 'manual',
  },
  'hr-sync': {
    tables: ['employees'],
    dataSourceDefault: 'sync',
  },
};
```

If a table is not claimed by any module, the admin dashboard flags it as orphaned. Orphaned tables are a data governance risk — they contain data that no module maintains, no code updates, and no one is accountable for.
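The orphan check can be expressed as a pure function over the map. A sketch, assuming the list of database table names has already been fetched elsewhere — `findOrphanTables` and `ModuleEntry` are illustrative names, not the dashboard's actual code:

```typescript
interface ModuleEntry {
  tables: string[];
  dataSourceDefault: string;
}

// Returns tables that exist in the database but are claimed by no module.
function findOrphanTables(
  provenanceMap: Record<string, ModuleEntry>,
  dbTables: string[],
): string[] {
  const claimed = new Set(
    Object.values(provenanceMap).flatMap((entry) => entry.tables),
  );
  return dbTables.filter((table) => !claimed.has(table));
}
```

Because the check is a pure function of the map plus a table listing, it is cheap to run on every dashboard load and trivial to unit test.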
## The prompt
This prompt adds provenance to a new table and registers it in the provenance map:
Add data provenance support to the vehicle-fleet module:
1. MIGRATION UPDATE (`migrations/20260320_create_vehicle_fleet.sql`):
   - Confirm the `data_source` column exists: `data_source NVARCHAR(20) NOT NULL DEFAULT 'manual'`
   - Add a CHECK constraint: `data_source IN ('manual', 'seed', 'sync', 'sample')`
   - If the column already exists, do not duplicate it

2. API ROUTE UPDATE (`src/app/api/vehicle-fleet/route.ts`):
   - POST handler: always set `data_source = 'manual'` (hardcoded, not from request body)
   - The Zod schema for create should NOT include `data_source` as an accepted field
   - The Zod schema for update should NOT allow changing `data_source`
   - `data_source` is system-controlled, never user-controlled

3. PROVENANCE MAP (`src/config/provenance-map.ts`):
   - Add entry: `'vehicle-fleet': { tables: ['vehicle_fleet'], dataSourceDefault: 'manual' }`
   - Do not modify existing entries

4. TYPES UPDATE (`src/types/vehicle-fleet.ts`):
   - `VehicleFleetRow` should include `data_source` as a read-only field
   - `VehicleFleetCreateSchema` should NOT include `data_source`
   - `VehicleFleetUpdateSchema` should NOT include `data_source`

IMPORTANT: `data_source` is never set by the user and never accepted from the request body. It is set by the API route handler based on the context of the operation.

## Watch it work
### Empty state is correct
When your module first deploys, the `vehicle_fleet` table has zero rows. The MUI DataGrid shows “No rows.” This is not a bug.
The temptation is strong: seed 10 sample vehicles so the demo looks polished. But here is what happens:
- You seed 10 vehicles with `data_source: 'sample'`
- The dev environment looks great in the demo
- Someone forgets to exclude sample data from the production migration
- Production has 10 phantom vehicles that do not exist in the real fleet
- An operator exports the data for a report and includes the phantoms
- The report goes to the director’s office with 10 extra vehicles
- Someone spends a day figuring out where they came from
The platform’s approach: design your UI to handle empty state gracefully. Show a clear “No vehicles registered yet” message with an “Add Vehicle” button. An empty table with a clear call-to-action is better than a populated table with fake data.
Test your module with 0 rows, 1 row, and 500 rows. The empty state should not look broken — it should guide the user to add data. The single-row state should not look lonely — the DataGrid should render cleanly. The 500-row state should paginate smoothly. If any of these look wrong, fix the UI before seeding sample data.
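The three row-count states can be pinned down in a tiny helper that the UI and its tests share. A sketch under stated assumptions — `gridState` and the page size of 100 are hypothetical, not platform defaults:

```typescript
// The three rendering states the checklist above exercises.
type GridState = 'empty' | 'single-page' | 'paginated';

// Decide what the grid region should render for a given row count.
function gridState(rowCount: number, pageSize = 100): GridState {
  if (rowCount === 0) return 'empty';        // show the call-to-action, not a bare grid
  if (rowCount <= pageSize) return 'single-page'; // grid renders without pagination controls
  return 'paginated';                        // pagination controls must appear
}
```

A unit test asserting `gridState(0)`, `gridState(1)`, and `gridState(500)` keeps the empty-state path from silently disappearing in a refactor.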
### When sample data is appropriate
Sample data is legitimate in exactly two contexts:
- Dev and staging environments — seed records with `data_source: 'sample'` so developers can test pagination, filtering, and edge cases. The provenance column makes it obvious these are test records.
- Reference data — department codes, status values, role definitions. These are configuration, not operational data. They get `data_source: 'seed'` and are loaded via migration.
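For the dev and staging case, a seed helper can tag every generated row explicitly so the provenance column does the labeling for you. A sketch — `makeSampleVehicles` and the `SAMPLE-` ID prefix are illustrative choices, not platform conventions:

```typescript
interface SampleVehicle {
  vehicle_id: string;
  data_source: 'sample'; // hardcoded: these rows must always be identifiable as test data
}

// Generate n clearly-synthetic rows for a dev or staging database.
function makeSampleVehicles(n: number): SampleVehicle[] {
  return Array.from({ length: n }, (_, i) => ({
    vehicle_id: `SAMPLE-${String(i + 1).padStart(3, '0')}`,
    data_source: 'sample',
  }));
}
```

Making the IDs visibly synthetic is a second line of defense: even if a row escapes into an export, nobody mistakes `SAMPLE-001` for a real fleet vehicle.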
The rule of thumb: if deleting the record would cause a real-world consequence (a vehicle is untracked, a case is lost, a person is unassigned), it is operational data and must not be fabricated. If deleting it only affects the application’s configuration (a dropdown option disappears), it is reference data and can be seeded.
You are building a new module for tracking facility maintenance requests. During a demo to the director, the table shows 'No requests found.' Your project manager says 'Can you seed some realistic requests so the demo looks better?' What is the correct response?
## What’s next
Your module has clean data provenance. Every record is traceable. Now it is time to present that data to three different audiences. The next lesson covers the DensityGate component — executive summaries, operational views, and technical deep-dives — all from the same data, all in the same page.