Business Report Generator
What you'll learn
Estimated time: ~90-120 minutes
- Build a Python CLI tool that transforms CSV data into polished executive reports
- Understand the data analysis pipeline: ingestion, statistical analysis, visualization, and presentation
- Configure report generation with YAML settings for themes, charts, and metrics
- Apply automation thinking to eliminate repetitive reporting workflows
What you’re building
MIS students write reports constantly. Financial summaries, operations dashboards, quarterly reviews, market analysis write-ups — they all follow the same structure: title page, executive summary, data tables, charts, key findings, recommendations. Every single time, you spend hours formatting in Word, tweaking Excel charts, and copy-pasting numbers.
In this lesson you will build a Python CLI tool that takes a CSV file as input and generates a polished, presentation-ready HTML executive report with auto-generated charts, summary statistics, styled data tables, and an executive summary section. Run one command, get a complete report. Open it in a browser, print it to PDF, and submit it.
This is the capstone build of the MIS track. You are moving from browser tools (Lesson 1) and React apps (Lessons 2-3) to a command-line tool that processes files on disk. This is how automation works in the real world — scripts that take input, produce output, and integrate into workflows.
This lesson uses Python with pandas, plotly, and Jinja2 — a different stack from the React lessons in Lessons 2-3. If you have not used Python before, the AI handles the setup. Verify you have Python 3 installed: run python3 --version (or python --version on some systems). If not, install it via your package manager or from python.org.
The showcase
When finished, your CLI tool will:
- Accept a CSV file and an optional report title from the command line.
- Auto-analyze the data: detect column types, compute descriptive statistics, identify trends, find outliers.
- Generate a styled HTML report containing:
- Title page with report name, date, and data summary.
- Executive summary: 3-5 bullet points highlighting key findings (highest values, biggest changes, notable outliers).
- KPI section: large-format numbers for the most important metrics.
- Charts section: auto-generated bar, line, and pie charts using Plotly (embedded in the HTML).
- Data tables: paginated, sortable tables with conditional formatting.
- Statistical summary: descriptive statistics for all numeric columns.
- Methodology note: what data was analyzed, how many rows/columns, any data quality issues found.
- Print-ready: the HTML includes print CSS media queries so it looks good when printed to PDF from the browser.
- Configurable: a YAML config file controls which charts to generate, report branding, and statistical thresholds.
Think of it as an automated analyst that turns raw data into a deliverable.
Report generation is the final mile of any analytics workflow. In business intelligence and capstone courses, you are graded on the quality of your deliverables as much as the analysis itself. This tool automates the formatting so you can focus on the analysis and interpretation. In industry, automated reporting pipelines (built with tools like Python, R Markdown, or Tableau Server) save analysts dozens of hours per month. Building one yourself demonstrates both technical skill and business process understanding.
The prompt
Open your AI CLI tool (such as Claude Code, Gemini CLI, or your preferred tool) in an empty directory and paste this prompt:
```
Create a Python CLI tool called report-generator that takes a CSV file and produces
a styled HTML executive report. Use Python 3.10+ with these packages: pandas,
plotly, pyyaml, jinja2. Structure the project properly.

PROJECT STRUCTURE:
report-generator/
├── report_generator/
│   ├── __init__.py
│   ├── __main__.py          # enables python -m report_generator
│   ├── cli.py               # argparse CLI entry point
│   ├── analyzer.py          # data analysis and statistics
│   ├── chart_builder.py     # Plotly chart generation
│   ├── report_builder.py    # Jinja2 HTML report assembly
│   ├── config.py            # YAML config loader
│   └── templates/
│       └── report.html      # Jinja2 HTML template
├── config.yaml              # default configuration
├── sample_data/
│   └── quarterly_sales.csv  # sample dataset (50+ rows)
├── requirements.txt
├── setup.py
└── README.md

CLI INTERFACE:
  python -m report_generator --input data.csv --output report.html
  python -m report_generator --input data.csv --output report.html --title "Q4 Sales Report"
  python -m report_generator --input data.csv --output report.html --config config.yaml

OPTIONS:
  --input, -i    Path to CSV file (required)
  --output, -o   Output HTML file path (required)
  --title, -t    Report title (default: auto-generated from filename)
  --config, -c   Path to YAML config file (optional)
  --theme        Color theme: "corporate" | "modern" | "minimal" (default: corporate)
  --no-charts    Skip chart generation (text-only report)
  --verbose      Enable debug logging

SAMPLE DATA (quarterly_sales.csv):
Generate a realistic 60-row business dataset with columns:
- Date (monthly, Jan 2024 - Dec 2024, but multiple entries per month)
- Region (North, South, East, West)
- Product_Category (Electronics, Furniture, Office Supplies, Software)
- Sales_Rep (8 realistic names)
- Units_Sold (integer, 10-500)
- Revenue (decimal, 1000-50000)
- Cost (decimal, 60-85% of Revenue)
- Customer_Satisfaction (decimal, 3.0-5.0)
The data should have realistic patterns: seasonal trends (Q4 higher), regional
differences, some product categories outperforming others.

ANALYZER (analyzer.py):
Analyze the CSV and produce a structured analysis dict containing:

1. overview:
   - row_count, column_count
   - date_range (if date column detected)
   - numeric_columns, categorical_columns, date_columns

2. descriptive_stats (per numeric column):
   - count, mean, median, std, min, max, Q1, Q3
   - skewness and kurtosis
   - coefficient of variation

3. key_findings (auto-generated list of 5-7 bullet points):
   - Highest and lowest values with context (e.g., "West region had the highest
     average Revenue at $28,450")
   - Trend direction for time-series data (e.g., "Revenue showed a 15% increase
     from Q1 to Q4")
   - Outliers: values more than 2 standard deviations from the mean
   - Correlations: pairs of numeric columns with |r| > 0.7
   - Category comparisons: which category leads in each numeric metric

4. data_quality:
   - Missing values per column
   - Duplicate rows
   - Potential data type issues

CHART BUILDER (chart_builder.py):
Generate these Plotly charts as HTML div strings (embedded, no external files):

1. Revenue by Category: horizontal bar chart, sorted descending
2. Revenue over Time: line chart with monthly aggregation (if date column exists)
3. Revenue by Region: pie chart with percentage labels
4. Top Performers: bar chart of top 10 values by a key metric
5. Distribution: histogram for each numeric column
6. Correlation heatmap: for all numeric columns
7. Scatter plot: for the two most correlated numeric columns

Each chart should:
- Use the selected color theme
- Have clear titles and axis labels
- Include hover tooltips with formatted numbers
- Be responsive (fill container width)
- Use Plotly's built-in export button (camera icon for PNG download)

REPORT TEMPLATE (report.html - Jinja2):
A professional HTML report with these sections:

1. TITLE PAGE
   - Report title (large, centered)
   - Subtitle: "Generated on [date] from [filename]"
   - Data summary: rows, columns, date range
   - Table of contents (linked to sections)

2. EXECUTIVE SUMMARY
   - "Key Findings" section with the auto-generated bullet points
   - 3-4 large KPI cards showing the most important metrics (total revenue,
     average satisfaction, top region, etc.)

3. CHARTS
   - Each chart in its own section with a brief auto-generated caption
   - Charts are full-width in a single column for readability

4. DATA TABLES
   - Summary statistics table (the descriptive_stats output)
   - Top/bottom 10 rows by the primary metric
   - Category breakdown table (pivot-style: categories as rows, metrics as
     columns with totals)
   - All tables have alternating row colors, aligned numbers, and formatted
     values (commas in thousands, 2 decimal places for currency)

5. METHODOLOGY
   - Data source filename and path
   - Processing date and time
   - Row count, column count
   - Data quality notes (missing values, duplicates found)
   - Tools used: Python, Pandas, Plotly

6. APPENDIX
   - Full descriptive statistics for every column
   - Correlation matrix as a formatted table

STYLING:
- Corporate theme: navy (#1e3a5f) headers, white background, gray accents,
  professional serif font for headings (Georgia), sans-serif for body (Segoe UI)
- Modern theme: dark (#111827) background, slate cards, accent blue (#3b82f6),
  Inter font throughout
- Minimal theme: white background, black text, thin borders, no color fills,
  system font stack
- Print CSS: hide interactive elements, force page breaks between sections,
  charts rendered at fixed sizes

CONFIG (config.yaml):
report:
  title: "Business Report"
  theme: "corporate"
  logo_url: ""               # optional company logo URL
analysis:
  date_column: "auto"        # auto-detect or specify column name
  primary_metric: "auto"     # auto-detect or specify column name
  outlier_threshold: 2.0     # standard deviations
  correlation_threshold: 0.7
charts:
  enabled: true
  types: ["bar", "line", "pie", "histogram", "heatmap", "scatter"]
  color_palette: ["#3b82f6", "#10b981", "#f59e0b", "#ef4444", "#8b5cf6",
                  "#ec4899", "#06b6d4", "#84cc16"]
tables:
  max_rows: 10
  format_currency: true
  currency_symbol: "$"

Generate all files with complete, working implementations. Include the sample CSV
with realistic data. The tool should produce a polished report on the first run.
```

The prompt includes specific instructions for the sample CSV because realistic data produces realistic reports. If the sample data is random noise, the auto-generated findings will be meaningless. The seasonal patterns and regional differences ensure the analyzer has real trends to detect and report on.
What you get
After the LLM generates the project, set it up:
```shell
cd report-generator
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -e .
```

Then generate your first report:
```shell
python -m report_generator --input sample_data/quarterly_sales.csv \
  --output report.html --title "Q4 2024 Sales Performance Report"
```

Expected output
report.html (~800-1200 lines of styled HTML with embedded Plotly charts)

Open report.html in your browser. You should see:
- Title page with “Q4 2024 Sales Performance Report”, the generation date, and a data summary showing 60 rows and 8 columns.
- Executive summary with findings like “West region generated the highest total revenue at $X” and “Revenue increased 15% from Q1 to Q4, driven primarily by Electronics.”
- KPI cards showing total revenue, average customer satisfaction, total units sold, and the top-performing region.
- Charts: a horizontal bar chart of revenue by category, a line chart showing monthly revenue trends, a pie chart of regional contribution, and a correlation heatmap.
- Data tables with summary statistics, top 10 transactions, and a category breakdown pivot table.
- Print-ready: press Ctrl+P (or Cmd+P on Mac) and the print preview should show clean page breaks, no interactive elements, and properly sized charts.
The executive summary and key findings are generated by code, not by human analysis. Always spot-check the numbers against the raw data — automated aggregations can misidentify trends, miscalculate percentages, or flag misleading patterns (such as a 250% increase driven by a partial first month). Treat AI-generated insights as a starting point for your analysis, not a finished product.
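A spot-check can be a few lines of pandas. The sketch below recomputes a claimed Q1-to-Q4 revenue change from raw data; the inline CSV is a stand-in with made-up values, not the lesson's sample dataset:

```python
import io
import pandas as pd

# Stand-in for the raw CSV behind the report (hypothetical values).
csv = io.StringIO(
    "Date,Revenue\n"
    "2024-01-15,1000\n2024-02-15,1200\n"
    "2024-10-15,1400\n2024-11-15,1500\n2024-12-15,1600\n"
)
df = pd.read_csv(csv, parse_dates=["Date"])

# Recompute the Q1 -> Q4 change that the executive summary claims.
quarterly = df.groupby(df["Date"].dt.quarter)["Revenue"].sum()
pct = (quarterly[4] - quarterly[1]) / quarterly[1] * 100
print(f"Q1 -> Q4 revenue change: {pct:+.1f}%")  # +104.5% for this toy data
```

If the recomputed figure disagrees with the report, read the analyzer code before trusting either number: aggregation choices (sum vs mean, calendar vs fiscal quarters) often explain the gap.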
Plotly embeds the full Plotly.js library in the HTML file (about 3 MB). This makes the report self-contained and interactive, but the file will be large. If you need a smaller file, use the --no-charts flag for a text-only report, or add a follow-up prompt to use Chart.js instead (smaller library, less interactivity).
When things go wrong
Python CLI tools introduce a new category of issues: dependency management, file I/O, and data parsing errors. Here is how to diagnose the most common problems.
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works
The report generator follows a pipeline pattern common in data engineering:
1. CLI (cli.py) parses arguments with argparse and orchestrates the pipeline. It loads the config, reads the CSV into a pandas DataFrame, and passes it through the analyzer, chart builder, and report builder in sequence.
2. Analyzer (analyzer.py) uses pandas to compute descriptive statistics (df.describe()), detect column types (df.dtypes), find correlations (df.corr()), and identify outliers (values beyond mean +/- threshold * std). The key findings are generated by comparing aggregated values: group by each categorical column, compute means, and report the highest/lowest.
3. Chart Builder (chart_builder.py) uses Plotly Express and Plotly Graph Objects to create each chart. Each chart is rendered as an HTML div string using plotly.io.to_html(fig, include_plotlyjs=False, full_html=False), so only one copy of Plotly.js is included in the final report (in the template head).
4. Report Builder (report_builder.py) loads the Jinja2 template, fills in the analysis results and chart divs, and writes the final HTML file. Jinja2 handles the looping (iterating over charts, table rows, findings) and conditional logic (showing data quality warnings only if issues exist).
5. Config (config.py) loads the YAML file and merges it with defaults, so the tool works with zero configuration but can be customized by editing the YAML.
Jinja2 is the same template engine used by Flask, Django, and Ansible. Learning it here transfers directly to web development and DevOps. The template file (report.html) contains HTML with special tags like {{ title }} for variable insertion and {% for finding in findings %} for loops. The Python code passes a dictionary of values to the template, and Jinja2 renders the final HTML. This separation of logic (Python) and presentation (HTML template) is a fundamental design pattern in software engineering.
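A minimal illustration of that separation, using an inline template string instead of the project's templates/report.html:

```python
from jinja2 import Template

# Inline stand-in for the report template: variables and a loop.
template = Template(
    "<h1>{{ title }}</h1>\n"
    "<ul>\n"
    "{% for finding in findings %}  <li>{{ finding }}</li>\n{% endfor %}"
    "</ul>"
)

# The Python side passes plain data; the template decides presentation.
html = template.render(
    title="Q4 Sales Report",
    findings=["Revenue up 15% from Q1 to Q4", "West region leads in Revenue"],
)
print(html)
```

Changing the report's look means editing the template; changing what it says means editing the analysis code. Neither edit touches the other file.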
From Manual Reports to Automated Pipelines: The Business Case
The report generator you just built automates a single step: turning a CSV into a formatted report. In enterprise settings, this is part of a larger reporting pipeline that runs automatically:
- Data extraction: A scheduled script pulls data from a database, API, or file share (e.g., a nightly SQL query against the sales database).
- Data transformation: The raw data is cleaned, aggregated, and enriched (e.g., joining sales data with customer demographics).
- Report generation: Your tool runs, producing the HTML or PDF report.
- Distribution: The report is emailed to stakeholders, uploaded to SharePoint, or published to a dashboard portal.
- Archival: Previous reports are stored with timestamps for historical comparison.
Tools used in industry for each step:
- Extraction: Python scripts, SSIS (SQL Server), Informatica, Fivetran
- Transformation: pandas, dbt, SQL stored procedures, Power Query
- Report generation: Python + Jinja2 (what you built), R Markdown, Tableau Server, Power BI Service
- Distribution: smtplib (Python email), Microsoft Power Automate, Airflow
- Archival: Amazon S3, Azure Blob Storage, network file shares
The ROI calculation: If a report takes 2 hours to produce manually and is generated weekly for 10 clients, that is 20 hours/week or ~1,000 hours/year. At a loaded cost of $50/hour for an analyst, that is $50,000/year spent on formatting. Your automation reduces it to the time it takes to run one command per client — about 10 minutes/week total. The savings fund the automation effort many times over.
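The rounded figures above come from straightforward arithmetic; as a sketch (all inputs are the illustrative assumptions from the paragraph, not measured data):

```python
hours_per_report = 2        # manual formatting time per report
clients = 10                # weekly reports produced
weeks_per_year = 52
loaded_hourly_cost = 50     # analyst cost in dollars

manual_hours = hours_per_report * clients * weeks_per_year  # 1,040 hours/year
manual_cost = manual_hours * loaded_hourly_cost             # $52,000/year

automated_hours = (10 / 60) * weeks_per_year                # ~10 min/week total
savings = manual_cost - automated_hours * loaded_hourly_cost
print(f"Manual cost: ${manual_cost:,}/year; automation saves ~${savings:,.0f}")
```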
This is the kind of analysis MIS graduates are expected to make: not just “can I build this?” but “should we build this, and what is the business impact?”
Customize it
Add comparison reports
```
Add a --compare flag that accepts a second CSV file. When set, generate a comparison
report that shows side-by-side analysis: this period vs last period. For each numeric
metric, show the absolute change and percentage change. Color-code increases in green
and decreases in red. Add delta charts (bar charts showing the difference between
periods). The executive summary should focus on what changed and why it matters.
```

Add PDF direct output
WeasyPrint requires OS-level C libraries (GTK3, Pango) that do not install via pip alone on Windows. If you are on Windows, either use WSL (recommended) or stick with the browser’s Ctrl+P “Print to PDF” feature. On macOS and Linux, pip install weasyprint usually works directly.
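The fall-back-to-HTML behavior amounts to a guarded import. A sketch of the pattern (the helper function and its behavior are assumptions, not the generated tool's code):

```python
# Guarded import: use WeasyPrint when available, otherwise keep HTML output.
try:
    import weasyprint
    HAVE_WEASYPRINT = True
except ImportError:
    HAVE_WEASYPRINT = False

def write_report(html: str, out_path: str, want_pdf: bool) -> str:
    """Write a PDF when requested and possible; otherwise fall back to HTML."""
    if want_pdf and HAVE_WEASYPRINT:
        pdf_path = out_path.rsplit(".", 1)[0] + ".pdf"
        weasyprint.HTML(string=html).write_pdf(pdf_path)
        return pdf_path
    if want_pdf:
        print("weasyprint not installed; writing HTML instead")
    path = out_path if out_path.endswith(".html") else out_path + ".html"
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```

This keeps the tool usable on machines where the C libraries are missing, at the cost of a different output format than requested.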
```
Add a --pdf flag that outputs a PDF file directly instead of HTML. Use the weasyprint
library. The PDF should have proper page numbers, headers with the report title on
each page, a table of contents with page numbers, and charts rendered as static
images. Add weasyprint to requirements.txt. Fall back to HTML output if weasyprint
is not installed.
```

Add natural language customization
```
Add a --focus flag that accepts a plain English instruction like --focus "Focus on
the West region and compare Q3 to Q4 for Electronics only". The analyzer should
filter the data according to the instruction and adjust the executive summary and
charts accordingly. Use simple keyword parsing: look for region names, date ranges,
and category names in the instruction and apply pandas filters.
```

Add scheduled reports
```
Add a --schedule flag that sets up a cron job (Linux) or scheduled task (Windows)
to regenerate the report daily/weekly/monthly. The tool should:
1. Check if the input CSV has changed since the last run (compare file hash)
2. Only regenerate if data changed
3. Optionally email the report using smtplib (--email flag)
4. Log all scheduled runs to a history file
Print the cron expression to stdout so the user can review it before activating.
```

What you just built is a Robotic Process Automation (RPA) workflow in miniature. In business and information systems coursework, you learn about automating repetitive business processes to improve efficiency and reduce errors. This report generator takes a task that manually takes 2-3 hours (data analysis, chart creation, formatting, writing summaries) and reduces it to a 10-second command. Multiply that across an organization producing weekly reports for 50 clients, and you have saved 100-150 hours per week. That is the business case for automation — and you can now build it.
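The change-detection step of the scheduled-reports idea is small enough to sketch directly; the state-file name here is a hypothetical choice:

```python
import hashlib
from pathlib import Path

STATE_FILE = Path(".last_report_hash")  # hypothetical state-file name

def file_hash(path: str) -> str:
    """SHA-256 of the file contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def data_changed(csv_path: str) -> bool:
    """True if the CSV differs from the last recorded run; records the new hash."""
    current = file_hash(csv_path)
    previous = STATE_FILE.read_text() if STATE_FILE.exists() else ""
    STATE_FILE.write_text(current)
    return current != previous
```

Hashing the whole file is cheaper than diffing it and catches any byte-level change; the tradeoff is that it cannot tell a trivial change (reordered rows) from a substantive one, so every change triggers a regeneration.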
The story so far
Across four lessons, you have built:
| Lesson | Tool | Technology | MIS Application |
|---|---|---|---|
| 1 | Business Analytics Dashboard | Single HTML + Chart.js | Data exploration, BI |
| 2 | Database Schema Designer | React + Vite | Database design, data modeling |
| 3 | Project Management Tracker | React + Vite | Project management, Agile |
| 4 | Business Report Generator | Python CLI | Reporting, automation |
Each one demonstrates a different technical skill (front-end, full-stack, back-end automation) applied to a real MIS domain. But the report generator raises a question: where does the CSV come from? In the next two lessons, you will build the data plumbing (ETL pipeline) and the automation layer (scheduled orchestrator) that make these tools production-ready.
Try it yourself
- Generate the report tool with the prompt above.
- Run it on the included sample data and open the HTML report.
- Read the executive summary. Does it accurately describe the data? If a finding is wrong, look at the analyzer code and figure out why.
- Try the different themes: --theme corporate, --theme modern, --theme minimal. Print each to PDF and compare.
- Now find a real CSV from your coursework — any dataset from an MIS or statistics class. Run the tool on it. How good is the auto-generated analysis?
- Edit config.yaml to change the color palette and currency symbol. Re-run and see the changes.
- Pick one customization from the list above and add it with a follow-up prompt.
Key Takeaways
- Automation is the highest-value MIS skill. Taking a 2-hour manual process and reducing it to a 10-second command is exactly what MIS professionals are hired to do. This report generator is a concrete example you can point to.
- The pipeline pattern (input, process, output) applies everywhere. CLI arguments define the input, the analyzer and chart builder do the processing, and the Jinja2 template produces the output. This is the same pattern used in ETL pipelines, CI/CD systems, and data engineering workflows.
- Configuration separates policy from mechanism. The YAML config file lets you change the report’s appearance, metric thresholds, and chart types without touching the code. This is a fundamental principle in systems design.
- Print-ready HTML is a powerful output format. Unlike Word documents (hard to generate programmatically) or raw PDFs (complex libraries), HTML with print CSS gives you interactive reports that also look professional when printed. One format, two use cases.
- Real-world data is messy. The sample CSV works perfectly, but your coursework CSVs will have missing values, inconsistent formatting, and mixed data types. Learning to handle these edge cases (see the troubleshooting section) is the difference between a demo and a real tool.
Portfolio Suggestion
The report generator is a strong portfolio piece on its own. Include 2-3 sample reports generated from different datasets (sales data, survey results, financial data) and commit the HTML reports so visitors can download and open them. The full portfolio strategy — combining all six MIS tools into a single showcase repository — is covered in Lesson 6 after you complete the entire module.
You run the report generator on a CSV from your statistics class. The executive summary says 'Revenue showed a 250% increase from January to December.' You check the CSV and see that January had $200 in revenue (a partial month when the business launched) and December had $700. The finding is technically correct but misleading. What is the best approach?
What’s next
In the next lesson, you will build an Automated ETL Pipeline — a Python CLI tool that extracts data from multiple source formats (CSV, JSON, API), applies SQL transformations (JOINs, aggregations, deduplication), and loads clean data into a SQLite database. It is the data plumbing that feeds the dashboards and reports you have already built.