Business Report Generator
What you'll learn
Estimated time: ~90-120 minutes
- Build a Python CLI tool that transforms CSV data into polished executive reports
- Understand the data analysis pipeline: ingestion, statistical analysis, visualization, and presentation
- Configure report generation with YAML settings for themes, charts, and metrics
- Apply automation thinking to eliminate repetitive reporting workflows
What you’re building
MIS students write reports constantly. Financial summaries, operations dashboards, quarterly reviews, market analysis write-ups — they all follow the same structure: title page, executive summary, data tables, charts, key findings, recommendations. Every single time, you spend hours formatting in Word, tweaking Excel charts, and copy-pasting numbers.
In this lesson you will build a Python CLI tool that takes a CSV file as input and generates a polished, presentation-ready HTML executive report with auto-generated charts, summary statistics, styled data tables, and an executive summary section. Run one command, get a complete report. Open it in a browser, print it to PDF, and submit it.
This is the capstone build of the MIS track. You are moving from browser tools (Lesson 1) and React apps (Lessons 2-3) to a command-line tool that processes files on disk. This is how automation works in the real world — scripts that take input, produce output, and integrate into workflows.
This lesson uses Python with pandas, plotly, and Jinja2 — a different stack from the React lessons in Lessons 2-3. If you have not used Python before, the AI handles the setup. Verify you have Python 3 installed: run python3 --version (or python --version on some systems). If not, install it via your package manager or from python.org.
The showcase
When finished, your CLI tool will:
- Accept a CSV file and an optional report title from the command line.
- Auto-analyze the data: detect column types, compute descriptive statistics, identify trends, find outliers.
- Generate a styled HTML report containing:
- Title page with report name, date, and data summary.
- Executive summary: 3-5 bullet points highlighting key findings (highest values, biggest changes, notable outliers).
- KPI section: large-format numbers for the most important metrics.
- Charts section: auto-generated bar, line, and pie charts using Plotly (embedded in the HTML).
- Data tables: paginated, sortable tables with conditional formatting.
- Statistical summary: descriptive statistics for all numeric columns.
- Methodology note: what data was analyzed, how many rows/columns, any data quality issues found.
- Print-ready: the HTML includes print CSS media queries so it looks good when printed to PDF from the browser.
- Configurable: a YAML config file controls which charts to generate, report branding, and statistical thresholds.
Think of it as an automated analyst that turns raw data into a deliverable.
Report generation is the final mile of any analytics workflow. In business intelligence and capstone courses, you are graded on the quality of your deliverables as much as the analysis itself. This tool automates the formatting so you can focus on the analysis and interpretation. In industry, automated reporting pipelines (built with tools like Python, R Markdown, or Tableau Server) save analysts dozens of hours per month. Building one yourself demonstrates both technical skill and business process understanding.
The prompt
Open your AI CLI tool (such as Claude Code, Gemini CLI, or your preferred tool) in an empty directory and paste this prompt:
```
Create a Python CLI tool called report-generator that takes a CSV file and produces
a styled HTML executive report. Use Python 3.10+ with these packages: pandas,
plotly, pyyaml, jinja2. Structure the project properly.

PROJECT STRUCTURE:
report-generator/
├── report_generator/
│   ├── __init__.py
│   ├── __main__.py          # enables python -m report_generator
│   ├── cli.py               # argparse CLI entry point
│   ├── analyzer.py          # data analysis and statistics
│   ├── chart_builder.py     # Plotly chart generation
│   ├── report_builder.py    # Jinja2 HTML report assembly
│   ├── config.py            # YAML config loader
│   └── templates/
│       └── report.html      # Jinja2 HTML template
├── config.yaml              # default configuration
├── sample_data/
│   └── quarterly_sales.csv  # sample dataset (50+ rows)
├── requirements.txt
├── setup.py
└── README.md

CLI INTERFACE:
  python -m report_generator --input data.csv --output report.html
  python -m report_generator --input data.csv --output report.html --title "Q4 Sales Report"
  python -m report_generator --input data.csv --output report.html --config config.yaml

OPTIONS:
  --input, -i    Path to CSV file (required)
  --output, -o   Output HTML file path (required)
  --title, -t    Report title (default: auto-generated from filename)
  --config, -c   Path to YAML config file (optional)
  --theme        Color theme: "corporate" | "modern" | "minimal" (default: corporate)
  --no-charts    Skip chart generation (text-only report)
  --verbose      Enable debug logging

SAMPLE DATA (quarterly_sales.csv):
Generate a realistic 60-row business dataset with columns:
- Date (monthly, Jan 2024 - Dec 2024, but multiple entries per month)
- Region (North, South, East, West)
- Product_Category (Electronics, Furniture, Office Supplies, Software)
- Sales_Rep (8 realistic names)
- Units_Sold (integer, 10-500)
- Revenue (decimal, 1000-50000)
- Cost (decimal, 60-85% of Revenue)
- Customer_Satisfaction (decimal, 3.0-5.0)
The data should have realistic patterns: seasonal trends (Q4 higher), regional
differences, some product categories outperforming others.

ANALYZER (analyzer.py):
Analyze the CSV and produce a structured analysis dict containing:

1. overview:
   - row_count, column_count
   - date_range (if date column detected)
   - numeric_columns, categorical_columns, date_columns

2. descriptive_stats (per numeric column):
   - count, mean, median, std, min, max, Q1, Q3
   - skewness and kurtosis
   - coefficient of variation

3. key_findings (auto-generated list of 5-7 bullet points):
   - Highest and lowest values with context (e.g., "West region had the highest
     average Revenue at $28,450")
   - Trend direction for time-series data (e.g., "Revenue showed a 15% increase
     from Q1 to Q4")
   - Outliers: values more than 2 standard deviations from the mean
   - Correlations: pairs of numeric columns with |r| > 0.7
   - Category comparisons: which category leads in each numeric metric

4. data_quality:
   - Missing values per column
   - Duplicate rows
   - Potential data type issues

CHART BUILDER (chart_builder.py):
Generate these Plotly charts as HTML div strings (embedded, no external files):

1. Revenue by Category: horizontal bar chart, sorted descending
2. Revenue over Time: line chart with monthly aggregation (if date column exists)
3. Revenue by Region: pie chart with percentage labels
4. Top Performers: bar chart of top 10 values by a key metric
5. Distribution: histogram for each numeric column
6. Correlation heatmap: for all numeric columns
7. Scatter plot: for the two most correlated numeric columns

Each chart should:
- Use the selected color theme
- Have clear titles and axis labels
- Include hover tooltips with formatted numbers
- Be responsive (fill container width)
- Use Plotly's built-in export button (camera icon for PNG download)

REPORT TEMPLATE (report.html - Jinja2):
A professional HTML report with these sections:

1. TITLE PAGE
   - Report title (large, centered)
   - Subtitle: "Generated on [date] from [filename]"
   - Data summary: rows, columns, date range
   - Table of contents (linked to sections)

2. EXECUTIVE SUMMARY
   - "Key Findings" section with the auto-generated bullet points
   - 3-4 large KPI cards showing the most important metrics (total revenue,
     average satisfaction, top region, etc.)

3. CHARTS
   - Each chart in its own section with a brief auto-generated caption
   - Charts are full-width in a single column for readability

4. DATA TABLES
   - Summary statistics table (the descriptive_stats output)
   - Top/bottom 10 rows by the primary metric
   - Category breakdown table (pivot-style: categories as rows, metrics as
     columns with totals)
   - All tables have alternating row colors, aligned numbers, and formatted
     values (commas in thousands, 2 decimal places for currency)

5. METHODOLOGY
   - Data source filename and path
   - Processing date and time
   - Row count, column count
   - Data quality notes (missing values, duplicates found)
   - Tools used: Python, Pandas, Plotly

6. APPENDIX
   - Full descriptive statistics for every column
   - Correlation matrix as a formatted table

STYLING:
- Corporate theme: navy (#1e3a5f) headers, white background, gray accents,
  professional serif font for headings (Georgia), sans-serif for body (Segoe UI)
- Modern theme: dark (#111827) background, slate cards, accent blue (#3b82f6),
  Inter font throughout
- Minimal theme: white background, black text, thin borders, no color fills,
  system font stack
- Print CSS: hide interactive elements, force page breaks between sections,
  charts rendered at fixed sizes

CONFIG (config.yaml):
report:
  title: "Business Report"
  theme: "corporate"
  logo_url: ""               # optional company logo URL
analysis:
  date_column: "auto"        # auto-detect or specify column name
  primary_metric: "auto"     # auto-detect or specify column name
  outlier_threshold: 2.0     # standard deviations
  correlation_threshold: 0.7
charts:
  enabled: true
  types: ["bar", "line", "pie", "histogram", "heatmap", "scatter"]
  color_palette: ["#3b82f6", "#10b981", "#f59e0b", "#ef4444", "#8b5cf6",
                  "#ec4899", "#06b6d4", "#84cc16"]
tables:
  max_rows: 10
  format_currency: true
  currency_symbol: "$"

Generate all files with complete, working implementations. Include the sample CSV
with realistic data. The tool should produce a polished report on the first run.
```

The prompt includes specific instructions for the sample CSV because realistic data produces realistic reports. If the sample data is random noise, the auto-generated findings will be meaningless. The seasonal patterns and regional differences ensure the analyzer has real trends to detect and report on.
What you get
After the LLM generates the project, set it up:
```shell
cd report-generator
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -e .
```

Then generate your first report:
```shell
python -m report_generator --input sample_data/quarterly_sales.csv \
  --output report.html --title "Q4 2024 Sales Performance Report"
```

Expected output
report.html (~800-1200 lines of styled HTML with embedded Plotly charts)

Open report.html in your browser. You should see:
- Title page with “Q4 2024 Sales Performance Report”, the generation date, and a data summary showing 60 rows and 8 columns.
- Executive summary with findings like “West region generated the highest total revenue at $X” and “Revenue increased 15% from Q1 to Q4, driven primarily by Electronics.”
- KPI cards showing total revenue, average customer satisfaction, total units sold, and the top-performing region.
- Charts: a horizontal bar chart of revenue by category, a line chart showing monthly revenue trends, a pie chart of regional contribution, and a correlation heatmap.
- Data tables with summary statistics, top 10 transactions, and a category breakdown pivot table.
- Print-ready: press Ctrl+P (or Cmd+P on Mac) and the print preview should show clean page breaks, no interactive elements, and properly sized charts.
The executive summary and key findings are generated by code, not by human analysis. Always spot-check the numbers against the raw data — automated aggregations can misidentify trends, miscalculate percentages, or flag misleading patterns (such as a 250% increase driven by a partial first month). Treat AI-generated insights as a starting point for your analysis, not a finished product.
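A spot-check can be a few lines of pandas. The sketch below recomputes a claimed Q1-to-Q4 revenue change from raw data; the inline CSV is a stand-in with made-up values, not the lesson's sample dataset:

```python
import io
import pandas as pd

# Stand-in for the raw CSV behind the report (hypothetical values).
csv = io.StringIO(
    "Date,Revenue\n"
    "2024-01-15,1000\n2024-02-15,1200\n"
    "2024-10-15,1400\n2024-11-15,1500\n2024-12-15,1600\n"
)
df = pd.read_csv(csv, parse_dates=["Date"])

# Recompute the Q1 -> Q4 change that the executive summary claims.
quarterly = df.groupby(df["Date"].dt.quarter)["Revenue"].sum()
pct = (quarterly[4] - quarterly[1]) / quarterly[1] * 100
print(f"Q1 -> Q4 revenue change: {pct:+.1f}%")  # +104.5% for this toy data
```

If the recomputed figure disagrees with the report, read the analyzer code before trusting either number: aggregation choices (sum vs mean, calendar vs fiscal quarters) often explain the gap.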
Plotly embeds the full Plotly.js library in the HTML file (about 3 MB). This makes the report self-contained and interactive, but the file will be large. If you need a smaller file, use the --no-charts flag for a text-only report, or add a follow-up prompt to use Chart.js instead (smaller library, less interactivity).
When things go wrong
Python CLI tools introduce a new category of issues: dependency management, file I/O, and data parsing errors. Here is how to diagnose the most common problems.
Use the Symptom → Evidence → Request pattern: describe what you see, paste the error, then ask for a fix.
How it works
The report generator follows a pipeline pattern common in data engineering:
1. CLI (cli.py) parses arguments with argparse and orchestrates the pipeline. It loads the config, reads the CSV into a pandas DataFrame, and passes it through the analyzer, chart builder, and report builder in sequence.
2. Analyzer (analyzer.py) uses pandas to compute descriptive statistics (df.describe()), detect column types (df.dtypes), find correlations (df.corr()), and identify outliers (values beyond mean +/- threshold * std). The key findings are generated by comparing aggregated values: group by each categorical column, compute means, and report the highest/lowest.
3. Chart Builder (chart_builder.py) uses Plotly Express and Plotly Graph Objects to create each chart. Each chart is rendered as an HTML div string using plotly.io.to_html(fig, include_plotlyjs=False, full_html=False), so only one copy of Plotly.js is included in the final report (in the template head).
4. Report Builder (report_builder.py) loads the Jinja2 template, fills in the analysis results and chart divs, and writes the final HTML file. Jinja2 handles the looping (iterating over charts, table rows, findings) and conditional logic (showing data quality warnings only if issues exist).
5. Config (config.py) loads the YAML file and merges it with defaults, so the tool works with zero configuration but can be customized by editing the YAML.
Jinja2 is the same template engine used by Flask, Django, and Ansible. Learning it here transfers directly to web development and DevOps. The template file (report.html) contains HTML with special tags like {{ title }} for variable insertion and {% for finding in findings %} for loops. The Python code passes a dictionary of values to the template, and Jinja2 renders the final HTML. This separation of logic (Python) and presentation (HTML template) is a fundamental design pattern in software engineering.
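A minimal illustration of that separation, using an inline template string instead of the project's templates/report.html:

```python
from jinja2 import Template

# Inline stand-in for the report template: variables and a loop.
template = Template(
    "<h1>{{ title }}</h1>\n"
    "<ul>\n"
    "{% for finding in findings %}  <li>{{ finding }}</li>\n{% endfor %}"
    "</ul>"
)

# The Python side passes plain data; the template decides presentation.
html = template.render(
    title="Q4 Sales Report",
    findings=["Revenue up 15% from Q1 to Q4", "West region leads in Revenue"],
)
print(html)
```

Changing the report's look means editing the template; changing what it says means editing the analysis code. Neither edit touches the other file.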
From Manual Reports to Automated Pipelines: The Business Case
The report generator you just built automates a single step: turning a CSV into a formatted report. In enterprise settings, this is part of a larger reporting pipeline that runs automatically:
- Data extraction: A scheduled script pulls data from a database, API, or file share (e.g., a nightly SQL query against the sales database).
- Data transformation: The raw data is cleaned, aggregated, and enriched (e.g., joining sales data with customer demographics).
- Report generation: Your tool runs, producing the HTML or PDF report.
- Distribution: The report is emailed to stakeholders, uploaded to SharePoint, or published to a dashboard portal.
- Archival: Previous reports are stored with timestamps for historical comparison.
Tools used in industry for each step:
- Extraction: Python scripts, SSIS (SQL Server), Informatica, Fivetran
- Transformation: pandas, dbt, SQL stored procedures, Power Query
- Report generation: Python + Jinja2 (what you built), R Markdown, Tableau Server, Power BI Service
- Distribution: smtplib (Python email), Microsoft Power Automate, Airflow
- Archival: Amazon S3, Azure Blob Storage, network file shares
The ROI calculation: If a report takes 2 hours to produce manually and is generated weekly for 10 clients, that is 20 hours/week or ~1,000 hours/year. At a loaded cost of $50/hour for an analyst, that is $50,000/year spent on formatting. Your automation reduces it to the time it takes to run one command per client — about 10 minutes/week total. The savings fund the automation effort many times over.
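The rounded figures above come from straightforward arithmetic; as a sketch (all inputs are the illustrative assumptions from the paragraph, not measured data):

```python
hours_per_report = 2        # manual formatting time per report
clients = 10                # weekly reports produced
weeks_per_year = 52
loaded_hourly_cost = 50     # analyst cost in dollars

manual_hours = hours_per_report * clients * weeks_per_year  # 1,040 hours/year
manual_cost = manual_hours * loaded_hourly_cost             # $52,000/year

automated_hours = (10 / 60) * weeks_per_year                # ~10 min/week total
savings = manual_cost - automated_hours * loaded_hourly_cost
print(f"Manual cost: ${manual_cost:,}/year; automation saves ~${savings:,.0f}")
```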
This is the kind of analysis MIS graduates are expected to make: not just “can I build this?” but “should we build this, and what is the business impact?”
Customize it
Add comparison reports
```
Add a --compare flag that accepts a second CSV file. When set, generate a comparison
report that shows side-by-side analysis: this period vs last period. For each numeric
metric, show the absolute change and percentage change. Color-code increases in green
and decreases in red. Add delta charts (bar charts showing the difference between
periods). The executive summary should focus on what changed and why it matters.
```

Add PDF direct output
WeasyPrint requires OS-level C libraries (GTK3, Pango) that do not install via pip alone on Windows. If you are on Windows, either use WSL (recommended) or stick with the browser’s Ctrl+P “Print to PDF” feature. On macOS and Linux, pip install weasyprint usually works directly.
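The fall-back-to-HTML behavior amounts to a guarded import. A sketch of the pattern (the helper function and its behavior are assumptions, not the generated tool's code):

```python
# Guarded import: use WeasyPrint when available, otherwise keep HTML output.
try:
    import weasyprint
    HAVE_WEASYPRINT = True
except ImportError:
    HAVE_WEASYPRINT = False

def write_report(html: str, out_path: str, want_pdf: bool) -> str:
    """Write a PDF when requested and possible; otherwise fall back to HTML."""
    if want_pdf and HAVE_WEASYPRINT:
        pdf_path = out_path.rsplit(".", 1)[0] + ".pdf"
        weasyprint.HTML(string=html).write_pdf(pdf_path)
        return pdf_path
    if want_pdf:
        print("weasyprint not installed; writing HTML instead")
    path = out_path if out_path.endswith(".html") else out_path + ".html"
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```

This keeps the tool usable on machines where the C libraries are missing, at the cost of a different output format than requested.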
```
Add a --pdf flag that outputs a PDF file directly instead of HTML. Use the weasyprint
library. The PDF should have proper page numbers, headers with the report title on
each page, a table of contents with page numbers, and charts rendered as static
images. Add weasyprint to requirements.txt. Fall back to HTML output if weasyprint
is not installed.
```

Add natural language customization
```
Add a --focus flag that accepts a plain English instruction like --focus "Focus on
the West region and compare Q3 to Q4 for Electronics only". The analyzer should
filter the data according to the instruction and adjust the executive summary and
charts accordingly. Use simple keyword parsing: look for region names, date ranges,
and category names in the instruction and apply pandas filters.
```

Add scheduled reports
```
Add a --schedule flag that sets up a cron job (Linux) or scheduled task (Windows)
to regenerate the report daily/weekly/monthly. The tool should:
1. Check if the input CSV has changed since the last run (compare file hash)
2. Only regenerate if data changed
3. Optionally email the report using smtplib (--email flag)
4. Log all scheduled runs to a history file
Print the cron expression to stdout so the user can review it before activating.
```

What you just built is a Robotic Process Automation (RPA) workflow in miniature. In business and information systems coursework, you learn about automating repetitive business processes to improve efficiency and reduce errors. This report generator takes a task that manually takes 2-3 hours (data analysis, chart creation, formatting, writing summaries) and reduces it to a 10-second command. Multiply that across an organization producing weekly reports for 50 clients, and you have saved 100-150 hours per week. That is the business case for automation — and you can now build it.
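The change-detection step of the scheduled-reports idea is small enough to sketch directly; the state-file name here is a hypothetical choice:

```python
import hashlib
from pathlib import Path

STATE_FILE = Path(".last_report_hash")  # hypothetical state-file name

def file_hash(path: str) -> str:
    """SHA-256 of the file contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def data_changed(csv_path: str) -> bool:
    """True if the CSV differs from the last recorded run; records the new hash."""
    current = file_hash(csv_path)
    previous = STATE_FILE.read_text() if STATE_FILE.exists() else ""
    STATE_FILE.write_text(current)
    return current != previous
```

Hashing the whole file is cheaper than diffing it and catches any byte-level change; the tradeoff is that it cannot tell a trivial change (reordered rows) from a substantive one, so every change triggers a regeneration.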
The story so far
Across four lessons, you have built:
| Lesson | Tool | Technology | MIS Application |
|---|---|---|---|
| 1 | Business Analytics Dashboard | Single HTML + Chart.js | Data exploration, BI |
| 2 | Database Schema Designer | React + Vite | Database design, data modeling |
| 3 | Project Management Tracker | React + Vite | Project management, Agile |
| 4 | Business Report Generator | Python CLI | Reporting, automation |
Each one demonstrates a different technical skill (front-end, full-stack, back-end automation) applied to a real MIS domain. But the report generator raises a question: where does the CSV come from? In the next two lessons, you will build the data plumbing (ETL pipeline) and the automation layer (scheduled orchestrator) that make these tools production-ready.
Try it yourself
- Generate the report tool with the prompt above.
- Run it on the included sample data and open the HTML report.
- Read the executive summary. Does it accurately describe the data? If a finding is wrong, look at the analyzer code and figure out why.
- Try the different themes: --theme corporate, --theme modern, --theme minimal. Print each to PDF and compare.
- Now find a real CSV from your coursework — any dataset from an MIS or statistics class. Run the tool on it. How good is the auto-generated analysis?
- Edit config.yaml to change the color palette and currency symbol. Re-run and see the changes.
- Pick one customization from the list above and add it with a follow-up prompt.
Key Takeaways
- Automation is the highest-value MIS skill. Taking a 2-hour manual process and reducing it to a 10-second command is exactly what MIS professionals are hired to do. This report generator is a concrete example you can point to.
- The pipeline pattern (input, process, output) applies everywhere. CLI arguments define the input, the analyzer and chart builder do the processing, and the Jinja2 template produces the output. This is the same pattern used in ETL pipelines, CI/CD systems, and data engineering workflows.
- Configuration separates policy from mechanism. The YAML config file lets you change the report’s appearance, metric thresholds, and chart types without touching the code. This is a fundamental principle in systems design.
- Print-ready HTML is a powerful output format. Unlike Word documents (hard to generate programmatically) or raw PDFs (complex libraries), HTML with print CSS gives you interactive reports that also look professional when printed. One format, two use cases.
- Real-world data is messy. The sample CSV works perfectly, but your coursework CSVs will have missing values, inconsistent formatting, and mixed data types. Learning to handle these edge cases (see the troubleshooting section) is the difference between a demo and a real tool.
Portfolio Suggestion
The report generator is a strong portfolio piece on its own. Include 2-3 sample reports generated from different datasets (sales data, survey results, financial data) and commit the HTML reports so visitors can download and open them. The full portfolio strategy — combining all six MIS tools into a single showcase repository — is covered in Lesson 6 after you complete the entire module.
You run the report generator on a CSV from your statistics class. The executive summary says 'Revenue showed a 250% increase from January to December.' You check the CSV and see that January had $200 in revenue (a partial month when the business launched) and December had $700. The finding is technically correct but misleading. What is the best approach?
What’s next
In the next lesson, you will build an Automated ETL Pipeline — a Python CLI tool that extracts data from multiple source formats (CSV, JSON, API), applies SQL transformations (JOINs, aggregations, deduplication), and loads clean data into a SQLite database. It is the data plumbing that feeds the dashboards and reports you have already built.