Revenue Forecasting · FP&A · Thermo Fisher Scientific · SEC EDGAR · 50+ quarters

Building an FP&A Intelligence Dashboard
from scratch, in one session.

Five forecasting models. Live SEC EDGAR data. GenAI narrative. Scenario planning. Server-side caching. Deployed to the cloud. This is the complete build log — including every decision made, every bug hit, and the role an AI played in making it happen at this pace.

5 forecasting models · 5 dashboard pages · 50+ quarters of data · 0 API keys required for data



01 The Problem

FP&A work — financial planning and analysis — typically lives in Excel. Revenue forecasts are spreadsheets. Variance reports are manual. Scenario planning is a series of copied tabs with adjusted assumptions. The problem isn't the analysis itself. It's that the infrastructure makes the analysis brittle, slow, and invisible to anyone who isn't the person who built the file.

The target company

Thermo Fisher Scientific (TMO) is a life sciences and analytical instruments company with roughly $40B in annual revenue, publicly traded on the NYSE. It was chosen deliberately: a complex, multi-segment business with genuine quarterly seasonality makes a realistic forecasting challenge, not a toy problem.

Every number in this dashboard comes from TMO's own SEC filings: quarterly 10-Q reports and annual 10-K reports. No data vendors, no subscriptions, no scraping. The SEC makes this data available free through its XBRL API.

What a real FP&A analyst needs

Quarterly actuals going back 10+ years — enough history for seasonal models
Forward-looking forecasts with uncertainty bands, not just point estimates
Variance analysis — where did actuals deviate from forecast, and by how much?
Scenario planning — what does Q4 look like if macro conditions shift by ±8%?
Commentary that explains the numbers in plain English

"The difference between a forecasting model and a forecasting tool is whether a finance professional can use it without reading your code."


02 Architecture

The first decision was the framework. Streamlit is the obvious choice for Python data apps — but Streamlit reruns the entire script on every interaction, making it poorly suited for multi-page apps with expensive model computations. Dash was the right call.

Why Dash over Streamlit

Multi-page routing — Dash has a native pages system. Each page is a separate module; the router handles URL navigation cleanly.
Callback architecture — outputs only update when their specific inputs change, not when anything on the page changes.
Flask under the hood — this matters for caching: flask-caching integrates directly with the server, giving us real server-side memoization.
Production-ready — Gunicorn + Render deployment is a straightforward path. No extra orchestration needed.

Module breakdown

app.py — server init, sidebar, routing

config.py — all constants in one place

edgar.py — SEC API fetch + parquet cache

models.py — 5 forecasting models

charts.py — Plotly figure builders

narrative.py — Claude API commentary

cache.py — Flask-Caching instance

pages/ — one file per tab

The 5-page structure

Overview

KPI cards, revenue history chart, P&L waterfall, margin trends. The "one glance" page — where a finance lead starts their morning.

Forecast

All 5 models on a single chart. MAPE accuracy bar. AI-generated commentary. Switch metric via dropdown — Revenue, Gross Profit, Operating Income, Net Income.

Variance

Actual vs ensemble forecast. Quarters where variance exceeds ±5% are flagged automatically. The accountability layer.

Scenarios

Base / Optimistic (+8%) / Pessimistic (−8%) applied to the ensemble. 4-quarter total projections with a quarter-by-quarter table.

Data

Full data catalogue — source provenance, XBRL key mapping, field dictionary, quality diagnostics, and the raw DataFrame. The transparency layer.
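The Variance and Scenarios pages rest on two small calculations. A minimal sketch, assuming variance is measured against the ensemble forecast as the base and that each scenario is a flat multiplier on every forecast quarter; the function names are hypothetical, not the project's code:

```python
from typing import Dict, List

VARIANCE_THRESHOLD_PCT = 5.0  # quarters beyond ±5% are flagged
SCENARIO_MULTIPLIERS = {"Base": 1.00, "Optimistic": 1.08, "Pessimistic": 0.92}


def variance_pct(actual: float, forecast: float) -> float:
    """Signed variance of actual against forecast, as a percentage of forecast."""
    return (actual - forecast) / forecast * 100.0


def flag_variances(actuals: List[float], forecasts: List[float]) -> List[bool]:
    """True for each quarter whose absolute variance exceeds the threshold."""
    return [abs(variance_pct(a, f)) > VARIANCE_THRESHOLD_PCT
            for a, f in zip(actuals, forecasts)]


def scenario_table(ensemble: List[float]) -> Dict[str, dict]:
    """Scale every forecast quarter by each scenario multiplier and total them."""
    table: Dict[str, dict] = {}
    for name, multiplier in SCENARIO_MULTIPLIERS.items():
        quarters = [value * multiplier for value in ensemble]
        table[name] = {"quarters": quarters, "total": sum(quarters)}
    return table


if __name__ == "__main__":
    print(flag_variances([106.0, 103.0, 94.0], [100.0, 100.0, 100.0]))
    # [True, False, True]
    for name, row in scenario_table([100.0, 102.0, 98.0, 110.0]).items():
        print(f"{name:<12} 4-quarter total: {row['total']:.1f}")
```

Because the scenarios are plain multipliers, the optimistic and pessimistic totals are always exactly 8% above and below the base total; anything richer (per-segment shocks, a ramp over the year) would replace SCENARIO_MULTIPLIERS with per-quarter factors.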


03 Data Pipeline

The most underappreciated part of the build. Getting clean, correctly labelled quarterly financial data from a public source, without paying for a data vendor, took real engineering.

SEC EDGAR XBRL API

The SEC requires all public US companies to file financial statements in XBRL (eXtensible Business Reporting Language) — a structured format that maps each line item to a standardised taxonomy key. The API endpoint is public, free, and returns a decade of data in a single JSON response:

GET https://data.sec.gov/api/xbrl/companyfacts/CIK{CIK}.json

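No authentication or API key is required, only a User-Agent header that identifies the requester; the SEC's fair-access policy caps traffic at 10 requests per second. Thermo Fisher's CIK is 0000097745. One call gets everything.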
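A minimal sketch of that single call using only the standard library. The function names are hypothetical, and the User-Agent value is a placeholder you would replace with your own name and contact email. The CIK is zero-padded to ten digits, as in the URL above:

```python
import json
import urllib.request

COMPANYFACTS_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json"


def companyfacts_url(cik: int) -> str:
    """Build the companyfacts URL; EDGAR expects a 10-digit, zero-padded CIK."""
    return COMPANYFACTS_URL.format(cik=f"{cik:010d}")


def fetch_companyfacts(cik: int, user_agent: str) -> dict:
    """Download every XBRL fact the company has filed, in one request.

    No key is needed, but automated clients should identify themselves with a
    User-Agent such as "Jane Doe jane@example.com" (placeholder).
    """
    request = urllib.request.Request(
        companyfacts_url(cik),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)  # parsed JSON: {"cik": ..., "facts": {...}}
```

Line items then sit under facts["facts"]["us-gaap"][key]["units"]["USD"], one entry per reported value with its period start, end, form type, and filing date.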

The XBRL deduplication problem

Companies don't always use the same taxonomy key across years. TMO's revenue appears under three different XBRL keys across different filing periods. The pipeline tries each in order and takes the first match:

RevenueFromContractWithCustomer...
→ Revenues
→ SalesRevenueNet

After extraction, the data is deduplicated by period-end date — keeping the latest filing for each quarter — then cached as Parquet for fast subsequent loads.
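The key fallback and deduplication can be sketched as below. Two assumptions are worth stating. First, "first match" is read per quarter: the highest-priority key wins wherever it has a value, and older keys fill the earlier years. Second, the sketch adds a duration filter that the write-up does not mention. 10-Q facts include year-to-date figures that share a period-end date with the three-month figures, so deduplicating on end date alone would mix them. Function names are hypothetical:

```python
from datetime import date
from typing import Dict, List, Sequence


def quarterly_by_period_end(entries: List[dict]) -> Dict[str, float]:
    """Keep ~3-month duration facts; per period end, keep the latest filing."""
    best: Dict[str, dict] = {}
    for entry in entries:
        if "start" not in entry:
            continue  # instant (balance-sheet) facts have no duration
        days = (date.fromisoformat(entry["end"])
                - date.fromisoformat(entry["start"])).days
        if not 80 <= days <= 100:
            continue  # drop year-to-date and annual values
        current = best.get(entry["end"])
        # ISO dates compare correctly as strings, so a later "filed" wins
        if current is None or entry["filed"] > current["filed"]:
            best[entry["end"]] = entry
    return {end: best[end]["val"] for end in sorted(best)}


def merge_by_priority(companyfacts: dict, keys: Sequence[str]) -> Dict[str, float]:
    """Per quarter, take the value from the first key (in order) that has one."""
    us_gaap = companyfacts.get("facts", {}).get("us-gaap", {})
    merged: Dict[str, float] = {}
    for key in keys:
        entries = us_gaap.get(key, {}).get("units", {}).get("USD", [])
        for end, value in quarterly_by_period_end(entries).items():
            merged.setdefault(end, value)  # higher-priority keys were seen first
    return dict(sorted(merged.items()))
```

Companies often do not tag Q4 as a standalone three-month fact, so a complete pipeline typically derives it as the annual 10-K figure minus the first three quarters; that step is not shown here.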

"50+ quarters of clean quarterly P&L data, directly from SEC filings, at zero cost. That's the entire data layer."


04 Forecasting Models

The decision to run five models wasn't about complexity for its own sake. It was about honesty: no single model dominates across all financial time series. An ensemble built from disagreeing models produces more robust intervals than any model alone.

ARIMA (2,1,1)(1,1,0)₄

Classical statistical model that captures autocorrelation in the series. The seasonal part, (1,1,0) with period 4, adds one seasonal autoregressive term and a lag-4 seasonal difference to handle quarterly patterns. The first difference removes the revenue growth trend.
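statsmodels is in the project's stack, and there this specification would typically be written as SARIMAX(y, order=(2, 1, 1), seasonal_order=(1, 1, 0, 4)). The differencing part of that order can be shown without any library. A series made of a linear trend plus a fixed quarterly pattern becomes all zeros after one regular and one lag-4 difference, leaving the AR and MA terms to model only what remains. A stdlib-only illustration, not the dashboard's code:

```python
from typing import List


def difference(series: List[float], lag: int = 1) -> List[float]:
    """Return y[t] - y[t - lag] for every t where the lag exists."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_differencing(series: List[float], season: int = 4) -> List[float]:
    """Apply (1 - B)(1 - B^season): one regular and one seasonal difference,
    matching d=1 and D=1 in an ARIMA(p,1,q)(P,1,Q)_season specification."""
    return difference(difference(series, 1), season)


if __name__ == "__main__":
    pattern = [5.0, -2.0, 3.0, -6.0]  # illustrative quarterly offsets
    y = [100.0 + 2.5 * t + pattern[t % 4] for t in range(12)]
    print(sarima_differencing(y))  # trend and seasonality both removed
```

Each difference costs observations (one for d=1, four for D=1 at period 4), which is one reason the write-up stresses having 10+ years of quarterly history.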

ETS Exponential Smoothing

Additive trend + additive seasonality. Weights recent observations more heavily than older ones, which helps when TMO's growth rate shifts after acquisitions. Smoothing parameters are optimised automatically.
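To show what those optimised parameters control, here is additive Holt-Winters written out by hand. The smoothing parameters alpha, beta, and gamma are fixed, assumed values, and the initialisation is one common heuristic. In practice statsmodels' ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=4).fit() estimates the parameters instead:

```python
from typing import List


def holt_winters_additive(y: List[float], horizon: int, season: int = 4,
                          alpha: float = 0.5, beta: float = 0.1,
                          gamma: float = 0.1) -> List[float]:
    """Additive-trend, additive-seasonal exponential smoothing (Holt-Winters).

    alpha, beta, gamma are fixed here; a production fit would optimise them.
    """
    if len(y) < 2 * season:
        raise ValueError("need at least two full seasons of history")
    first = sum(y[:season]) / season
    second = sum(y[season:2 * season]) / season
    trend = (second - first) / season
    # Deseasonalised level at the end of the first season.
    level = first + trend * (season - 1) / 2
    # Initial seasonal offsets: first-season values minus the fitted line.
    seasonals = [y[i] - (first + trend * (i - (season - 1) / 2))
                 for i in range(season)]
    for t in range(season, len(y)):
        s_prev = seasonals[t % season]  # offset from the same quarter last year
        prev_level = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season] = gamma * (y[t] - level) + (1 - gamma) * s_prev
    last = len(y) - 1
    return [level + h * trend + seasonals[(last + h) % season]
            for h in range(1, horizon + 1)]
```

Higher alpha and beta make the level and trend react faster to recent quarters, which is the "weights recent observations more heavily" behaviour described above.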

Prophet Meta · Multiplicative

Decomposable time series model. Multiplicative seasonality suits a company whose seasonal swing is proportional to the level of revenue. Automatic trend changepoints help it absorb structural breaks from M&A activity.
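To make "proportional to the level of revenue" concrete: under multiplicative seasonality each quarter sits a fixed percentage above or below trend, so the absolute swing grows as the business grows. The sketch below estimates those percentages with classical ratio-to-trend decomposition. This is a simpler technique than Prophet's Fourier-series seasonality. In Prophet itself, the switch is the seasonality_mode="multiplicative" constructor argument. The function name is hypothetical:

```python
from typing import List


def multiplicative_indices(values: List[float], trend: List[float],
                           season: int = 4) -> List[float]:
    """Average ratio of value to trend for each position in the season.

    An index of 1.05 means that quarter typically runs 5% above trend at any
    revenue level, which is what multiplicative seasonality assumes.
    """
    sums = [0.0] * season
    counts = [0] * season
    for t, (value, level) in enumerate(zip(values, trend)):
        sums[t % season] += value / level
        counts[t % season] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    trend = [100.0 * 1.02 ** t for t in range(12)]  # 2% growth per quarter
    index = [0.95, 1.00, 0.97, 1.08]                # illustrative pattern
    values = [level * index[t % 4] for t, level in enumerate(trend)]
    print(multiplicative_indices(values, trend))    # recovers the pattern
```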

XGBoost Recursive multi-step

Gradient boosting on lag features [1, 2, 4, 8 quarters], quarter-of-year, and a linear trend. Forecasts recursively — each prediction becomes input for the next step. Catches non-linear patterns the statistical models miss.
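The feature layout and the recursive loop can be sketched independently of XGBoost. Here predict stands in for a fitted model, for example a wrapper around XGBRegressor.predict. The names, the quarter encoding, and the assumption that the first observation is a Q1 are all illustrative:

```python
from typing import Callable, Dict, List, Tuple

LAGS = (1, 2, 4, 8)  # quarters
Features = Dict[str, float]


def make_features(series: List[float], t: int) -> Features:
    """Features for predicting series[t] from values strictly before t."""
    row = {f"lag_{k}": series[t - k] for k in LAGS}
    row["quarter"] = float(t % 4 + 1)  # assumes series[0] is a Q1
    row["trend"] = float(t)
    return row


def training_rows(series: List[float]) -> Tuple[List[Features], List[float]]:
    """(features, target) pairs for every t with a full lag window."""
    start = max(LAGS)
    X = [make_features(series, t) for t in range(start, len(series))]
    return X, series[start:]


def recursive_forecast(history: List[float],
                       predict: Callable[[Features], float],
                       horizon: int = 4) -> List[float]:
    """Predict one quarter, append it as if observed, and repeat."""
    if len(history) < max(LAGS):
        raise ValueError("need at least max(LAGS) quarters of history")
    series = list(history)
    for _ in range(horizon):
        series.append(predict(make_features(series, len(series))))
    return series[len(history):]
```

Recursion means errors compound: the fourth forecast quarter is built partly on three predicted lags. Tree ensembles also do not extrapolate a trend feature beyond the range seen in training, so most of the forward signal has to come from the lags.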

Ensemble Mean + CI

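Simple mean across the four base models. The confidence band is ±1.96σ of the spread between model forecasts, so quarters where models disagree get wider intervals, which is exactly the right behaviour. With only four models, treat it as a disagreement band rather than a calibrated 95% interval.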
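The ensemble band reduces to a per-quarter mean and standard deviation across the model forecasts. A minimal sketch with a hypothetical function name. The write-up does not say whether the sample or population standard deviation is used, and the sample version here is an assumption:

```python
import statistics
from typing import List, Tuple


def ensemble_with_band(model_forecasts: List[List[float]],
                       z: float = 1.96) -> List[Tuple[float, float, float]]:
    """Per quarter, return (lower, mean, upper) with bounds at mean ± z * stdev,
    where stdev is taken across the individual model forecasts."""
    bands = []
    for quarter_values in zip(*model_forecasts):  # one tuple per forecast quarter
        mean = statistics.mean(quarter_values)
        spread = statistics.stdev(quarter_values)  # sample stdev: an assumption
        bands.append((mean - z * spread, mean, mean + z * spread))
    return bands


if __name__ == "__main__":
    arima, ets = [100.0, 110.0], [100.0, 114.0]
    prophet, xgb = [100.0, 106.0], [100.0, 110.0]
    for lower, mean, upper in ensemble_with_band([arima, ets, prophet, xgb]):
        print(f"{lower:.1f} .. {mean:.1f} .. {upper:.1f}")
```

In the first quarter all four models agree, so the band collapses to a single point; in the second they diverge and the band widens.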

Validation methodology

For each model, the last 4 quarters are held out as a validation set, and MAPE (Mean Absolute Percentage Error) is computed on those held-out actuals. The 5% threshold on the accuracy chart is a common FP&A rule of thumb for a useful forecast. Each model is then refit on the full series before generating the forward 4-quarter forecast.
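The validation loop can be sketched with any model wrapped as a function that takes a training series and a horizon. The seasonal-naive baseline is a hypothetical helper added for illustration. It is a useful yardstick, because a model that cannot beat "same quarter last year" adds little:

```python
from typing import Callable, List

Forecaster = Callable[[List[float], int], List[float]]  # (train, horizon) -> forecast


def mape(actual: List[float], forecast: List[float]) -> float:
    """Mean Absolute Percentage Error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def holdout_mape(series: List[float], fit_predict: Forecaster,
                 holdout: int = 4) -> float:
    """Fit on everything except the last `holdout` quarters and score on them."""
    train, test = series[:-holdout], series[-holdout:]
    return mape(test, fit_predict(train, holdout))


def seasonal_naive(train: List[float], horizon: int) -> List[float]:
    """Baseline: repeat the last observed year."""
    last_year = train[-4:]
    return [last_year[i % 4] for i in range(horizon)]
```

For the forward forecast, the same fit_predict is called on the full series, e.g. seasonal_naive(series, 4), so the validation score describes the model settings rather than the exact fitted instance that produces the forecast.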


05 Engineering Decisions

The models are the interesting part, but the engineering is what makes the dashboard usable. Four decisions and fixes made the difference between a prototype and something production-ready.

Server-side caching

Running ARIMA, ETS, Prophet, and XGBoost on every page visit produced a 10–20 second lag, which is unacceptable for a dashboard. The fix was Flask-Caching with a FileSystemCache backend.

Both load_financials() and run_all_forecasts() are decorated with @cache.memoize(timeout=3600). The first visit computes; later visits within the one-hour timeout are served from disk in milliseconds. The "Refresh Data" button calls cache.clear(), wiping both the DataFrame and all forecast results in one shot.

FileSystemCache rather than SimpleCache, because Gunicorn on Render runs multiple worker processes that don't share in-process memory. The filesystem is a layer every worker on the same instance can read and write.
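A sketch of the cache wiring, assuming the module layout listed under Architecture. It uses only documented Flask-Caching and Dash calls, but the cache directory is a placeholder, and the commented excerpts show where each piece would go rather than reproducing the project's files:

```python
# cache.py: shared cache instance (sketch, not the project's exact file)
from flask_caching import Cache

cache = Cache(config={
    "CACHE_TYPE": "FileSystemCache",   # on disk, so every Gunicorn worker sees it
    "CACHE_DIR": "/tmp/fpa-cache",     # placeholder: any writable directory
    "CACHE_DEFAULT_TIMEOUT": 3600,     # one hour, matching the memoize timeout
})

# app.py (excerpt)
#   app = dash.Dash(__name__, use_pages=True)
#   cache.init_app(app.server)         # app.server is the underlying Flask app
#
# edgar.py (excerpt)
#   @cache.memoize(timeout=3600)
#   def load_financials(): ...
#
# "Refresh Data" callback (excerpt)
#   cache.clear()                      # drops data and forecasts together
```

Because init_app receives the Flask server rather than the Dash app, the same Cache object can be imported by any page module without creating circular imports through app.py.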

The CSS bug that wasn't obvious

On first run, the app crashed with:

AttributeError: module 'dash.html' has no attribute 'Style'

Dash doesn't have an html.Style component. The correct pattern, which is easy to miss in the docs, is the assets/ folder: Dash automatically serves the files in an assets/ directory at the app's root and includes any .css file there on every page, with no code required. All the inline styles moved to assets/style.css and the problem disappeared.

Chart title / legend overlap

The charts' shared layout put both the chart title and the horizontal legend in the same 40px top margin, so they rendered on top of each other. Fix: move the legend below the chart (y=-0.12) and increase the top margin to 80px. Applied once in _base_layout(), the change fixed all six charts.
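A hypothetical reconstruction of a shared layout helper along those lines. It returns a plain dict, so it can be applied with fig.update_layout(**base_layout("Revenue")). The title, the legend's y position, and the top margin follow the text; the other margins and the anchors are assumed values:

```python
def base_layout(title: str) -> dict:
    """Shared Plotly layout: legend below the plot area, room for the title."""
    return {
        "title": {"text": title},
        "legend": {
            "orientation": "h",   # horizontal legend...
            "y": -0.12,           # ...placed below the x-axis (paper coordinates)
            "yanchor": "top",     # the legend's top edge sits at y
            "x": 0,
            "xanchor": "left",
        },
        "margin": {"t": 80, "b": 60, "l": 60, "r": 20},  # b/l/r are assumptions
    }
```

If x-axis tick labels collide with a legend placed this way, increasing the bottom margin is usually enough.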

GenAI narrative layer

The Forecast page includes an "AI Commentary" section — 2–3 sentences of plain-English variance analysis generated by Claude (Anthropic's claude-sonnet-4-6 model). It receives the last 4 quarters of actuals, the ensemble forecast, and the metric name, and returns a short paragraph explaining what the trend means in FP&A terms. If no API key is set, the system falls back to a deterministic rule-based summary — so the dashboard works fully without it.
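The fallback logic can be sketched with the model call injected as a function, which keeps the example runnable without network access. The prompt wording, the function names, and the rule-based phrasing are assumptions. The Anthropic SDK reads ANTHROPIC_API_KEY from the environment, and llm would typically wrap client.messages.create(...) with the model named above:

```python
import os
from typing import Callable, List, Optional

LLMCall = Callable[[str], str]  # hypothetical: takes a prompt, returns commentary


def build_prompt(metric: str, actuals: List[float], forecast: List[float]) -> str:
    """Assemble the three inputs the commentary receives into one prompt."""
    return (f"You are an FP&A analyst. Metric: {metric}. "
            f"Last {len(actuals)} quarters of actuals: {actuals}. "
            f"Ensemble forecast: {forecast}. "
            "In 2-3 sentences, explain what the trend means in FP&A terms.")


def rule_based_summary(metric: str, actuals: List[float],
                       forecast: List[float]) -> str:
    """Deterministic fallback: trailing change plus next-quarter direction."""
    trailing = (actuals[-1] - actuals[0]) / actuals[0] * 100.0
    step = (forecast[0] - actuals[-1]) / actuals[-1] * 100.0
    direction = "growth" if step >= 0 else "a decline"
    return (f"{metric} changed {trailing:+.1f}% across the last "
            f"{len(actuals)} quarters. The ensemble projects {direction} "
            f"of {abs(step):.1f}% next quarter.")


def commentary(metric: str, actuals: List[float], forecast: List[float],
               llm: Optional[LLMCall] = None) -> str:
    """Prefer the LLM when a client and API key exist; otherwise fall back."""
    if llm is not None and os.environ.get("ANTHROPIC_API_KEY"):
        try:
            return llm(build_prompt(metric, actuals, forecast))
        except Exception:
            pass  # any API error drops through to the rule-based summary
    return rule_based_summary(metric, actuals, forecast)
```

Catching API errors as well as a missing key means a network outage degrades the page to the deterministic summary instead of breaking it.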


06 Built with Claude

This entire application was built in collaboration with Claude — the AI assistant developed by Anthropic. That is not a footnote. It is a core part of what this project demonstrates.

What Claude did

Every file in this repository — edgar.py, models.py, charts.py, narrative.py, cache.py, all five page modules, the CSS, the deployment config — was written by Claude in response to an architecture conversation.

That conversation covered: the framework decision (Dash vs Streamlit), how SEC EDGAR's XBRL API works, the right approach for each forecasting model, how to structure a multi-page Dash application, why html.Style doesn't exist in Dash and what the correct pattern is, how Flask-Caching integrates with a Dash server, how to fix Plotly legend overlap, and how to write a Gunicorn start command for Render.

Bugs were diagnosed and fixed in the same session. When EDGAR returned data with inconsistent XBRL keys across years, Claude wrote the fallback logic. When the chart titles overlapped the legends, Claude identified the exact Plotly property and applied the fix across all charts via _base_layout().

What this means for how I work

The instinct here wasn't "let the AI write the code." It was: I understand what needs to be built and why. Claude understands how to build it. That division of labour — domain knowledge and product judgment on one side, implementation and debugging on the other — is genuinely productive.

I knew which forecasting models were appropriate for quarterly financial time series, and why an ensemble is more honest than a single model. I knew what a finance professional actually needs from a variance analysis page. I knew that the right data source was SEC EDGAR, not a commercial API. Claude knew the Dash multi-page pattern, the ARIMA parameters, the Flask-Caching init sequence, and how Plotly's legend positioning works.

The result is something that would have taken several days alone and took one session together.

Full transparency

Claude is also the AI whose API powers the "AI Commentary" section inside the dashboard — the same model, used twice: once as a development tool, once as a runtime feature. I wanted to document this clearly rather than obscure it. Using AI as a collaborator is a skill. Knowing when to use it, what to ask for, and how to evaluate the output is the work.

"Claude wrote the code. I knew what to build, why it mattered, and whether the output was correct. That combination is what produced the dashboard."


07 Result

A production-grade FP&A intelligence dashboard — deployed, cached, and sourced from real SEC filings — that demonstrates what applied financial analytics looks like when it's built as a product rather than a script.

Data layer

50+ quarters of TMO quarterly P&L from SEC EDGAR. No vendor, no cost. Parquet-cached locally. One-click refresh from the sidebar.

Forecast layer

ARIMA, ETS, Prophet, XGBoost, and Ensemble — each with validation MAPE. Confidence intervals from model disagreement. 4 metrics selectable.

Narrative layer

GenAI commentary via Claude API explaining variance in plain English. Graceful fallback to rule-based summary without an API key.

Infrastructure

Server-side Flask-Caching. Gunicorn on Render. Sub-100ms response time after first load. Git-versioned. Fully open source.


Stack: Python 3.12 · Dash 2.16 · Plotly · dash-bootstrap-components · pandas · statsmodels · Prophet · XGBoost · scikit-learn · Flask-Caching · Anthropic SDK · Gunicorn · Render
Data: SEC EDGAR XBRL API · Thermo Fisher Scientific (CIK 0000097745) · 10-Q and 10-K filings · 2012–present
AI collaborator: Claude (claude-sonnet-4-6) by Anthropic — used for both development (code generation, debugging) and as a runtime feature (FP&A narrative generation)
