Data Marts in the Modern Warehouse / Lakehouse
Subject‑oriented slices (Sales, Pricing, Inventory, Procurement) curated from governed enterprise data.
How the Data Mart Architecture Works — Sales Example
The schematic illustrates how data flows through the layers of a modern Data Warehouse and Lakehouse ecosystem. To make it relatable, let’s walk through the journey of sales data from source to insight.
1. Source Systems — Capturing Sales Transactions
Everything begins at the source. For a retail or FMCG company, sales transactions are generated daily from multiple channels: Point‑of‑Sale (POS) systems in stores, e‑commerce platforms, and customer relationship management (CRM) systems tracking orders and feedback. Each source produces structured or semi‑structured data—invoice numbers, product IDs, timestamps, store locations, and customer details.
These diverse data points are extracted and prepared for ingestion into the data pipeline.
2. Ingest & Orchestration — Moving the Data
In this layer, automated workflows handle the transfer of data from the sources into the data warehouse environment. Tools such as Airflow and Fivetran schedule and monitor extraction and loading, while dbt manages in‑warehouse transformations, ensuring daily sales transactions arrive reliably.
The orchestration process also applies basic validation (for example, checking that all store branches submitted data) before the raw data is placed into the first storage layer.
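The completeness check described above can be sketched in a few lines. This is a minimal illustration, not a production orchestrator: the branch codes and record fields (`EXPECTED_BRANCHES`, `store_id`) are hypothetical names chosen for the example.

```python
# Minimal pre-landing validation sketch (hypothetical names): verify that
# every expected store branch submitted at least one record in today's batch.

EXPECTED_BRANCHES = {"JHB01", "CPT02", "DBN03"}  # assumed branch codes

def validate_batch(records: list[dict]) -> list[str]:
    """Return the branches missing from the batch (empty list = pass)."""
    seen = {r["store_id"] for r in records}
    return sorted(EXPECTED_BRANCHES - seen)

batch = [
    {"store_id": "JHB01", "invoice_no": "INV-1001", "amount": 120.0},
    {"store_id": "CPT02", "invoice_no": "INV-1002", "amount": 89.5},
]
missing = validate_batch(batch)
if missing:
    print(f"Hold batch: missing submissions from {missing}")
```

In a real pipeline this check would run as a task in the orchestrator, failing or alerting before the batch lands in the Bronze layer.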
3. Bronze Layer — Raw Sales Data
The Bronze (Raw) layer is a landing zone where data is stored as‑is. Nothing is altered or cleaned yet. It may include duplicate sales entries, incomplete customer fields, or missing stock codes. The purpose of this stage is traceability—you can always return to the original data to verify accuracy.
For example, a raw sales record might show “Product A” sold for R120 but missing the store location or salesperson ID. That’s fine; it’s captured for later enrichment.
4. Silver Layer — Cleansed and Conformed
In the Silver (Cleansed) layer, data engineers standardize formats, remove duplicates, and enrich the dataset with reference tables such as store codes, product hierarchies, and region mappings. All records now have consistent columns and valid values.
At this stage, “Product A” is linked to its proper product category (e.g., Home Cleaning) and store region (e.g., Gauteng North), giving analysts a uniform view across all sales channels.
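A minimal pandas sketch of the Silver-layer steps just described—deduplication plus enrichment against reference tables. The column names and sample values are assumptions made for the example, not a real schema.

```python
import pandas as pd

# Silver-layer sketch (assumed schema): deduplicate raw sales and enrich
# them with product-hierarchy and store-region reference tables.
raw = pd.DataFrame({
    "invoice_no": ["INV-1", "INV-1", "INV-2"],   # note the duplicate INV-1
    "product_id": ["A", "A", "B"],
    "store_id":   ["JHB01", "JHB01", "CPT02"],
    "amount":     [120.0, 120.0, 89.5],
})
products = pd.DataFrame({"product_id": ["A", "B"],
                         "category": ["Home Cleaning", "Beverages"]})
stores = pd.DataFrame({"store_id": ["JHB01", "CPT02"],
                       "region": ["Gauteng North", "Western Cape"]})

silver = (raw.drop_duplicates(subset=["invoice_no"])        # remove duplicates
             .merge(products, on="product_id", how="left")  # product hierarchy
             .merge(stores, on="store_id", how="left"))     # region mapping
print(silver[["invoice_no", "category", "region"]])
```

After these joins, every record carries the conformed category and region columns analysts rely on.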
5. Gold Layer — Curated Business‑Ready Data
The Gold (Curated) layer aggregates the cleansed data into meaningful business facts, such as total daily sales, gross profit per product category, or average discount by region. It’s here that KPIs such as Revenue Growth, Average Basket Size, and Top‑Selling Products are computed.
This layer serves as the foundation for departmental Data Marts and executive dashboards.
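The Gold-layer aggregation can be sketched as a simple group-by over cleansed facts. Again, the schema (`date`, `category`, `revenue`, `cost`) is an assumption chosen to mirror the examples above.

```python
import pandas as pd

# Gold-layer sketch (assumed schema): roll cleansed transactions up into
# business-ready facts such as daily revenue and gross profit per category.
silver = pd.DataFrame({
    "date": ["2024-05-01", "2024-05-01", "2024-05-01", "2024-05-02"],
    "category": ["Home Cleaning", "Home Cleaning", "Beverages", "Beverages"],
    "revenue": [120.0, 80.0, 50.0, 60.0],
    "cost":    [90.0, 60.0, 35.0, 40.0],
})
gold = (silver.assign(gross_profit=silver["revenue"] - silver["cost"])
              .groupby(["date", "category"], as_index=False)
              .agg(total_revenue=("revenue", "sum"),
                   gross_profit=("gross_profit", "sum")))
print(gold)
```

Tables like `gold` are what the departmental Data Marts and dashboards read from.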
6. Enterprise Data Warehouse / Lakehouse
Once transformed, the integrated data flows into the Enterprise Data Warehouse (EDW) or Lakehouse. This is the central hub that unifies all subject areas—Sales, Inventory, Pricing, and Procurement—under consistent governance and security. It ensures that every department works from a single version of the truth.
In our sales scenario, this means the finance team, marketing analysts, and store managers all view identical sales figures, regardless of which tool or dashboard they use.
7. Data Marts — Focused Departmental Views
From the Enterprise layer, specific subsets of data are carved out into Data Marts. The Sales Mart is one such example. It contains curated sales measures—like sales quantity, revenue, profit margin, and promotional performance—optimized for quick reporting and departmental use.
For instance, the Pricing Mart focuses on elasticity analysis, while the Inventory Mart links sales velocity with stock availability. Each mart is tailored to the analytics needs of that department but draws its truth from the governed enterprise warehouse.
8. Consumption Layer — Insights and Decisions
Finally, users interact with the data through BI and AI tools such as Power BI, Tableau, Streamlit apps, or API‑driven dashboards. In the Sales Mart example, a store manager could open a dashboard showing yesterday’s top‑selling SKUs, low‑margin items, or price sensitivity curves.
This layer closes the loop by transforming data into actionable intelligence—helping the business decide which products to restock, where to adjust pricing, and which regions to target for promotions.
Summary
The schematic represents a structured journey from raw operational data to strategic insight. Each layer adds value, context, and reliability. By the time data reaches the Sales Mart, it’s accurate, relevant, and instantly usable by decision‑makers—turning data warehousing into a true sales intelligence engine.
9. AI & Advanced Analytics on Top of the Sales Mart
The diagram also supports an AI layer. Even if live data isn’t connected yet, the architecture makes it straightforward to extract ABC/XYZ features, elasticity-ready datasets, and statistical signals from the Silver and Gold layers into the Sales Mart for modeling.
9.1 Feature Extraction (ABC / XYZ)
ABC classification (value contribution): rank SKUs by revenue or margin over a horizon (e.g., last 90 or 365 days) and segment using Pareto cutoffs (A ≈ top 70–80% of value, B ≈ next 15–25%, C ≈ remainder). This uses Gold layer facts such as sales_amount × margin% per SKU.
XYZ classification (demand variability): compute the coefficient of variation (CV = σ/μ) of weekly or daily demand. X = stable (low CV), Y = medium variability, Z = erratic (high CV). Inputs come from Silver cleansed sales quantities aggregated to a consistent cadence.
Combined ABC/XYZ (e.g., AX, BY, CZ) guides stocking, safety stock, and service levels. The features are materialized as columns in the Sales Mart for easy filtering and policy rules.
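The ABC and XYZ rules above can be sketched directly in pandas. The cutoffs here (A ≤ 80% of cumulative value, B ≤ 95%, C remainder; X for CV < 0.5, Y for CV < 1.0, Z otherwise) are one common choice within the ranges quoted above, not the only valid bands.

```python
import numpy as np
import pandas as pd

# ABC: Pareto cut on cumulative revenue share (assumed cutoffs 80% / 95%).
def abc_class(revenue: pd.Series) -> pd.Series:
    share = revenue.sort_values(ascending=False).cumsum() / revenue.sum()
    cls = pd.cut(share, bins=[0, 0.80, 0.95, 1.0],
                 labels=["A", "B", "C"], include_lowest=True)
    return cls.reindex(revenue.index)

# XYZ: coefficient of variation of per-period demand (assumed CV bands).
def xyz_class(weekly_qty: pd.DataFrame) -> pd.Series:
    cv = weekly_qty.std(axis=1) / weekly_qty.mean(axis=1)
    return pd.cut(cv, bins=[-np.inf, 0.5, 1.0, np.inf], labels=["X", "Y", "Z"])

revenue = pd.Series({"sku1": 800.0, "sku2": 150.0, "sku3": 50.0})
weekly = pd.DataFrame([[10, 10, 10], [2, 18, 10], [0, 20, 1]],
                      index=["sku1", "sku2", "sku3"])
print(abc_class(revenue).tolist(), xyz_class(weekly).tolist())
```

Concatenating the two labels per SKU yields the combined classes (AX, BY, CZ, …) that are materialized in the Sales Mart.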
9.2 Price Elasticity of Demand
From the Sales Mart, construct panels of price, quantity, promo flags, competitor price, seasonality, and store/region identifiers. Typical modeling options:
- Log-log regression: ln(Q) = β₀ + β₁ ln(P) + controls + ε → elasticity ≈ β₁.
- Hierarchical/Bayesian models: partial pooling across stores/regions/SKUs to stabilize estimates for low-volume items.
- Discrete choice (mixed logit): model substitution and cross-price effects within a category.
- Causal uplift / A/B or diff-in-diff: estimate promotional lift versus counterfactual baselines.
Outputs written back to the Sales or Pricing Mart include: own-price elasticity, cross-price elasticities, promo uplift, and elasticity confidence intervals, enabling pricing simulators and what-if analyses.
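The log-log option is the simplest to demonstrate. The sketch below simulates a SKU with a known elasticity of −1.5 and recovers β₁ by ordinary least squares; real panels would add the controls listed above (promo flags, seasonality, store effects).

```python
import numpy as np

# Log-log elasticity sketch: simulate demand with true elasticity -1.5,
# then recover beta_1 from ln(Q) = beta_0 + beta_1 * ln(P) + eps.
rng = np.random.default_rng(42)
price = rng.uniform(80, 140, size=200)                       # observed prices
log_q = 8.0 - 1.5 * np.log(price) + rng.normal(0, 0.05, 200) # ln(quantity)

X = np.column_stack([np.ones_like(price), np.log(price)])    # [1, ln(P)]
beta, *_ = np.linalg.lstsq(X, log_q, rcond=None)
elasticity = beta[1]  # a 1% price increase implies ~elasticity% quantity change
print(round(elasticity, 2))
```

With clean data the estimate lands close to the true −1.5; with low-volume SKUs the hierarchical options above become preferable precisely because single-SKU fits like this get noisy.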
9.3 Forecasting & Inventory Intelligence
- Time-series forecasting: per SKU×Store using ARIMA/ETS/Prophet or ML regressors with holiday/seasonality features from the Silver layer.
- Decomposition: trend/seasonality/residuals to detect structural breaks after price changes or promotions.
- Safety stock / reorder points: service-level based calculations using lead-time demand and XYZ class variability.
- Anomaly detection: flag outliers (e.g., phantom sales, duplicate loads, scanner errors) via robust Z-scores or isolation forests.
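The safety-stock bullet can be made concrete with the standard normal-demand formulas. This is a sketch under textbook assumptions (independent daily demand, fixed lead time); the z value per XYZ class is a policy choice, not something the architecture prescribes.

```python
import math

# Safety-stock sketch assuming normally distributed daily demand:
#   SS  = z * sigma_d * sqrt(L)         (safety stock)
#   ROP = mu_d * L + SS                 (reorder point)
def reorder_point(mean_daily: float, std_daily: float,
                  lead_time_days: float, z: float = 1.65) -> tuple[float, float]:
    """z ~= 1.65 targets ~95% service; erratic Z-class SKUs warrant higher z."""
    safety_stock = z * std_daily * math.sqrt(lead_time_days)
    rop = mean_daily * lead_time_days + safety_stock
    return safety_stock, rop

ss, rop = reorder_point(mean_daily=40, std_daily=12, lead_time_days=4)
print(round(ss, 1), round(rop, 1))  # units of stock
```

Here the XYZ class feeds in through `std_daily` (and the chosen `z`), which is why the mart materializes variability features alongside the sales facts.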
9.4 Customer & Basket Analytics
- RFM segmentation: Recency–Frequency–Monetary scores from Gold sales facts for targeted campaigns.
- Market-basket analysis: association rules (support, confidence, lift) to design cross-sell bundles.
- CLV (Customer Lifetime Value): probabilistic models combining repeat purchase rates and contribution margins.
- NLP on feedback: topic extraction and sentiment from CRM tickets and reviews (Silver text features).
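RFM scoring is straightforward to sketch from Gold sales facts. The schema and the 1–3 quantile scoring below are illustrative assumptions; many teams use 1–5 scores instead.

```python
import pandas as pd

# RFM sketch (assumed schema): score each customer on Recency, Frequency,
# and Monetary value using quantile bins over rank order.
tx = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c3", "c3"],
    "date": pd.to_datetime(["2024-05-01", "2024-05-20", "2024-03-15",
                            "2024-05-25", "2024-05-26", "2024-05-27"]),
    "amount": [100, 50, 20, 200, 150, 300],
})
asof = pd.Timestamp("2024-06-01")
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (asof - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)
# Rank first so small samples with ties still bin cleanly; lower recency
# (more recent) earns the higher score.
rfm["r"] = pd.qcut(rfm["recency"].rank(method="first"), 3, labels=[3, 2, 1])
rfm["f"] = pd.qcut(rfm["frequency"].rank(method="first"), 3, labels=[1, 2, 3])
rfm["m"] = pd.qcut(rfm["monetary"].rank(method="first"), 3, labels=[1, 2, 3])
print(rfm)
```

A customer scoring 3-3-3 is a recent, frequent, high-value buyer—exactly the segment a targeted campaign would prioritize.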
9.5 Statistical Signals the Mart Can Expose
- Descriptives: mean, median, percentiles, variance, standard deviation, coefficient of variation.
- Reliability & uncertainty: confidence intervals, standard errors, bootstrapped intervals for KPIs.
- Time-series metrics: ACF/PACF, seasonal indices, MAPE/WAPE/MAE/RMSE for forecast accuracy.
- Model diagnostics: R²/adjusted R², p-values, multicollinearity (VIF), residual tests (normality, heteroskedasticity).
- Causal impact: diff-in-diff estimates, uplift scores, synthetic control summaries for major promos or price changes.
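As one example of the uncertainty signals above, a bootstrapped percentile interval for a KPI needs no distributional assumptions. The data here is simulated for illustration.

```python
import numpy as np

# Bootstrap sketch: percentile 95% confidence interval for a KPI
# (mean basket value), using resampling instead of a normality assumption.
rng = np.random.default_rng(0)
baskets = rng.gamma(shape=2.0, scale=60.0, size=500)  # simulated basket values

boot_means = np.array([
    rng.choice(baskets, size=baskets.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean ~ {baskets.mean():.1f}, 95% CI ~ ({lo:.1f}, {hi:.1f})")
```

Publishing `lo`/`hi` alongside each KPI in the mart lets dashboards show how much trust a daily figure deserves.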
9.6 Where the Data Comes From (Given No Live Feed Yet)
Even without current data, the extraction plan aligns to the schematic:
- Silver layer → cleansed sales transactions, product/store calendars, competitor price snapshots, promo flags.
- Gold layer → aggregated facts (daily/weekly sales, margin, inventory positions), KPI tables, calendar/holiday dimensions.
- Sales & Pricing Marts → materialize ABC/XYZ, elasticity-ready panels, forecast features, and model outputs as shareable views.
9.7 Example Columns to Materialize
- Panel keys and inputs: sku_id, store_id, week, price, quantity, promo_flag, competitor_price, category, region
- Classification: abc_class (A/B/C), xyz_class (X/Y/Z via CV bands)
- Elasticity: elasticity, elasticity_lo, elasticity_hi
- Forecasting: forecast, forecast_lo, forecast_hi, mape, rmse
- Promotion and cross-effects: uplift_pct, cross_price_elasticity, basket_lift
9.8 Governance & Serving
Models read from the marts and write results back as new tables/views with row‑level security for teams (e.g., Stores only see their SKUs). BI tools and APIs consume these outputs to support pricing playbooks, replenishment, and promotion design.