Data Marts in the Modern Warehouse / Lakehouse
Subject‑oriented slices (Sales, Pricing, Inventory, Procurement) curated from governed enterprise data.
How the Data Mart Architecture Works — Sales Example
The schematic illustrates how data flows through the layers of a modern Data Warehouse and Lakehouse ecosystem. To make it relatable, let’s walk through the journey of sales data from source to insight.
1. Source Systems — Capturing Sales Transactions
Everything begins at the source. For a retail or FMCG company, sales transactions are generated daily from multiple channels: Point‑of‑Sale (POS) systems in stores, e‑commerce platforms, and customer relationship management (CRM) systems tracking orders and feedback. Each source produces structured or semi‑structured data—invoice numbers, product IDs, timestamps, store locations, and customer details.
These diverse data points are extracted and prepared for ingestion into the data pipeline.
2. Ingest & Orchestration — Moving the Data
In this layer, automated workflows handle the transfer of data from the sources into the data warehouse environment. Tools such as Airflow and Fivetran schedule and monitor extraction and loading, while dbt manages in‑warehouse transformations, ensuring daily sales transactions arrive reliably.
The orchestration process also applies basic validation (for example, checking that all store branches submitted data) before the raw data is placed into the first storage layer.
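The completeness check described above can be sketched in a few lines. This is a minimal illustration, not a production orchestrator: the branch codes and record fields (`EXPECTED_BRANCHES`, `store_id`) are hypothetical names chosen for the example.

```python
# Minimal pre-landing validation sketch (hypothetical names): verify that
# every expected store branch submitted at least one record in today's batch.

EXPECTED_BRANCHES = {"JHB01", "CPT02", "DBN03"}  # assumed branch codes

def validate_batch(records: list[dict]) -> list[str]:
    """Return the branches missing from the batch (empty list = pass)."""
    seen = {r["store_id"] for r in records}
    return sorted(EXPECTED_BRANCHES - seen)

batch = [
    {"store_id": "JHB01", "invoice_no": "INV-1001", "amount": 120.0},
    {"store_id": "CPT02", "invoice_no": "INV-1002", "amount": 89.5},
]
missing = validate_batch(batch)
if missing:
    print(f"Hold batch: missing submissions from {missing}")
```

In a real pipeline this check would run as a task in the orchestrator, failing or alerting before the batch lands in the Bronze layer.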
3. Bronze Layer — Raw Sales Data
The Bronze (Raw) layer is a landing zone where data is stored as‑is. Nothing is altered or cleaned yet. It may include duplicate sales entries, incomplete customer fields, or missing stock codes. The purpose of this stage is traceability—you can always return to the original data to verify accuracy.
For example, a raw sales record might show “Product A” sold for R120 but missing the store location or salesperson ID. That’s fine; it’s captured for later enrichment.
4. Silver Layer — Cleansed and Conformed
In the Silver (Cleansed) layer, data engineers standardize formats, remove duplicates, and enrich the dataset with reference tables such as store codes, product hierarchies, and region mappings. All records now have consistent columns and valid values.
At this stage, “Product A” is linked to its proper product category (e.g., Home Cleaning) and store region (e.g., Gauteng North), giving analysts a uniform view across all sales channels.
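A minimal pandas sketch of the Silver-layer steps just described—deduplication plus enrichment against reference tables. The column names and sample values are assumptions made for the example, not a real schema.

```python
import pandas as pd

# Silver-layer sketch (assumed schema): deduplicate raw sales and enrich
# them with product-hierarchy and store-region reference tables.
raw = pd.DataFrame({
    "invoice_no": ["INV-1", "INV-1", "INV-2"],   # note the duplicate INV-1
    "product_id": ["A", "A", "B"],
    "store_id":   ["JHB01", "JHB01", "CPT02"],
    "amount":     [120.0, 120.0, 89.5],
})
products = pd.DataFrame({"product_id": ["A", "B"],
                         "category": ["Home Cleaning", "Beverages"]})
stores = pd.DataFrame({"store_id": ["JHB01", "CPT02"],
                       "region": ["Gauteng North", "Western Cape"]})

silver = (raw.drop_duplicates(subset=["invoice_no"])        # remove duplicates
             .merge(products, on="product_id", how="left")  # product hierarchy
             .merge(stores, on="store_id", how="left"))     # region mapping
print(silver[["invoice_no", "category", "region"]])
```

After these joins, every record carries the conformed category and region columns analysts rely on.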
5. Gold Layer — Curated Business‑Ready Data
The Gold (Curated) layer aggregates the cleansed data into meaningful business facts, such as total daily sales, gross profit per product category, or average discount by region. It’s here that KPIs such as Revenue Growth, Average Basket Size, and Top‑Selling Products are computed.
This layer serves as the foundation for departmental Data Marts and executive dashboards.
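The Gold-layer aggregation can be sketched as a simple group-by over cleansed facts. Again, the schema (`date`, `category`, `revenue`, `cost`) is an assumption chosen to mirror the examples above.

```python
import pandas as pd

# Gold-layer sketch (assumed schema): roll cleansed transactions up into
# business-ready facts such as daily revenue and gross profit per category.
silver = pd.DataFrame({
    "date": ["2024-05-01", "2024-05-01", "2024-05-01", "2024-05-02"],
    "category": ["Home Cleaning", "Home Cleaning", "Beverages", "Beverages"],
    "revenue": [120.0, 80.0, 50.0, 60.0],
    "cost":    [90.0, 60.0, 35.0, 40.0],
})
gold = (silver.assign(gross_profit=silver["revenue"] - silver["cost"])
              .groupby(["date", "category"], as_index=False)
              .agg(total_revenue=("revenue", "sum"),
                   gross_profit=("gross_profit", "sum")))
print(gold)
```

Tables like `gold` are what the departmental Data Marts and dashboards read from.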
6. Enterprise Data Warehouse / Lakehouse
Once transformed, the integrated data flows into the Enterprise Data Warehouse (EDW) or Lakehouse. This is the central hub that unifies all subject areas—Sales, Inventory, Pricing, and Procurement—under consistent governance and security. It ensures that every department works from a single version of the truth.
In our sales scenario, this means the finance team, marketing analysts, and store managers all view identical sales figures, regardless of which tool or dashboard they use.
7. Data Marts — Focused Departmental Views
From the Enterprise layer, specific subsets of data are carved out into Data Marts. The Sales Mart is one such example. It contains curated sales measures—like sales quantity, revenue, profit margin, and promotional performance—optimized for quick reporting and departmental use.
For instance, the Pricing Mart focuses on elasticity analysis, while the Inventory Mart links sales velocity with stock availability. Each mart is tailored to the analytics needs of that department but draws its truth from the governed enterprise warehouse.
8. Consumption Layer — Insights and Decisions
Finally, users interact with the data through BI and AI tools such as Power BI, Tableau, Streamlit apps, or API‑driven dashboards. In the Sales Mart example, a store manager could open a dashboard showing yesterday’s top‑selling SKUs, low‑margin items, or price sensitivity curves.
This layer closes the loop by transforming data into actionable intelligence—helping the business decide which products to restock, where to adjust pricing, and which regions to target for promotions.
Summary
The schematic represents a structured journey from raw operational data to strategic insight. Each layer adds value, context, and reliability. By the time data reaches the Sales Mart, it’s accurate, relevant, and instantly usable by decision‑makers—turning data warehousing into a true sales intelligence engine.
9. AI & Advanced Analytics on Top of the Sales Mart
The diagram also supports an AI layer. Even if live data isn’t connected yet, the architecture makes it straightforward to extract ABC/XYZ features, elasticity-ready datasets, and statistical signals from the Silver and Gold layers into the Sales Mart for modeling.
9.1 Feature Extraction (ABC / XYZ)
ABC classification (value contribution): rank SKUs by revenue or margin over a horizon (e.g., last 90 or 365 days) and segment using Pareto cutoffs (A ≈ top 70–80% of value, B ≈ next 15–25%, C ≈ remainder). This uses Gold layer facts such as sales_amount × margin% per SKU.
XYZ classification (demand variability): compute the coefficient of variation (CV = σ/μ) of weekly or daily demand. X = stable (low CV), Y = medium variability, Z = erratic (high CV). Inputs come from Silver cleansed sales quantities aggregated to a consistent cadence.
Combined ABC/XYZ (e.g., AX, BY, CZ) guides stocking, safety stock, and service levels. The features are materialized as columns in the Sales Mart for easy filtering and policy rules.
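The ABC and XYZ rules above can be sketched directly in pandas. The cutoffs here (A ≤ 80% of cumulative value, B ≤ 95%, C remainder; X for CV < 0.5, Y for CV < 1.0, Z otherwise) are one common choice within the ranges quoted above, not the only valid bands.

```python
import numpy as np
import pandas as pd

# ABC: Pareto cut on cumulative revenue share (assumed cutoffs 80% / 95%).
def abc_class(revenue: pd.Series) -> pd.Series:
    share = revenue.sort_values(ascending=False).cumsum() / revenue.sum()
    cls = pd.cut(share, bins=[0, 0.80, 0.95, 1.0],
                 labels=["A", "B", "C"], include_lowest=True)
    return cls.reindex(revenue.index)

# XYZ: coefficient of variation of per-period demand (assumed CV bands).
def xyz_class(weekly_qty: pd.DataFrame) -> pd.Series:
    cv = weekly_qty.std(axis=1) / weekly_qty.mean(axis=1)
    return pd.cut(cv, bins=[-np.inf, 0.5, 1.0, np.inf], labels=["X", "Y", "Z"])

revenue = pd.Series({"sku1": 800.0, "sku2": 150.0, "sku3": 50.0})
weekly = pd.DataFrame([[10, 10, 10], [2, 18, 10], [0, 20, 1]],
                      index=["sku1", "sku2", "sku3"])
print(abc_class(revenue).tolist(), xyz_class(weekly).tolist())
```

Concatenating the two labels per SKU yields the combined classes (AX, BY, CZ, …) that are materialized in the Sales Mart.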
9.2 Price Elasticity of Demand
From the Sales Mart, construct panels of price, quantity, promo flags, competitor price, seasonality, and store/region identifiers. Typical modeling options:
- Log-log regression: ln(Q) = β₀ + β₁ ln(P) + controls + ε → elasticity ≈ β₁.
- Hierarchical/Bayesian models: partial pooling across stores/regions/SKUs to stabilize estimates for low-volume items.
- Discrete choice (mixed logit): model substitution and cross-price effects within a category.
- Causal uplift / A/B or diff-in-diff: estimate promotional lift versus counterfactual baselines.
Outputs written back to the Sales or Pricing Mart include: own-price elasticity, cross-price elasticities, promo uplift, and elasticity confidence intervals, enabling pricing simulators and what-if analyses.
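The log-log option is the simplest to demonstrate. The sketch below simulates a SKU with a known elasticity of −1.5 and recovers β₁ by ordinary least squares; real panels would add the controls listed above (promo flags, seasonality, store effects).

```python
import numpy as np

# Log-log elasticity sketch: simulate demand with true elasticity -1.5,
# then recover beta_1 from ln(Q) = beta_0 + beta_1 * ln(P) + eps.
rng = np.random.default_rng(42)
price = rng.uniform(80, 140, size=200)                       # observed prices
log_q = 8.0 - 1.5 * np.log(price) + rng.normal(0, 0.05, 200) # ln(quantity)

X = np.column_stack([np.ones_like(price), np.log(price)])    # [1, ln(P)]
beta, *_ = np.linalg.lstsq(X, log_q, rcond=None)
elasticity = beta[1]  # a 1% price increase implies ~elasticity% quantity change
print(round(elasticity, 2))
```

With clean data the estimate lands close to the true −1.5; with low-volume SKUs the hierarchical options above become preferable precisely because single-SKU fits like this get noisy.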
9.3 Forecasting & Inventory Intelligence
- Time-series forecasting: per SKU×Store using ARIMA/ETS/Prophet or ML regressors with holiday/seasonality features from the Silver layer.
- Decomposition: trend/seasonality/residuals to detect structural breaks after price changes or promotions.
- Safety stock / reorder points: service-level based calculations using lead-time demand and XYZ class variability.
- Anomaly detection: flag outliers (e.g., phantom sales, duplicate loads, scanner errors) via robust Z-scores or isolation forests.
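The safety-stock bullet can be made concrete with the standard normal-demand formulas. This is a sketch under textbook assumptions (independent daily demand, fixed lead time); the z value per XYZ class is a policy choice, not something the architecture prescribes.

```python
import math

# Safety-stock sketch assuming normally distributed daily demand:
#   SS  = z * sigma_d * sqrt(L)         (safety stock)
#   ROP = mu_d * L + SS                 (reorder point)
def reorder_point(mean_daily: float, std_daily: float,
                  lead_time_days: float, z: float = 1.65) -> tuple[float, float]:
    """z ~= 1.65 targets ~95% service; erratic Z-class SKUs warrant higher z."""
    safety_stock = z * std_daily * math.sqrt(lead_time_days)
    rop = mean_daily * lead_time_days + safety_stock
    return safety_stock, rop

ss, rop = reorder_point(mean_daily=40, std_daily=12, lead_time_days=4)
print(round(ss, 1), round(rop, 1))  # units of stock
```

Here the XYZ class feeds in through `std_daily` (and the chosen `z`), which is why the mart materializes variability features alongside the sales facts.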
9.4 Customer & Basket Analytics
- RFM segmentation: Recency–Frequency–Monetary scores from Gold sales facts for targeted campaigns.
- Market-basket analysis: association rules (support, confidence, lift) to design cross-sell bundles.
- CLV (Customer Lifetime Value): probabilistic models combining repeat purchase rates and contribution margins.
- NLP on feedback: topic extraction and sentiment from CRM tickets and reviews (Silver text features).
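RFM scoring is straightforward to sketch from Gold sales facts. The schema and the 1–3 quantile scoring below are illustrative assumptions; many teams use 1–5 scores instead.

```python
import pandas as pd

# RFM sketch (assumed schema): score each customer on Recency, Frequency,
# and Monetary value using quantile bins over rank order.
tx = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c3", "c3"],
    "date": pd.to_datetime(["2024-05-01", "2024-05-20", "2024-03-15",
                            "2024-05-25", "2024-05-26", "2024-05-27"]),
    "amount": [100, 50, 20, 200, 150, 300],
})
asof = pd.Timestamp("2024-06-01")
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (asof - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)
# Rank first so small samples with ties still bin cleanly; lower recency
# (more recent) earns the higher score.
rfm["r"] = pd.qcut(rfm["recency"].rank(method="first"), 3, labels=[3, 2, 1])
rfm["f"] = pd.qcut(rfm["frequency"].rank(method="first"), 3, labels=[1, 2, 3])
rfm["m"] = pd.qcut(rfm["monetary"].rank(method="first"), 3, labels=[1, 2, 3])
print(rfm)
```

A customer scoring 3-3-3 is a recent, frequent, high-value buyer—exactly the segment a targeted campaign would prioritize.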
9.5 Statistical Signals the Mart Can Expose
- Descriptives: mean, median, percentiles, variance, standard deviation, coefficient of variation.
- Reliability & uncertainty: confidence intervals, standard errors, bootstrapped intervals for KPIs.
- Time-series metrics: ACF/PACF, seasonal indices, MAPE/WAPE/MAE/RMSE for forecast accuracy.
- Model diagnostics: R²/adjusted R², p-values, multicollinearity (VIF), residual tests (normality, heteroskedasticity).
- Causal impact: diff-in-diff estimates, uplift scores, synthetic control summaries for major promos or price changes.
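As one example of the uncertainty signals above, a bootstrapped percentile interval for a KPI needs no distributional assumptions. The data here is simulated for illustration.

```python
import numpy as np

# Bootstrap sketch: percentile 95% confidence interval for a KPI
# (mean basket value), using resampling instead of a normality assumption.
rng = np.random.default_rng(0)
baskets = rng.gamma(shape=2.0, scale=60.0, size=500)  # simulated basket values

boot_means = np.array([
    rng.choice(baskets, size=baskets.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean ~ {baskets.mean():.1f}, 95% CI ~ ({lo:.1f}, {hi:.1f})")
```

Publishing `lo`/`hi` alongside each KPI in the mart lets dashboards show how much trust a daily figure deserves.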
9.6 Where the Data Comes From (Given No Live Feed Yet)
Even without current data, the extraction plan aligns to the schematic:
- Silver layer → cleansed sales transactions, product/store calendars, competitor price snapshots, promo flags.
- Gold layer → aggregated facts (daily/weekly sales, margin, inventory positions), KPI tables, calendar/holiday dimensions.
- Sales & Pricing Marts → materialize ABC/XYZ, elasticity-ready panels, forecast features, and model outputs as shareable views.
9.7 Example Columns to Materialize
- Panel keys and inputs: sku_id, store_id, week, price, quantity, promo_flag, competitor_price, category, region
- Classification: abc_class (A/B/C), xyz_class (X/Y/Z via CV bands)
- Elasticity: elasticity, elasticity_lo, elasticity_hi
- Forecasting: forecast, forecast_lo, forecast_hi, mape, rmse
- Promotion and cross-effects: uplift_pct, cross_price_elasticity, basket_lift
9.8 Governance & Serving
Models read from the marts and write results back as new tables/views with row‑level security for teams (e.g., Stores only see their SKUs). BI tools and APIs consume these outputs to support pricing playbooks, replenishment, and promotion design.