RA101 — Data Mart Architecture Diagram

Data Marts in the Modern Warehouse / Lakehouse

Subject‑oriented slices (Sales, Pricing, Inventory, Procurement) curated from governed enterprise data.

Source Systems ERP • POS • CRM • E‑commerce • Clickstreams • Spreadsheets • 3rd‑party data Ingest & Orchestration Batch • Streaming • CDC • Schedules Tools: Airflow, dbt, Fivetran, Kafka Bronze (Raw) Landing zone with minimal transformation Stores unfiltered source data for auditing Silver (Cleansed) Data cleaned, conformed and validated Used for analytics and model training Gold (Curated) Business‑ready models and aggregated facts Supports dashboards and KPI reporting Enterprise Data Warehouse / Lakehouse Central repository integrating all subject areas Subject‑oriented • Integrated • Time‑variant • Non‑volatile Governance • Lineage • Security • Platforms: Snowflake • BigQuery • Redshift • Databricks • PostgreSQL Data Marts (Subject‑Oriented Slices) Provides departmental data subsets for faster decision‑making and focused analytics. Examples: Sales, Pricing, Inventory, Procurement. Sales Mart Pricing Mart Inventory Mart Procurement Mart Consumption: BI • Reports • APIs • ML/AI (Notebooks, Apps, Streamlit)
Sources / Consumption context
Pipelines & Medallion layers
Enterprise DW / Lakehouse (light blue)
Data Marts (subject areas)

Standalone HTML: paste into a page as-is. Fonts and styles are self-contained.

Explanation — Data Mart Architecture (Sales Example)

How the Data Mart Architecture Works — Sales Example

The schematic illustrates how data flows through the layers of a modern Data Warehouse and Lakehouse ecosystem. To make it relatable, let’s walk through the journey of sales data from source to insight.

1. Source Systems — Capturing Sales Transactions

Everything begins at the source. For a retail or FMCG company, sales transactions are generated daily from multiple channels: Point‑of‑Sale (POS) systems in stores, e‑commerce platforms, and customer relationship management (CRM) systems tracking orders and feedback. Each source produces structured or semi‑structured data—invoice numbers, product IDs, timestamps, store locations, and customer details.

These diverse data points are extracted and prepared for ingestion into the data pipeline.

2. Ingest & Orchestration — Moving the Data

In this layer, automated workflows handle the transfer of data from the sources into the data warehouse environment. Tools such as Airflow, dbt, or Fivetran schedule and monitor extraction, ensuring daily sales transactions arrive reliably.

The orchestration process also applies basic validation (for example, checking that all store branches submitted data) before the raw data is placed into the first storage layer.

3. Bronze Layer — Raw Sales Data

The Bronze (Raw) layer is a landing zone where data is stored as‑is. Nothing is altered or cleaned yet. It includes duplicate sales entries, incomplete customer fields, or missing stock codes. The purpose of this stage is traceability—you can always return to the original data to verify accuracy.

For example, a raw sales record might show “Product A” sold for R120 but missing the store location or salesperson ID. That’s fine; it’s captured for later enrichment.

4. Silver Layer — Cleansed and Conformed

In the Silver (Cleansed) layer, data engineers standardize formats, remove duplicates, and enrich the dataset with reference tables such as store codes, product hierarchies, and region mappings. All records now have consistent columns and valid values.

At this stage, “Product A” is linked to its proper product category (e.g., Home Cleaning) and store region (e.g., Gauteng North), giving analysts a uniform view across all sales channels.

5. Gold Layer — Curated Business‑Ready Data

The Gold (Curated) layer aggregates the cleansed data into meaningful business facts. For example, total daily sales, gross profit per product category, or average discount by region. It’s here that KPIs such as Revenue Growth, Average Basket Size, and Top‑Selling Products are computed.

This layer serves as the foundation for departmental Data Marts and executive dashboards.

6. Enterprise Data Warehouse / Lakehouse

Once transformed, the integrated data flows into the Enterprise Data Warehouse (EDW) or Lakehouse. This is the central hub that unifies all subject areas—Sales, Inventory, Pricing, and Procurement—under consistent governance and security. It ensures that every department works from a single version of the truth.

In our sales scenario, this means the finance team, marketing analysts, and store managers all view identical sales figures, regardless of which tool or dashboard they use.

7. Data Marts — Focused Departmental Views

From the Enterprise layer, specific subsets of data are carved out into Data Marts. The Sales Mart is one such example. It contains curated sales measures—like sales quantity, revenue, profit margin, and promotional performance—optimized for quick reporting and departmental use.

For instance, the Pricing Mart focuses on elasticity analysis, while the Inventory Mart links sales velocity with stock availability. Each mart is tailored to the analytics needs of that department but draws its truth from the governed enterprise warehouse.

8. Consumption Layer — Insights and Decisions

Finally, users interact with the data through BI and AI tools such as Power BI, Tableau, Streamlit apps, or API‑driven dashboards. In the Sales Mart example, a store manager could open a dashboard showing yesterday’s top‑selling SKUs, low‑margin items, or price sensitivity curves.

This layer closes the loop by transforming data into actionable intelligence—helping the business decide which products to restock, where to adjust pricing, and which regions to target for promotions.

Summary

The schematic represents a structured journey from raw operational data to strategic insight. Each layer adds value, context, and reliability. By the time data reaches the Sales Mart, it’s accurate, relevant, and instantly usable by decision‑makers—turning data warehousing into a true sales intelligence engine.

9. AI & Advanced Analytics on Top of the Sales Mart

The diagram also supports an AI layer. Even if live data isn’t connected yet, the architecture makes it straightforward to extract ABC/XYZ features, elasticity-ready datasets, and statistical signals from the Silver and Gold layers into the Sales Mart for modeling.

9.1 Feature Extraction (ABC / XYZ)

ABC classification (value contribution): rank SKUs by revenue or margin over a horizon (e.g., last 90 or 365 days) and segment using Pareto cutoffs (A ≈ top 70–80% of value, B ≈ next 15–25%, C ≈ remainder). This uses Gold layer facts such as sales_amount × margin% per SKU.

XYZ classification (demand variability): compute the coefficient of variation (CV = σ/μ) of weekly or daily demand. X = stable (low CV), Y = medium variability, Z = erratic (high CV). Inputs come from Silver cleansed sales quantities aggregated to a consistent cadence.

Combined ABC/XYZ (e.g., AX, BY, CZ) guides stocking, safety stock, and service levels. The features are materialized as columns in the Sales Mart for easy filtering and policy rules.

9.2 Price Elasticity of Demand

From the Sales Mart, construct panels of price, quantity, promo flags, competitor_price, seasonality, store/region. Typical modeling options:

Outputs written back to the Sales or Pricing Mart include: own-price elasticity, cross-price elasticities, promo uplift, and elasticity confidence intervals, enabling pricing simulators and what-if analyses.

9.3 Forecasting & Inventory Intelligence

9.4 Customer & Basket Analytics

9.5 Statistical Signals the Mart Can Expose

9.6 Where the Data Comes From (Given No Live Feed Yet)

Even without current data, the extraction plan aligns to the schematic:

  1. Silver layer → cleansed sales transactions, product/store calendars, competitor price snapshots, promo flags.
  2. Gold layer → aggregated facts (daily/weekly sales, margin, inventory positions), KPI tables, calendar/holiday dimensions.
  3. Sales & Pricing Marts → materialize ABC/XYZ, elasticity-ready panels, forecast features, and model outputs as shareable views.

9.7 Example Columns to Materialize

9.8 Governance & Serving

Models read from the marts and write results back as new tables/views with row‑level security for teams (e.g., Stores only see their SKUs). BI tools and APIs consume these outputs to support pricing playbooks, replenishment, and promotion design.