Step 1: Business Context
You work for a retail company that wants to analyze sales data for multiple stores across different regions. The system tracks:
- Customers with basic info.
- Products organized by category.
- Promotions that can apply to multiple orders.
- Locations (Region → City → Store) for all inventory and sales.
- Sales Orders (fact table) with details about each transaction.
- Inventory levels in each store (a second fact table).
We’ll walk through multiple steps to create a robust dimensional model:
- Create dimension tables with surrogate keys.
- Add a Slowly Changing Dimension (SCD) for Customers.
- Create a bridge table for multi-promotion relationships.
- Handle a hierarchical dimension for Locations.
- Add a second fact table (Inventory).
- Assign data types and constraints to each column.
Step 2: Create Dimensions with Surrogate Keys
We have two main dimensions to start: dim_customers and dim_products.
Instead of using natural IDs (customer_id
, product_id
) as primary keys, we
want to introduce surrogate keys (like customer_key
and product_key
)
to ensure stability. The natural IDs remain as attributes.
Column Pool
dim_customers
dim_products
Step 3: Slowly Changing Dimension (SCD)
Our Customer dimension must track historical changes (e.g. address changes). We’ll use a
Type 2 approach, adding columns like effective_start_date
, effective_end_date
,
and a current flag.
Add these columns to dim_customers
, marking them as regular attributes (not PK).
The customer_key
remains the surrogate PK.
Column Pool
dim_customers (SCD)
Step 4: Bridge Table for Multi-Promotion Orders
A single Order can have multiple promotions, and a single Promotion can apply to multiple orders.
We’ll introduce dim_promotions and a bridge table
bridge_order_promotions
referencing fact_orders
and dim_promotions
.
Column Pool
dim_promotions
bridge_order_promotions
Step 5: Hierarchical Dimension & Additional Fact Table
We have dim_locations with a hierarchy (region → city → store). Also, we now build:
fact_orders
referencingdim_customers
,dim_products
, anddim_locations
.fact_inventory
to track store-level stock counts, referencingdim_products
anddim_locations
.
Column Pool
dim_locations
fact_orders
fact_inventory
Step 6: Data Types & Final Check
Finally, assign each column a suitable data type (e.g., INT, VARCHAR, DATE, DECIMAL). Also note any constraints if needed (e.g., NOT NULL).
Example: customer_key
→ INT, customer_id
→ VARCHAR, effective_start_date
→ DATE, quantity
→ INT, etc.