Step 1: Business Context
You work for a retail company that wants to analyze sales data for multiple stores across different regions. The system tracks:
- Customers with basic info.
- Products organized by category.
- Promotions that can apply to multiple orders.
- Locations (Region → City → Store) for all inventory and sales.
- Sales Orders (fact table) with details about each transaction.
- Inventory levels in each store (a second fact table).
We’ll walk through multiple steps to create a robust dimensional model:
- Create dimension tables with surrogate keys.
- Add a Slowly Changing Dimension (SCD) for Customers.
- Create a bridge table for multi-promotion relationships.
- Handle a hierarchical dimension for Locations.
- Add a second fact table (Inventory).
- Assign data types and constraints to each column.
Step 2: Create Dimensions with Surrogate Keys
We have two main dimensions to start: dim_customers and dim_products.
Instead of using natural IDs (customer_id, product_id) as primary keys, we
want to introduce surrogate keys (like customer_key and product_key)
to ensure stability. The natural IDs remain as attributes.
Column Pool
dim_customers
dim_products
Step 3: Slowly Changing Dimension (SCD)
Our Customer dimension must track historical changes (e.g. address changes). We’ll use a
Type 2 approach, adding columns like effective_start_date, effective_end_date,
and a current flag.
Add these columns to dim_customers, marking them as regular attributes (not PK).
The customer_key remains the surrogate PK.
Column Pool
dim_customers (SCD)
Step 4: Bridge Table for Multi-Promotion Orders
A single Order can have multiple promotions, and a single Promotion can apply to multiple orders.
We’ll introduce dim_promotions and a bridge table
bridge_order_promotions referencing fact_orders and dim_promotions.
Column Pool
dim_promotions
bridge_order_promotions
Step 5: Hierarchical Dimension & Additional Fact Table
We have dim_locations with a hierarchy (region → city → store). Also, we now build:
fact_ordersreferencingdim_customers,dim_products, anddim_locations.fact_inventoryto track store-level stock counts, referencingdim_productsanddim_locations.
Column Pool
dim_locations
fact_orders
fact_inventory
Step 6: Data Types & Final Check
Finally, assign each column a suitable data type (e.g., INT, VARCHAR, DATE, DECIMAL). Also note any constraints if needed (e.g., NOT NULL).
Example: customer_key → INT, customer_id → VARCHAR, effective_start_date → DATE, quantity → INT, etc.
