Data Modeling: Sales Data

Complex Multi-Step Data Modeling

Step 1: Business Context

You work for a retail company that wants to analyze sales data for multiple stores across different regions. The system tracks:

  • Customers with basic info.
  • Products organized by category.
  • Promotions that can apply to multiple orders.
  • Locations (Region → City → Store) for all inventory and sales.
  • Sales Orders (fact table) with details about each transaction.
  • Inventory levels in each store (a second fact table).

We’ll walk through multiple steps to create a robust dimensional model:

  1. Create dimension tables with surrogate keys.
  2. Add a Slowly Changing Dimension (SCD) for Customers.
  3. Create a bridge table for multi-promotion relationships.
  4. Handle a hierarchical dimension for Locations.
  5. Add a second fact table (Inventory).
  6. Assign data types and constraints to each column.

Step 2: Create Dimensions with Surrogate Keys

We have two main dimensions to start: dim_customers and dim_products. Instead of using natural IDs (customer_id, product_id) as primary keys, we want to introduce surrogate keys (like customer_key and product_key) to ensure stability. The natural IDs remain as attributes.

Column Pool

dim_customers

    dim_products

      Step 3: Slowly Changing Dimension (SCD)

      Our Customer dimension must track historical changes (e.g. address changes). We’ll use a Type 2 approach, adding columns like effective_start_date, effective_end_date, and a current flag.

      Add these columns to dim_customers, marking them as regular attributes (not PK). The customer_key remains the surrogate PK.

      Column Pool

      dim_customers (SCD)

        Step 4: Bridge Table for Multi-Promotion Orders

        A single Order can have multiple promotions, and a single Promotion can apply to multiple orders. We’ll introduce dim_promotions and a bridge table bridge_order_promotions referencing fact_orders and dim_promotions.

        Column Pool

        dim_promotions

          bridge_order_promotions

            Step 5: Hierarchical Dimension & Additional Fact Table

            We have dim_locations with a hierarchy (region → city → store). Also, we now build:

            • fact_orders referencing dim_customers, dim_products, and dim_locations.
            • fact_inventory to track store-level stock counts, referencing dim_products and dim_locations.

            Column Pool

            dim_locations

              fact_orders

                fact_inventory

                  Step 6: Data Types & Final Check

                  Finally, assign each column a suitable data type (e.g., INT, VARCHAR, DATE, DECIMAL). Also note any constraints if needed (e.g., NOT NULL).

                  Example: customer_key → INT, customer_id → VARCHAR, effective_start_date → DATE, quantity → INT, etc.

                  Column Pool

                  Data Types