Project/Business Context

We run an e-commerce company. We want to build an analytics dashboard to track orders, returns, shipping status, and revenue trends.
Our BI and analytics teams will query this data to get daily metrics like total orders, total revenue, average order value, and daily active customers.
Data Usage

We need daily (possibly hourly) refreshes in our data warehouse.
Primary consumption is through a BI dashboard (e.g., Looker, Power BI) for both operational and executive summaries.
High-Level Requirements

Track order transactions (order date, shipping date, status, total amount).
Track customer details (new vs. returning, region, sign-up date).
Need to analyze product-level sales (e.g., top-selling SKUs, categories).
Source Data Details

Data stored initially in MySQL.
Tables:
orders_raw: includes order_id, customer_id, order_date, status, total_amount, shipping_date, coupon_code.
customers_raw: includes customer_id, customer_name, email, signup_date, region.
products_raw: includes product_id, product_name, category, list_price, active_flag.
order_items_raw: includes order_id, product_id, quantity, unit_price.
Average size:
orders_raw ~ 2 million rows per year.
order_items_raw ~ 5 million rows per year (because each order has multiple items).
Updates happen daily with a batch load from the transactional DB.
Desired Data Model Approach

Star schema is preferred for simpler analytics.
A fact table for orders or order line items.
Dimension tables for customers, products, date, etc.
SCD Type 2 for products dimension: product info can change (price, category, etc.), and we want history.
Constraints & Performance Needs

Concurrency is moderate (a dozen BI users).
We must mask or encrypt email addresses and personal data for compliance.
We plan to load all data into a Snowflake data warehouse.
We prefer snake_case naming conventions for all table/column names (e.g., order_fact, customer_dim).
ERD Requirements

Show how the dimensions (customers, products, date) connect to the facts (orders, order_items).
Include relationships (PK/FK).
Ensure we have a date dimension for analytics by day, week, month, etc.
Data Dictionary & Metadata

Provide brief column descriptions for new or changed columns.
Mark which columns are primary keys (e.g., customer_id in customer_dim).
Best Practices & Optimization

We need to confirm indexing strategies in Snowflake (or best practice clustering).
Possibly partition large fact tables by date (if it makes sense in Snowflake).
Consider whether we should denormalize or keep a separate returns_fact for clarity.
Additional Data Governance

We want basic dbt tests for data quality (unique keys, not null, etc.).
We must address PII in customers (e.g., hashing or partial masking of email).
Final Deliverables

A proposed ERD (text-based if needed) showing facts and dimensions.
A data dictionary describing each table/column.
Recommendations on indexing, partitioning/clustering in Snowflake.
Guidance on handling SCD Type 2 for the product dimension.
A list of open questions or assumptions if anything is unclear.