Topic hub

Data Modeling

Star and snowflake schemas, slowly-changing dimensions, fact and dimension tables, and the modeling patterns analytics engineers ship in dbt.

Data modeling is the part of analytics engineering that distinguishes a senior engineer from a junior one. Anyone can write a query. Designing the layered set of tables a query *should* run against — staging, intermediate, marts, with the right grain and the right keys — is the actual craft.

This hub covers the modeling patterns analytics engineers use in dbt every day: star schemas, slowly-changing dimensions, fact and dimension tables, normalization vs denormalization, and the trade-offs that come up in real warehouse work. The articles are written for people who write SQL and want to write better SQL — not academic dimensional modeling theory.

Pair this hub with the dbt and SQL hubs, then build the capstone, where you'll design a star-schema mart layer for a marketing-attribution dataset in BigQuery.

What you'll learn

By the end of this path you can…

  • Design star schemas and know when to denormalize
  • Implement Type 2 slowly-changing dimensions in dbt with snapshots
  • Pick the right grain for a fact table
  • Choose between surrogate and natural keys with eyes open
  • Model a marketing-attribution or e-commerce dataset end-to-end
  • Answer modeling questions in technical screens
The learning path

From beginner to job-ready.

  1. 01 · Fundamentals

    Tables, rows, keys, relationships — the vocabulary the rest of the curriculum is built on.

  2. 02 · Normalization

    1NF, 2NF, 3NF in practice — and why analytics teams denormalize on purpose.

  3. 03 · Star + snowflake schemas

    Fact and dimension tables, conformed dimensions, the grain conversation.

  4. 04 · Slowly-changing dimensions

    Type 1, 2, 3 — and SCD2 with dbt snapshots in production.

  5. 05 · Keys

    Surrogate vs natural; composite vs single-column; warehouse-specific patterns.

  6. 06 · Modeling in dbt

    How the staging / intermediate / mart pattern maps to the modeling concepts above.

In the course

Data Modeling and Architecture

12 lessons in this module

Common questions

Common questions about this topic.

Star schema or snowflake schema?

Star, almost always, for analytics workloads. Snowflaking adds joins; modern columnar warehouses (BigQuery, Snowflake, Databricks) handle wide denormalized tables fine. The exception is huge dimension tables where storage matters more than query speed.

When do I need SCD2?

Whenever a dimension's history affects analysis — customer subscription tier, employee department, product price tiers. If reporting only ever cares about 'as of now', SCD1 (overwrite) is fine. dbt's snapshot block makes SCD2 the easiest pattern to implement.

What's the right grain for a fact table?

The most atomic event the business actually wants to analyze. Orders → one row per order_item, not one row per order, because line-item analysis comes up. Web events → one row per event. Aggregate later; never lose grain.

Surrogate or natural keys?

Surrogate keys for warehouse-side joins (cheap to generate with dbt_utils.generate_surrogate_key). Natural keys for source-system idempotency. The article on the topic walks through both with examples.

Start practicing this topic.

Graded exercises with hints, worked solutions, and a GPT tutor. Free to start, no credit card.