Data Modeling
Star and snowflake schemas, slowly-changing dimensions, fact and dimension tables, and the modeling patterns analytics engineers ship in dbt.
Data modeling is the part of analytics engineering that distinguishes a senior engineer from a junior one. Anyone can write a query. Designing the layered set of tables a query *should* run against — staging, intermediate, marts, with the right grain and the right keys — is the actual craft.
This hub covers the modeling patterns analytics engineers use in dbt every day: star schemas, slowly-changing dimensions, fact and dimension tables, normalization vs denormalization, and the trade-offs that come up in real warehouse work. The articles are written for people who write SQL and want to write better SQL — not academic dimensional modeling theory.
Pair this hub with the dbt and SQL hubs, then build the capstone, where you'll design a star-schema mart layer for a marketing-attribution dataset in BigQuery.
By the end of this path you can…
- Design star schemas and know when to denormalize
- Implement Type 2 slowly-changing dimensions in dbt with snapshots
- Pick the right grain for a fact table
- Choose between surrogate and natural keys with eyes open
- Model a marketing-attribution or e-commerce dataset end-to-end
- Answer modeling questions in technical screens
From beginner to job-ready.
- 01 · Fundamentals
Tables, rows, keys, relationships — the vocabulary the rest of the curriculum is built on.
- 02 · Normalization
1NF, 2NF, 3NF in practice — and why analytics teams denormalize on purpose.
- 03 · Star + snowflake schemas
Fact and dimension tables, conformed dimensions, the grain conversation.
- 04 · Slowly-changing dimensions
Type 1, 2, 3 — and SCD2 with dbt snapshots in production.
- 05 · Keys
Surrogate vs natural; composite vs single-column; warehouse-specific patterns.
- 06 · Modeling in dbt
How the staging / intermediate / mart pattern maps to the modeling concepts above.
Read the playbook.
- Data Modeling
Data Modeling Basics: Star Schema vs. Snowflake Schema Explained
Learn the differences between star and snowflake schemas in data modeling. Discover how each affects query performance, storage, and maintenance.
- Data Modeling
Explaining Fact and Dimension Tables for Beginners: Essential Concepts in Data Warehousing
Explore the roles of fact and dimension tables in data warehousing. Learn how they work together to enhance data analysis and business insights.
- Data Modeling
Slowly Changing Dimensions Type 2 Explained: Complete Guide
Explore Type 2 Slowly Changing Dimensions to maintain historical data records. Learn how to structure tables and implement tracking methods effectively.
- Data Modeling
Top Data Modeling Best Practices for Efficient Analytics Engineering
Explore essential data modeling best practices to build scalable and efficient models, ensuring data accuracy and performance in analytics engineering.
- Data Modeling
Understanding Database Normalization and Denormalization: Concepts, Forms, and Applications
Discover how database normalization and denormalization impact data storage and retrieval. Learn when to use each approach for optimal system performance.
- Fundamentals
Surrogate vs Natural Keys: Choosing the Right Primary Key for Databases
Learn the differences between surrogate and natural keys in databases. This guide covers their benefits, drawbacks, and when to use each type effectively.
Show, don't just claim.
- intermediate · open →
Sports Equipment Pro Shop
E-commerce orders, inventory, and revenue modeling project
- intermediate · open →
Data Forge: The Lost Metrics
Metric-layer recovery and analytics debugging project (dbt + BigQuery)
- intermediate · open →
Champion Fantasy League
Event and performance analytics modeling project
Data Modeling and Architecture
12 lessons in this module
Common questions about this topic.
Star schema or snowflake schema?
Star, almost always, for analytics workloads. Snowflaking adds joins; modern columnar warehouses (BigQuery, Snowflake, Databricks) handle wide denormalized tables fine. The exception is huge dimension tables where storage matters more than query speed.
When do I need SCD2?
Whenever a dimension's history affects analysis — customer subscription tier, employee department, product price tiers. If reporting only ever cares about 'as of now', SCD1 (overwrite) is fine. dbt's snapshot block makes SCD2 the easiest pattern to implement.
What's the right grain for a fact table?
The most atomic event the business actually wants to analyze. Orders → one row per order_item, not one row per order, because line-item analysis comes up. Web events → one row per event. Aggregate later; never lose grain.
Surrogate or natural keys?
Surrogate keys for warehouse-side joins (cheap to generate with dbt_utils.generate_surrogate_key). Natural keys for source-system idempotency. The article on the topic walks through both with examples.
Start practicing this topic.
Graded exercises with hints, worked solutions, and a GPT tutor. Free to start, no credit card.
