Interactive Exercise: dbt Incremental Materialization
⏰Time Machine Factory
Master dbt incremental materialization by processing data through time!
🏭Your Mission: Build an Efficient Data Pipeline
You’re the lead data engineer at Time Machine Factory, processing millions of product events daily.
Full table refreshes take 6 hours! Your mission: implement incremental materialization to process only new and changed records.
Learn: Incremental strategies, unique keys, surrogate keys, Jinja macros, and optimization techniques.
Learn: Incremental strategies, unique keys, surrogate keys, Jinja macros, and optimization techniques.
How to use this exercise
- Select a day on the timeline (use Jan 4 or Jan 5 for data).
- Choose a strategy (Append / Merge / Delete+Insert / Insert Overwrite).
- (Optional) Switch the tabs (Model / Config / Compiled) to view the code.
- Click Run Incremental.
- (Optional) Click Compare Full Refresh or Best Practices.
Already ran all days? Use Reset Exercise to start over.
📊Data Stream Timeline
Jan 1
10K
Initial Load
Jan 2
2.5K
Processed
Jan 3
3.1K
Processed
Jan 4
2.8K
Ready
Jan 5
4.2K
New Data!
Append
Add new records only
Merge
Upsert with unique key
Delete+Insert
Replace date partitions
Insert Overwrite
Partition replacement
📝dbt Model Code
— models/marts/product_events.sql
{{
config(
materialized = ‘incremental’,
incremental_strategy = ‘append’,
on_schema_change = ‘fail’
)
}}
WITH source_data AS (
SELECT
product_id,
event_type,
quantity,
price,
quantity * price AS total_value,
created_at,
CURRENT_TIMESTAMP() AS updated_at,
— Generate surrogate key using dbt_utils
{{ dbt_utils.generate_surrogate_key([
‘product_id’,
‘created_at’
]) }} AS surrogate_key
FROM {{ ref(‘staging_product_events’) }}
{% if is_incremental() %}
— Only process new records since last run
WHERE created_at > (
SELECT MAX(created_at)
FROM {{ this }}
)
{% endif %}
)
SELECT * FROM source_data
📈 Processing Results
15,600
Total Rows
0
New Rows
0
Updated
0s
Process Time
💡 Append Strategy: Only adds new records based on a timestamp.
Perfect for immutable event data. No unique_key needed, fastest performance.
product_id | event_type | quantity | price | created_at | updated_at | surrogate_key |
---|---|---|---|---|---|---|
Click “Run Incremental” to process data |