Interactive Lesson: Great Expectations
Great Expectations: Data Quality Testing
Build confidence in your data through systematic validation
📊 Sample Data: Online Marketplace
📌 Scenario: You’re the analytics engineer for an online marketplace, and your team needs to catch data quality problems before the data reaches the warehouse. The sample product dataset below contains several such problems.
product_id | name | category | price | stock_quantity | rating |
---|---|---|---|---|---|
PRD001 | Wireless Headphones | Electronics | 79.99 | 150 | 4.5 |
PRD002 | Yoga Mat | Sports | 29.99 | 200 | 4.8 |
PRD003 | NULL | Books | 15.99 | 75 | 4.2 |
PRD004 | Coffee Maker | Home & Kitchen | -49.99 | 50 | 3.9 |
PRD005 | Running Shoes | Sports | 89.99 | 0 | 4.6 |
PRD006 | Laptop Stand | Electronics | 34.99 | 120 | 6.5 |
⚠️ Data Issues Detected: Can you spot the problems? Use Great Expectations to systematically catch these issues!
🎯 Build Your Expectation Suite
📋 Table-Level Expectations
expect_table_row_count_to_be_between
Ensure the table has the expected number of rows
expect_table_columns_to_match_set
Verify all required columns are present
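To make the two table-level checks concrete, here is a minimal sketch in plain Python. It is not the Great Expectations API; the helper names `row_count_is_between` and `columns_match_set` are hypothetical and only mirror what the expectations assert. The row-count bounds are an assumed sanity range.

```python
# Column names and rows copied from the sample table; None stands in for NULL.
COLUMNS = ["product_id", "name", "category", "price", "stock_quantity", "rating"]
ROWS = [
    ("PRD001", "Wireless Headphones", "Electronics", 79.99, 150, 4.5),
    ("PRD002", "Yoga Mat", "Sports", 29.99, 200, 4.8),
    ("PRD003", None, "Books", 15.99, 75, 4.2),
    ("PRD004", "Coffee Maker", "Home & Kitchen", -49.99, 50, 3.9),
    ("PRD005", "Running Shoes", "Sports", 89.99, 0, 4.6),
    ("PRD006", "Laptop Stand", "Electronics", 34.99, 120, 6.5),
]


def row_count_is_between(rows, min_value, max_value):
    """Same idea as expect_table_row_count_to_be_between (inclusive bounds)."""
    return min_value <= len(rows) <= max_value


def columns_match_set(columns, column_set):
    """Same idea as expect_table_columns_to_match_set: column order is ignored."""
    return set(columns) == set(column_set)


# The bounds 1..1000 are an assumption, not a value from the lesson.
row_count_ok = row_count_is_between(ROWS, 1, 1000)
columns_ok = columns_match_set(
    COLUMNS,
    {"product_id", "name", "category", "price", "stock_quantity", "rating"},
)
print(row_count_ok, columns_ok)
```

Both checks pass on the sample table: its problems are in individual values, which table-level expectations do not inspect.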
🔤 Column-Level Expectations
expect_column_values_to_not_be_null
Check for missing values in critical columns
expect_column_values_to_be_unique
Ensure ID columns have unique values
expect_column_values_to_be_between
Validate that numeric values are within range
expect_column_values_to_be_in_set
Check that categorical values are valid
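Applied to the sample table, the column-level checks would flag specific rows. The sketch below is plain Python rather than the Great Expectations API, and it assumes a price range of 0 to 10,000, a rating scale of 0 to 5, and a category list taken from the sample rows; none of these limits are stated in the lesson.

```python
# Rows from the sample table as dicts; None stands in for NULL.
FIELDS = ("product_id", "name", "category", "price", "stock_quantity", "rating")
ROWS = [
    dict(zip(FIELDS, values))
    for values in [
        ("PRD001", "Wireless Headphones", "Electronics", 79.99, 150, 4.5),
        ("PRD002", "Yoga Mat", "Sports", 29.99, 200, 4.8),
        ("PRD003", None, "Books", 15.99, 75, 4.2),
        ("PRD004", "Coffee Maker", "Home & Kitchen", -49.99, 50, 3.9),
        ("PRD005", "Running Shoes", "Sports", 89.99, 0, 4.6),
        ("PRD006", "Laptop Stand", "Electronics", 34.99, 120, 6.5),
    ]
]

# Assumed business rules (hypothetical values chosen for illustration).
ALLOWED_CATEGORIES = {"Electronics", "Sports", "Books", "Home & Kitchen"}
PRICE_RANGE = (0, 10_000)
RATING_RANGE = (0, 5)


def failing_ids(rows, predicate):
    """Return the product_id of every row for which the predicate is False."""
    return [row["product_id"] for row in rows if not predicate(row)]


# Not-null check on a critical column.
null_names = failing_ids(ROWS, lambda r: r["name"] is not None)

# Range checks on numeric columns (inclusive bounds).
bad_prices = failing_ids(ROWS, lambda r: PRICE_RANGE[0] <= r["price"] <= PRICE_RANGE[1])
bad_ratings = failing_ids(ROWS, lambda r: RATING_RANGE[0] <= r["rating"] <= RATING_RANGE[1])

# Set-membership check on a categorical column.
bad_categories = failing_ids(ROWS, lambda r: r["category"] in ALLOWED_CATEGORIES)

# Uniqueness check on the ID column.
ids = [row["product_id"] for row in ROWS]
duplicate_ids = sorted({i for i in ids if ids.count(i) > 1})

print(null_names)      # ['PRD003']
print(bad_prices)      # ['PRD004']
print(bad_ratings)     # ['PRD006']
print(bad_categories)  # []
print(duplicate_ids)   # []
```

Under these assumed rules, three rows fail: PRD003 has no name, PRD004 has a negative price, and PRD006 has a rating above the scale. A stock quantity of 0 (PRD005) is treated as valid here.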
✨ Data Quality Expectations
expect_column_values_to_match_regex
Validate ID format (e.g., PRD###)
expect_column_value_lengths_to_be_between
Check string length constraints
expect_column_max_to_be_between
Ensure data freshness (the latest timestamp falls within an expected window)
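The three checks in this group can be sketched in plain Python as follows. This is an illustration, not the Great Expectations API. The pattern `^PRD\d{3}$` is one reading of "PRD###", and the length bounds of 1 to 50 are an assumption. Because the sample table has no timestamp column, the max check is shown on `rating`; a freshness check would apply the same logic to a date column.

```python
import re

# Column values copied from the sample table; None stands in for NULL.
product_ids = ["PRD001", "PRD002", "PRD003", "PRD004", "PRD005", "PRD006"]
names = ["Wireless Headphones", "Yoga Mat", None, "Coffee Maker",
         "Running Shoes", "Laptop Stand"]
ratings = [4.5, 4.8, 4.2, 3.9, 4.6, 6.5]

# "PRD" followed by exactly three digits (assumed interpretation of PRD###).
ID_PATTERN = re.compile(r"^PRD\d{3}$")


def values_not_matching(values, pattern):
    """Values that fail the regex; missing values are skipped."""
    return [v for v in values if v is not None and not pattern.match(v)]


def lengths_outside(values, min_len, max_len):
    """Strings whose length is outside the inclusive bounds; None is skipped."""
    return [v for v in values if v is not None and not min_len <= len(v) <= max_len]


def max_is_between(values, min_value, max_value):
    """Aggregate check: is the column maximum inside the inclusive bounds?"""
    present = [v for v in values if v is not None]
    return min_value <= max(present) <= max_value


bad_ids = values_not_matching(product_ids, ID_PATTERN)
bad_lengths = lengths_outside(names, 1, 50)
rating_max_ok = max_is_between(ratings, 0, 5)

print(bad_ids)        # []
print(bad_lengths)    # []
print(rating_max_ok)  # False, because the maximum rating is 6.5
```

Note the difference in granularity: the regex and length checks report individual values, while the max check yields a single pass or fail for the whole column.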
📚 Key Concepts
🎯 Expectations
Assertions about your data that define what “valid” means for your use case.
📋 Expectation Suite
A collection of expectations that together define quality for a dataset.
✅ Validation
The process of checking data against expectations to find quality issues.
📄 Data Docs
Human-readable documentation of expectations and validation results.
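How these four concepts fit together can be shown with a toy model in plain Python. It is a conceptual sketch only, not how Great Expectations is implemented: `not_null`, `between`, `validate`, and `render_report` are hypothetical names, and the numeric bounds are assumed.

```python
# A subset of columns from the sample table; None stands in for NULL.
ROWS = [
    {"product_id": "PRD001", "name": "Wireless Headphones", "price": 79.99, "rating": 4.5},
    {"product_id": "PRD002", "name": "Yoga Mat", "price": 29.99, "rating": 4.8},
    {"product_id": "PRD003", "name": None, "price": 15.99, "rating": 4.2},
    {"product_id": "PRD004", "name": "Coffee Maker", "price": -49.99, "rating": 3.9},
    {"product_id": "PRD005", "name": "Running Shoes", "price": 89.99, "rating": 4.6},
    {"product_id": "PRD006", "name": "Laptop Stand", "price": 34.99, "rating": 6.5},
]


# Expectations: each factory returns a function mapping rows to unexpected IDs.
def not_null(column):
    return lambda rows: [r["product_id"] for r in rows if r[column] is None]


def between(column, low, high):
    return lambda rows: [
        r["product_id"] for r in rows
        if r[column] is not None and not low <= r[column] <= high
    ]


# Expectation suite: a named collection of expectations (bounds are assumed).
SUITE = [
    ("product_id is not null", not_null("product_id")),
    ("name is not null", not_null("name")),
    ("price between 0 and 10000", between("price", 0, 10_000)),
    ("rating between 0 and 5", between("rating", 0, 5)),
]


# Validation: run every expectation against the data and collect results.
def validate(rows, suite):
    results = []
    for description, check in suite:
        unexpected = check(rows)
        results.append({
            "expectation": description,
            "success": not unexpected,
            "unexpected_ids": unexpected,
        })
    return results


# Data Docs (toy version): a human-readable rendering of the results.
def render_report(results):
    lines = []
    for r in results:
        status = "PASS" if r["success"] else "FAIL"
        detail = "" if r["success"] else f" (unexpected: {', '.join(r['unexpected_ids'])})"
        lines.append(f"{status}  {r['expectation']}{detail}")
    return "\n".join(lines)


results = validate(ROWS, SUITE)
report = render_report(results)
print(report)
```

Keeping expectations as data (a list of named checks) rather than inline `if` statements is what allows the same suite to be validated against new batches and rendered as documentation.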
💡 Best Practices
- Start with critical columns (IDs, amounts, dates)
- Add expectations incrementally as you learn about the data
- Document why each expectation exists
- Set up automated validation in your data pipeline
- Review and update expectations as business rules change
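One way to automate validation in a pipeline is to gate the load step on the validation result, so a failing batch never reaches the warehouse. The following is a minimal sketch under that assumption, in plain Python; `DataQualityError`, `negative_price_ids`, and `load_if_valid` are hypothetical names, and a list stands in for the warehouse.

```python
class DataQualityError(Exception):
    """Raised when a batch fails validation (hypothetical name)."""


def negative_price_ids(rows):
    """Return product IDs whose price is negative."""
    return [r["product_id"] for r in rows if r["price"] < 0]


def load_if_valid(rows, load):
    """Validate first; call load(rows) only when no row is flagged."""
    bad = negative_price_ids(rows)
    if bad:
        raise DataQualityError(f"negative price for: {', '.join(bad)}")
    load(rows)
    return len(rows)


# A list stands in for the warehouse table in this sketch.
warehouse = []
clean = [
    {"product_id": "PRD001", "price": 79.99},
    {"product_id": "PRD002", "price": 29.99},
]
dirty = clean + [{"product_id": "PRD004", "price": -49.99}]

loaded = load_if_valid(clean, warehouse.extend)  # loads both rows

try:
    load_if_valid(dirty, warehouse.extend)       # blocked before loading
except DataQualityError as err:
    message = str(err)

print(loaded, len(warehouse), message)
```

Raising an exception rather than logging a warning makes the failure visible to the orchestrator, which can then stop downstream tasks.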