Interactive Lesson: Great Expectations

Great Expectations – Data Quality Testing for Analytics Engineers

Build confidence in your data through systematic validation

Welcome to Great Expectations!

Great Expectations (GX) is an open-source Python framework for validating, documenting, and monitoring data quality. In this interactive exercise, you’ll learn how to:

  • Define expectations about your data
  • Validate data against those expectations
  • Understand different types of data quality checks
  • Build comprehensive test suites
📊 Sample Data: Online Marketplace
📌 Scenario: You’re the analytics engineer for an online marketplace. Your team needs to ensure data quality before the data reaches the warehouse. The product sample below contains several data quality issues for you to catch.
product_id  name                 category        price   stock_quantity  rating
PRD001      Wireless Headphones  Electronics     79.99   150             4.5
PRD002      Yoga Mat             Sports          29.99   200             4.8
PRD003      NULL                 Books           15.99   75              4.2
PRD004      Coffee Maker         Home & Kitchen  -49.99  50              3.9
PRD005      Running Shoes        Sports          89.99   0               4.6
PRD006      Laptop Stand         Electronics     34.99   120             6.5
⚠️ Data Issues Detected: Can you spot the problems? Use Great Expectations to systematically catch these issues!
🎯 Build Your Expectation Suite
📋 Table-Level Expectations
  • expect_table_row_count_to_be_between: Ensure the table has the expected number of rows
  • expect_table_columns_to_match_set: Verify that the table's columns match the expected set of required columns
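To make the two table-level checks concrete, here is a plain-Python sketch of what each one asserts. It is not the Great Expectations API: the function names, the `exact_match` switch, the 1 to 10,000 row bounds, and the hypothetical `sku` column are assumptions chosen for illustration.

```python
# Column names from the sample marketplace table.
COLUMNS = ("product_id", "name", "category", "price", "stock_quantity", "rating")


def row_count_between(rows, min_value, max_value):
    """Pass when the number of rows lies in [min_value, max_value]."""
    count = len(rows)
    return {"success": min_value <= count <= max_value, "observed": count}


def columns_match_set(rows, column_set, exact_match=True):
    """Pass when the columns equal column_set (exact_match) or merely include all of them."""
    observed = set(rows[0]) if rows else set()
    expected = set(column_set)
    success = observed == expected if exact_match else expected <= observed
    return {"success": success, "missing": sorted(expected - observed)}


# Table-level checks only look at shape, so empty placeholder rows stand in for the six products.
rows = [dict.fromkeys(COLUMNS) for _ in range(6)]

print(row_count_between(rows, 1, 10_000))           # {'success': True, 'observed': 6}
print(columns_match_set(rows, COLUMNS))             # {'success': True, 'missing': []}
print(columns_match_set(rows, COLUMNS + ("sku",)))  # {'success': False, 'missing': ['sku']}
```

Table-level checks are cheap and catch structural breakage (a dropped column, an empty load) before any per-value check runs.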
🔤 Column-Level Expectations
  • expect_column_values_to_not_be_null: Check for missing values in critical columns
  • expect_column_values_to_be_unique: Ensure ID columns have unique values
  • expect_column_values_to_be_between: Validate that numeric values fall within a range
  • expect_column_values_to_be_in_set: Check that categorical values belong to an allowed set
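The column-level checks can be sketched in plain Python against the sample table; each function returns the `product_id` of every failing row, so an empty list means the check passes. This is an illustration of the semantics, not the Great Expectations API, and the numeric bounds, the 0 to 5 rating scale, and the allowed category set are assumptions.

```python
from collections import Counter

# Sample rows from the marketplace table; None stands in for NULL.
COLUMNS = ("product_id", "name", "category", "price", "stock_quantity", "rating")
PRODUCTS = [dict(zip(COLUMNS, values)) for values in [
    ("PRD001", "Wireless Headphones", "Electronics", 79.99, 150, 4.5),
    ("PRD002", "Yoga Mat", "Sports", 29.99, 200, 4.8),
    ("PRD003", None, "Books", 15.99, 75, 4.2),
    ("PRD004", "Coffee Maker", "Home & Kitchen", -49.99, 50, 3.9),
    ("PRD005", "Running Shoes", "Sports", 89.99, 0, 4.6),
    ("PRD006", "Laptop Stand", "Electronics", 34.99, 120, 6.5),
]]


def failing_ids(rows, column, is_valid):
    """Return the product_id of every row whose `column` value fails `is_valid`."""
    return [row["product_id"] for row in rows if not is_valid(row[column])]


def values_not_null(rows, column):
    return failing_ids(rows, column, lambda v: v is not None)


def values_unique(rows, column):
    counts = Counter(row[column] for row in rows)
    return failing_ids(rows, column, lambda v: counts[v] == 1)


def values_between(rows, column, min_value, max_value):
    # This sketch lets None pass so that missing values are reported only by the not-null check.
    return failing_ids(rows, column, lambda v: v is None or min_value <= v <= max_value)


def values_in_set(rows, column, value_set):
    return failing_ids(rows, column, lambda v: v is None or v in value_set)


print(values_not_null(PRODUCTS, "name"))             # ['PRD003']
print(values_unique(PRODUCTS, "product_id"))         # []
print(values_between(PRODUCTS, "price", 0, 10_000))  # ['PRD004']
print(values_between(PRODUCTS, "rating", 0, 5))      # ['PRD006']
print(values_in_set(PRODUCTS, "category", {"Electronics", "Sports", "Books", "Home & Kitchen"}))  # []
```

Note that PRD005's stock of 0 passes a 0 to 10,000 range: an out-of-stock product is plausibly valid, which is why the bounds should come from business rules rather than from the data itself.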
Format and Freshness Expectations
  • expect_column_values_to_match_regex: Validate ID format (e.g., PRD###)
  • expect_column_value_lengths_to_be_between: Check string length constraints
  • expect_column_max_to_be_between: Check that a column's maximum falls within a range, e.g., that the latest timestamp is recent enough (data freshness)
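A plain-Python sketch of the format and freshness checks follows. It is not the Great Expectations API: the function names, the use of full-string regex matching, the length and price bounds, and the hypothetical load timestamps are all assumptions (the sample table has no timestamp column).

```python
import re
from datetime import datetime, timedelta

# Values from the sample table; None stands in for the NULL name.
IDS = ["PRD001", "PRD002", "PRD003", "PRD004", "PRD005", "PRD006"]
NAMES = ["Wireless Headphones", "Yoga Mat", None, "Coffee Maker", "Running Shoes", "Laptop Stand"]
PRICES = [79.99, 29.99, 15.99, -49.99, 89.99, 34.99]


def values_match_regex(values, pattern):
    """Return non-null values that do not fully match `pattern`."""
    compiled = re.compile(pattern)
    return [v for v in values if v is not None and not compiled.fullmatch(v)]


def value_lengths_between(values, min_len, max_len):
    """Return non-null strings whose length falls outside [min_len, max_len]."""
    return [v for v in values if v is not None and not min_len <= len(v) <= max_len]


def max_between(values, min_value, max_value):
    """Return (success, observed_max), ignoring None values."""
    observed = max(v for v in values if v is not None)
    return min_value <= observed <= max_value, observed


print(values_match_regex(IDS, r"PRD\d{3}"))                  # []
print(values_match_regex(["PRD001", "prd-7"], r"PRD\d{3}"))  # ['prd-7']
print(value_lengths_between(NAMES, 1, 50))                   # []
print(max_between(PRICES, 0, 500))                           # (True, 89.99)

# Freshness: these load times are hypothetical, and run_time is a fixed "now" for determinism.
loaded_at = [datetime(2024, 5, 1, 8, 0), datetime(2024, 5, 2, 9, 30)]
run_time = datetime(2024, 5, 2, 12, 0)
print(max_between(loaded_at, run_time - timedelta(days=1), run_time)[0])  # True
```

Applying a max-in-range check to a timestamp column turns it into a freshness test: if the newest record is older than the window, the upstream load has likely stalled.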

📚 Key Concepts

🎯 Expectations

Assertions about your data that define what “valid” means for your use case.

📋 Expectation Suite

A collection of expectations that together define quality for a dataset.

✅ Validation

The process of checking data against expectations to find quality issues.

📄 Data Docs

Human-readable documentation of expectations and validation results.
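The four concepts fit together as a small pipeline: expectations are grouped into a suite, validating the suite yields results, and the results are rendered for humans. The toy model below sketches that relationship; the `Expectation` and `Suite` classes and the `render_docs` function are hypothetical stand-ins, not Great Expectations classes.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Expectation:
    """Toy expectation: a named assertion over a list of rows."""
    name: str
    check: Callable[[list[Row]], bool]


@dataclass
class Suite:
    """Toy expectation suite: expectations that together define quality for a dataset."""
    name: str
    expectations: list[Expectation]

    def validate(self, rows: list[Row]) -> dict:
        """Run every expectation; the suite passes only if all of them pass."""
        results = {e.name: e.check(rows) for e in self.expectations}
        return {"success": all(results.values()), "results": results}


def render_docs(suite: Suite, report: dict) -> str:
    """Plain-text report, loosely echoing the role Data Docs plays."""
    lines = [f"Suite: {suite.name} ({'PASSED' if report['success'] else 'FAILED'})"]
    for name, ok in report["results"].items():
        lines.append(f"  [{'x' if ok else ' '}] {name}")
    return "\n".join(lines)


rows = [{"product_id": "PRD003", "name": None}, {"product_id": "PRD004", "name": "Coffee Maker"}]
suite = Suite("products", [
    Expectation("name not null", lambda rs: all(r["name"] is not None for r in rs)),
    Expectation("product_id unique", lambda rs: len({r["product_id"] for r in rs}) == len(rs)),
])
report = suite.validate(rows)
print(render_docs(suite, report))
```

Running it prints `Suite: products (FAILED)` followed by an unchecked `name not null` line and a checked `product_id unique` line: one failing expectation is enough to fail the whole suite.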

💡 Best Practices

  • Start with critical columns (IDs, amounts, dates)
  • Add expectations incrementally as you learn about the data
  • Document why each expectation exists
  • Set up automated validation in your data pipeline
  • Review and update expectations as business rules change
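Two of these practices, documenting why each expectation exists and validating automatically inside the pipeline, can be sketched together: each check carries a reason, and a validation gate runs before the load step and raises on failure. `DataQualityError`, `CHECKS`, `validate_or_raise`, and `load_to_warehouse` are hypothetical names, and the recorded reasons are illustrative assumptions.

```python
class DataQualityError(Exception):
    """Raised when a batch fails validation, stopping the pipeline before the load step."""


# Each check records *why* it exists: (name, reason, per-row predicate).
CHECKS = [
    ("price >= 0", "Prices are assumed never to be negative.", lambda r: r["price"] >= 0),
    ("rating in 0-5", "Ratings are assumed to use a 0-5 scale.", lambda r: 0 <= r["rating"] <= 5),
]


def validate_or_raise(rows):
    """Run every check and raise one error that lists all failures with their reasons."""
    failures = []
    for name, reason, predicate in CHECKS:
        bad = [r["product_id"] for r in rows if not predicate(r)]
        if bad:
            failures.append(f"{name} failed for {bad} ({reason})")
    if failures:
        raise DataQualityError("; ".join(failures))


def load_to_warehouse(rows):
    """Hypothetical load step: validation runs first, so bad data never reaches the warehouse."""
    validate_or_raise(rows)
    return len(rows)  # stand-in for the real write; returns the number of rows loaded


good_batch = [{"product_id": "PRD001", "price": 79.99, "rating": 4.5}]
bad_batch = [
    {"product_id": "PRD004", "price": -49.99, "rating": 3.9},
    {"product_id": "PRD006", "price": 34.99, "rating": 6.5},
]

print(load_to_warehouse(good_batch))  # 1
try:
    load_to_warehouse(bad_batch)
except DataQualityError as err:
    print(f"Blocked load: {err}")
```

Collecting every failure before raising, rather than stopping at the first one, gives whoever triages the blocked load the full picture in a single message.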