A tool for raw data extraction only
A database backup and restore utility
A framework for transforming data in warehouses using SQL-based models
A front-end data visualization library
SQL
Java
Python only, with no SQL support
R
dbt run
dbt test
dbt seed
dbt compile
Procedural transformations within stored procedures
Declarative transformations defined as models in SQL files
Command-line transformations using shell scripts
ETL pipelines managed entirely by Python code
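For illustration, a dbt model is nothing more than a SELECT statement saved as a .sql file in the project; the file, model, and table names below are hypothetical.

```sql
-- models/staging/stg_orders.sql (hypothetical model)
-- A model is a declarative SELECT; dbt decides how to build it in the warehouse.
select
    id            as order_id,
    customer_id,
    order_date,
    status
from raw.orders   -- hypothetical raw table; real projects usually wrap this in {{ source(...) }}
```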
A data storage service
A BI dashboard tool
A build system for analytics code
A streaming data ingestion framework
A data lake storage layer
A real-time data streaming tool
A transformation layer on top of a data warehouse
A machine learning model training platform
In the data warehouse where the models are materialized
In memory on the local machine only
As MapReduce jobs on a Hadoop cluster
As REST API calls to an external service
Extract and load only
Write SQL, get reliable and tested tables
Manage complex ETL graphs visually
Replace all data engineers with fully automated pipelines
Random folder structure with no convention
Single SQL file holding all transformations
Inline code blocks in a YAML configuration file
Directory structure reflecting staging, intermediate, and mart layers
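A common (though not mandatory) layout reflecting those layers looks roughly like this; the staging/intermediate/marts folder names follow convention, but the structure is otherwise up to the project.

```
my_dbt_project/
├── dbt_project.yml
├── macros/
├── seeds/
├── tests/
└── models/
    ├── staging/        light cleanup, one model per source table
    ├── intermediate/   reusable joins and business-logic building blocks
    └── marts/          final fact and dimension models for consumers
```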
FTP servers for distributing SQL files
GUIs without version control
Git repositories for code management
Email attachments for code updates
dbt debug
In each SQL model file
In a profiles.yml file separate from the project
In the data warehouse directly
In packages.yml
Run tests on all models
Generate docs for the project
Deploy packages to production
Validate configuration and connection to the target warehouse
Hard-code credentials into each SQL model
Use a command-line argument without profiles
Use different targets in profiles.yml
Only run dbt in one static environment at a time
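As a sketch, environment switching is typically handled with multiple targets in profiles.yml; the profile name, schemas, and warehouse type below are assumptions, and the credential keys vary by adapter.

```yaml
# ~/.dbt/profiles.yml (hypothetical profile)
my_project:
  target: dev              # default target
  outputs:
    dev:
      type: snowflake      # assumption: any supported adapter works here
      schema: analytics_dev
      # ...connection credentials...
    prod:
      type: snowflake
      schema: analytics
      # ...connection credentials...
```

Running `dbt run --target prod` then switches environments without changing any model code.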
Run dbt debug to diagnose connection issues
Rename all model files randomly
Reinstall the warehouse entirely
Disable SSL encryption
The warehouse GUI
A Jupyter notebook
A non-SQL script runner tool
The command line interface or dbt Cloud
Database credentials and secrets
The project’s name, version, paths, and configuration
The final BI dashboard layout
Only test configurations, not models
No facility for partial runs
Selection syntax with --select flags
A GUI to click models
Environment variables in SQL statements
Use dbt run --select path.to.folder.*
Edit profiles.yml to whitelist that folder
Move models temporarily to a different project
Apply a Jinja filter inside each model
The data warehouse schema
The logs directory after executing commands
Inline comments in each SQL model
A random temporary folder
Connect to external APIs
Import Python libraries
Insert test data inline
Reference another dbt model by name
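For example, `ref()` lets one model build on another by name; `stg_customers` below is a hypothetical upstream model.

```sql
-- models/marts/dim_customers.sql (hypothetical)
select
    customer_id,
    first_name,
    last_name
from {{ ref('stg_customers') }}  -- resolved to the upstream model's relation; also records the dependency in the DAG
```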
Must be defined in Python classes
Are only for testing models
Represent external tables loaded outside of dbt
Can only be used in incremental models
Large fact tables stored in a separate schema
CSV files managed by dbt and loaded as tables
Python dictionaries of configuration
Binary files that store credentials
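As an illustration, a seed is a CSV committed to the project (say, a hypothetical seeds/country_codes.csv); after `dbt seed` loads it into the warehouse, models reference it with `ref()` like any other model.

```sql
-- models/staging/stg_customers_enriched.sql (hypothetical)
select
    c.customer_id,
    cc.country_name
from {{ ref('stg_customers') }} as c
left join {{ ref('country_codes') }} as cc   -- 'country_codes' is the seed table loaded by `dbt seed`
    on c.country_code = cc.country_code
```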
How dbt models are built in the warehouse
The color scheme of the documentation site
The order of command-line arguments
The config to run only locally
Creates a table with indexes
Creates ephemeral data in memory only
Always runs tests automatically
Creates a view in the warehouse
Create permanent tables in the warehouse
Are never materialized as warehouse objects; their SQL is inlined into downstream models
Run Python code inside SQL
Require manual refreshes
Completely replacing large tables every run
Testing only, not production loads
Updating large datasets efficiently by processing only new or changed rows
Viewing data as CSV only
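A minimal sketch of an incremental model; the model, table, and column names are hypothetical.

```sql
-- models/marts/fct_events.sql (hypothetical)
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    created_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what is already in the target table
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```

The first run (or a run with `--full-refresh`) builds the whole table; subsequent runs only process the filtered rows.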
Accesses variables defined in dbt_project.yml or passed on the command line
Creates a new schema
Deletes old tables
Loads Python modules
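For example, `var()` reads a value from the `vars:` block in dbt_project.yml or from `--vars` on the command line; `start_date` is a hypothetical variable with a default.

```sql
-- inside any model (hypothetical variable usage)
select *
from {{ ref('stg_orders') }}
where order_date >= '{{ var("start_date", "2024-01-01") }}'
```

A command-line override would look like `dbt run --vars '{start_date: 2024-06-01}'`.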
Contain final business logic
Are the last step before presenting data
Are always ephemeral
Standardize and clean source data before intermediate layers
Is a backup of raw data files
Logs user access patterns
Tracks changes in a table over time for historical versions
Only applies to Snowflake, not other warehouses
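A sketch of a snapshot using the classic SQL block syntax (names are hypothetical; recent dbt versions also allow YAML-defined snapshots).

```sql
-- snapshots/orders_snapshot.sql (hypothetical)
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ ref('stg_orders') }}

{% endsnapshot %}
```

Running `dbt snapshot` adds dbt_valid_from / dbt_valid_to columns so each row's history can be reconstructed.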
Python test files only
YAML files specifying conditions on columns
Excel spreadsheets
Shell scripts
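A minimal sketch of tests declared in a model's YAML file; the model, columns, and accepted values are hypothetical.

```yaml
# models/staging/schema.yml (hypothetical)
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:            # newer dbt versions also accept the key `data_tests:`
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```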
Executes tests defined in YAML and returns pass/fail results
Creates production dashboards
Recompiles Jinja macros only
Deletes old logs
dbt deps
dbt docs generate
Runs models in production
Updates profiles
Launches a local web server to view documentation
Encrypts all test results
Code formatting only
Data constraints like uniqueness or non-null values
Network connectivity
Warehouse CPU usage
Modifying dbt_project.yml directly
Writing only Python code in macros
Using only built-in tests, no customization allowed
Creating new YAML test files or custom SQL tests
A static website of lineage and documentation
A CSV file of queries
A binary log of run results
Jinja macros with comments
Selects only source files
Runs all models and docs
Filters tests to only run singular tests
Creates ephemeral snapshots
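By contrast, a singular test is just a SQL file in the tests/ directory that selects the rows that should not exist; the test fails if any rows are returned. The file and column names below are hypothetical.

```sql
-- tests/assert_no_negative_order_totals.sql (hypothetical singular test)
select
    order_id,
    total_amount
from {{ ref('fct_orders') }}
where total_amount < 0
```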
The SQL files themselves as comments
YAML files associated with those models
Jinja templates in macros
The warehouse system tables
Validating assumptions about the data
Automatically generating BI dashboards
Encrypting all table data
Running ETL pipelines outside the warehouse
Creating GUIs for the models
Storing credentials securely
Writing tests in Python
Templating SQL queries and logic in models
A reusable Jinja function defined in .sql files under the macros/ directory
A compiled binary file
A YAML-only configuration key
A Git submodule
execute_macro() function
ref_macro() function
{{ macro_name() }} syntax
A shell command
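As a sketch, a macro is defined with {% macro %} ... {% endmacro %} in a .sql file under macros/ and invoked with the {{ }} syntax; the macro name and logic here are hypothetical.

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

```sql
-- usage inside a model
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_payments') }}
```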
Store large data files
DRY up SQL logic and reuse code
Restrict user permissions
Run BI dashboards faster
In the tests/ folder
In the models/ folder
In the data warehouse
In the macros/ directory within the project
<% %>
{{ }} for expressions and {% %} for statements
<< >>
(( ))
Jinja if-else statements in the macro definition
Only SQL CASE statements
External Python scripts
Warehouse stored procedures
Only log messages to the console
Generate dynamic SQL statements based on configs
Encrypt code files at rest
Overwrite profiles at runtime
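A sketch of conditional logic inside a macro, branching on the connection's adapter type (`target.type`) to emit different SQL; the specific dialect differences shown are illustrative.

```sql
-- macros/month_trunc.sql (hypothetical)
{% macro month_trunc(column_name) %}
    {% if target.type == 'bigquery' %}
        date_trunc({{ column_name }}, month)
    {% else %}
        date_trunc('month', {{ column_name }})
    {% endif %}
{% endmacro %}
```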
Use dbt run with --macro-debug flag
Print Jinja variables to the console with {{ log(...) }}
Use {{ log("message", info=True) }} to print in logs
Pause execution in the warehouse UI
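For example, a hypothetical debugging line placed in a model or macro; with `info=True` the message is echoed to the console as well as the log file.

```sql
{{ log("building with target schema: " ~ target.schema, info=True) }}
```

Inspecting the compiled SQL that dbt writes to the target/ directory is another common way to see what a macro actually rendered.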
Hard-coding them in each project
Copying and pasting the SQL files manually
Storing them in the warehouse
Packaging them in a dbt package and referencing in packages.yml
dbt_package.yml
packages.yml
profiles.yml
docker-compose.yml
dbt docs
Only private packages
Data warehouse configurations
BI dashboards
Community-contributed dbt packages
Store personal credentials
Reuse code, macros, and models across projects
Delete logs after runs
Execute non-SQL transformations
Use a semantic version in packages.yml
Hard-code version in dbt_project.yml
Define it in the SQL models
It's not possible to version packages
A local file system path only
Docker image references
A Git URL with a specified branch or tag
S3 bucket paths
Run dbt run directly
Restart the data warehouse
Manually compile macros
Run dbt deps to update dependencies
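A sketch of a packages.yml combining a hub package pinned to a semantic version range with a package pulled from a Git URL at a tag; the Git URL is a placeholder.

```yaml
# packages.yml (at the project root)
packages:
  - package: dbt-labs/dbt_utils                # from the public package hub
    version: [">=1.1.0", "<2.0.0"]
  - git: "https://github.com/example-org/internal_dbt_package.git"   # hypothetical Git source
    revision: v0.3.0                           # a branch, tag, or commit
```

After editing this file, `dbt deps` downloads the packages into the project.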
Specifying versions and letting dbt resolve dependencies
Creating multiple packages.yml files
Editing the compiled SQL manually
Using only one package at a time
Only binary executables
Models, macros, tests, and seeds shared by the community
BI dashboards in JSON format
Only Python scripts
Slow down development intentionally
Add unnecessary complexity only
Accelerate development by leveraging pre-built logic
Break the warehouse connection
Creating multiple inconsistent versions of core data
Providing a single, trusted version of key business entities across the enterprise
Managing only transactional records
Storing historical logs of ETL processes
The CPU architecture of the database host
The origins, transformations, and movements of data throughout its lifecycle
The encryption keys used by the database
The downtime schedule of servers
Data about data (e.g., definitions, data types, rules)
Application user logs
Binary files containing source code
Only relevant to physical disk storage
Automatically deleting old data
Providing a searchable inventory of available data assets and their metadata
Limiting user access to a single table
Encrypting the entire database schema
Monitoring and ensuring data quality, consistency, and compliance
Developing the front-end applications
Managing only database upgrades
Scheduling hardware maintenance
Only numeric data is stored
Data meets defined standards of completeness, accuracy, and consistency
All data is in XML format
Data is always encrypted
Constantly changing transactional data
Managing sets of stable, standard values (like country codes) used across systems
Backing up database indexes only
Archiving old log files
They store data physically
They define common business terminology, ensuring everyone uses consistent definitions
They enforce foreign key constraints
They represent encrypted backup keys
Restrict data usage entirely
Establish policies, standards, and processes to ensure effective data management
Provide free access to all data for everyone
Eliminate data modeling altogether
Having too few data sources
Reconciling inconsistent representations of the same master data from multiple systems
Lack of storage space
Inability to encrypt data
Ignoring the concept of time entirely
Managing data that changes over time and capturing historical states
Only future predictions of data values
Removing historical data from the database
Entities and static attributes only
Storing and analyzing sequences of events or actions over time
Converting relational data to key-value pairs
Eliminating the concept of time in data storage
Aligning the data model closely with the domain’s ubiquitous language and concepts
Using only surrogate keys
Storing all data in one large table for simplicity
Ignoring business requirements
Ensuring all data is normalized into 5NF
Structuring data to facilitate feature extraction and training of models
Eliminating numeric attributes
Removing all categorical attributes
Describe meaning, relationships, and classification of data concepts
Are only used in relational databases
Replace all ERDs immediately
Cannot represent hierarchical information
Centralizing all data modeling in one department
Decentralizing data ownership to domain teams and treating data as a product
Eliminating domain-specific models
Using only graph databases for all storage
Batch updates only
Real-time ingestion, storage, and processing of continuously generated data
Storing everything in CSV files offline
Using only hierarchical data structures
Only relational and hierarchical models
Multiple data models (relational, document, graph) within the same platform
Only key-value and network models
Models that cannot be queried
Physically moving all data into one database
Providing a unified, virtual view of data from multiple sources without physically integrating them
Compressing tables into binary form
Only supporting structured data
Hard-coded data definitions only
The metadata repository to dynamically generate and maintain data structures
Manual schema changes only
Non-versioned definitions of data
Encrypt node data
Classify nodes or edges into categories
Delete unwanted relationships
Perform indexing only
Represent and link data using subject-predicate-object triples on the Semantic Web
Store only numeric metrics
Replace SQL queries with MapReduce jobs
Ensure relational integrity in star schemas
Strict normalization to 3NF
A schema-on-read approach, allowing flexible ingestion of various data formats
Only CSV-formatted input
Immediate indexing of all attributes
Ignoring data quality issues
Cleaning, transforming, and structuring raw data to fit the model’s requirements
Storing data in raw binary without schema
Eliminating the need for ETL processes
Rigidity and no changes after initial design
Iterative, flexible development of models to adapt to changing requirements
Using a single type of database for all solutions
Avoiding stakeholder input
Designing schemas tied to specific on-prem hardware
Considering scalability, distribution, and eventual consistency
Ignoring data redundancy completely
Only using hierarchical models
Storing only textual descriptions of locations
Handling spatial attributes (coordinates, shapes) and enabling spatial queries
Preventing any location-based queries
Using only key-value stores for addresses
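As an illustration, in an engine with spatial support the spatial attributes can be queried directly; the table, columns, coordinates, and radius below are hypothetical, and function names vary by platform (PostGIS-style shown).

```sql
-- find stores within 1 km of a point (PostGIS-style syntax)
select store_id, store_name
from stores
where st_dwithin(
    location::geography,                          -- a point column
    st_makepoint(-73.9857, 40.7484)::geography,   -- longitude, latitude
    1000                                          -- distance in meters
);
```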
Separate transactional and analytical data strictly
Support both real-time transactional and analytical workloads on the same data
Only perform analytics once a day
Disallow any updates to fact tables
Stop all writes
Enforce structure and constraints on document data
Convert documents to CSV format
Encrypt all documents by default
Promoting a single large monolithic database
Encouraging domain-specific, decoupled data stores for each service
Eliminating the concept of domain boundaries
Forcing strict relational schemas
Random node-edge models without meaning
Graph structures with semantic metadata and ontologies for richer context
Only hierarchical data sets
Data extracted from CPU caches
Automating identification and tagging of sensitive data
Removing the need for metadata
Ensuring only numeric data is stored
Disabling policy enforcement
Storing only the latest state of an entity
Recording a series of events that lead to the current state, enabling reconstruction of past states
Removing historical logs entirely
Only working with denormalized schemas
Handling high-velocity, time-stamped sensor data with flexible schemas
Storing only static reference data
Eliminating all time-based attributes
Using only relational databases
Representing nodes and edges as vector embeddings for machine learning tasks
Removing relationships entirely
Converting documents to tables
Storing only textual labels
Create isolated data silos
Provide a unified environment for data management and governance, regardless of location
Remove metadata repositories
Restrict flexibility of data access
Splits a table into subsets of rows across multiple servers
Splits a table into subsets of columns for performance or compliance
Merges all tables into one giant table
Eliminates indexes from the schema
Remove all data entirely
Alter or mask sensitive attributes to protect privacy while maintaining utility
Store data in plaintext for performance
Involve only primary key encryption
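A small sketch of masking in SQL: the email's local part is replaced so the value stays usable for analysis without exposing the original address. Table and column names are hypothetical, and regexp function behavior differs slightly across engines.

```sql
select
    user_id,
    regexp_replace(email, '^[^@]+', 'xxxx') as masked_email   -- keeps the domain, hides the user
from users;
```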
That any node can have identical properties
That certain node properties are unique within the graph database
That relationships have no properties
That all queries run faster
Live data is always available
Representative test data is needed without exposing sensitive real data
Data sets must remain empty for testing
Only fixed numeric values are allowed