A tool for raw data extraction only
A database backup and restore utility
A framework for transforming data in warehouses using SQL-based models
A front-end data visualization library
SQL
Java
Python only, with no SQL support
R
dbt run
dbt test
dbt seed
dbt compile
Procedural transformations within stored procedures
Declarative transformations defined as models in SQL files
Command-line transformations using shell scripts
ETL pipelines managed entirely by Python code
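For illustration, a dbt model is nothing more than a SELECT statement saved as a .sql file in the project; the file, model, and table names below are hypothetical.

```sql
-- models/staging/stg_orders.sql (hypothetical model)
-- A model is a declarative SELECT; dbt decides how to build it in the warehouse.
select
    id            as order_id,
    customer_id,
    order_date,
    status
from raw.orders   -- hypothetical raw table; real projects usually wrap this in {{ source(...) }}
```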
A data storage service
A BI dashboard tool
A build system for analytics code
A streaming data ingestion framework
A data lake storage layer
A real-time data streaming tool
A transformation layer on top of a data warehouse
A machine learning model training platform
In the data warehouse where the models are materialized
In memory on the local machine only
As MapReduce jobs on a Hadoop cluster
As REST API calls to an external service
Extract and load only
Write SQL, get reliable and tested tables
Manage complex ETL graphs visually
Replace all data engineers with fully automated pipelines
Random folder structure with no convention
Single SQL file holding all transformations
Inline code blocks in a YAML configuration file
Directory structure reflecting staging, intermediate, and mart layers
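A common (though not mandatory) layout reflecting those layers looks roughly like this; the staging/intermediate/marts folder names follow convention, but the structure is otherwise up to the project.

```
my_dbt_project/
├── dbt_project.yml
├── macros/
├── seeds/
├── tests/
└── models/
    ├── staging/        light cleanup, one model per source table
    ├── intermediate/   reusable joins and business-logic building blocks
    └── marts/          final fact and dimension models for consumers
```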
FTP servers for distributing SQL files
GUIs without version control
Git repositories for code management
Email attachments for code updates
dbt debug
In each SQL model file
In a profiles.yml file separate from the project
In the data warehouse directly
In packages.yml
Run tests on all models
Generate docs for the project
Deploy packages to production
Validate configuration and connection to the target warehouse
Hard-code credentials into each SQL model
Use a command-line argument without profiles
Use different targets in profiles.yml
Only run dbt in one static environment at a time
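As a sketch, environment switching is typically handled with multiple targets in profiles.yml; the profile name, schemas, and warehouse type below are assumptions, and the credential keys vary by adapter.

```yaml
# ~/.dbt/profiles.yml (hypothetical profile)
my_project:
  target: dev              # default target
  outputs:
    dev:
      type: snowflake      # assumption: any supported adapter works here
      schema: analytics_dev
      # ...connection credentials...
    prod:
      type: snowflake
      schema: analytics
      # ...connection credentials...
```

Running `dbt run --target prod` then switches environments without changing any model code.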
Run dbt debug to diagnose connection issues
Rename all model files randomly
Reinstall the warehouse entirely
Disable SSL encryption
The warehouse GUI
A Jupyter notebook
A non-SQL script runner tool
The command line interface or dbt Cloud
Database credentials and secrets
The project’s name, version, paths, and configuration
The final BI dashboard layout
Only test configurations, not models
No facility for partial runs
Selection syntax with --select flags
A GUI to click models
Environment variables in SQL statements
Use dbt run --select path.to.folder.*
Edit profiles.yml to whitelist that folder
Move models temporarily to a different project
Apply a Jinja filter inside each model
The data warehouse schema
The logs directory after executing commands
Inline comments in each SQL model
A random temporary folder
Connect to external APIs
Import Python libraries
Insert test data inline
Reference another dbt model by name
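For example, `ref()` lets one model build on another by name; `stg_customers` below is a hypothetical upstream model.

```sql
-- models/marts/dim_customers.sql (hypothetical)
select
    customer_id,
    first_name,
    last_name
from {{ ref('stg_customers') }}  -- resolved to the upstream model's relation; also records the dependency in the DAG
```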
Must be defined in Python classes
Are only for testing models
Represent external tables loaded outside of dbt
Can only be used in incremental models
Large fact tables stored in a separate schema
CSV files managed by dbt and loaded as tables
Python dictionaries of configuration
Binary files that store credentials
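As an illustration, a seed is a CSV committed to the project (say, a hypothetical seeds/country_codes.csv); after `dbt seed` loads it into the warehouse, models reference it with `ref()` like any other model.

```sql
-- models/staging/stg_customers_enriched.sql (hypothetical)
select
    c.customer_id,
    cc.country_name
from {{ ref('stg_customers') }} as c
left join {{ ref('country_codes') }} as cc   -- 'country_codes' is the seed table loaded by `dbt seed`
    on c.country_code = cc.country_code
```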
How dbt models are built in the warehouse
The color scheme of the documentation site
The order of command-line arguments
The config to run only locally
Creates a table with indexes
Creates ephemeral data in memory only
Always runs tests automatically
Creates a view in the warehouse
Create permanent tables in the warehouse
Are never materialized as warehouse objects; their SQL is inlined into downstream models
Run Python code inside SQL
Require manual refreshes
Completely replacing large tables every run
Testing only, not production loads
Updating large datasets efficiently by processing only new or changed rows
Viewing data as CSV only
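A minimal sketch of an incremental model; the model, table, and column names are hypothetical.

```sql
-- models/marts/fct_events.sql (hypothetical)
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    created_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what is already in the target table
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```

The first run (or a run with `--full-refresh`) builds the whole table; subsequent runs only process the filtered rows.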
Accesses variables defined in dbt_project.yml or passed on the command line
Creates a new schema
Deletes old tables
Loads Python modules
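For example, `var()` reads a value from the `vars:` block in dbt_project.yml or from `--vars` on the command line; `start_date` is a hypothetical variable with a default.

```sql
-- inside any model (hypothetical variable usage)
select *
from {{ ref('stg_orders') }}
where order_date >= '{{ var("start_date", "2024-01-01") }}'
```

A command-line override would look like `dbt run --vars '{start_date: 2024-06-01}'`.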
Contain final business logic
Are the last step before presenting data
Are always ephemeral
Standardize and clean source data before intermediate layers
Is a backup of raw data files
Logs user access patterns
Tracks changes in a table over time for historical versions
Only applies to Snowflake, not other warehouses
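A sketch of a snapshot using the classic SQL block syntax (names are hypothetical; recent dbt versions also allow YAML-defined snapshots).

```sql
-- snapshots/orders_snapshot.sql (hypothetical)
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ ref('stg_orders') }}

{% endsnapshot %}
```

Running `dbt snapshot` adds dbt_valid_from / dbt_valid_to columns so each row's history can be reconstructed.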
Python test files only
YAML files specifying conditions on columns
Excel spreadsheets
Shell scripts
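A minimal sketch of tests declared in a model's YAML file; the model, columns, and accepted values are hypothetical.

```yaml
# models/staging/schema.yml (hypothetical)
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:            # newer dbt versions also accept the key `data_tests:`
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```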
Executes tests defined in YAML and returns pass/fail results
Creates production dashboards
Recompiles Jinja macros only
Deletes old logs
dbt deps
dbt docs generate
Runs models in production
Updates profiles
Launches a local web server to view documentation
Encrypts all test results
Code formatting only
Data constraints like uniqueness or non-null values
Network connectivity
Warehouse CPU usage
Modifying dbt_project.yml directly
Writing only Python code in macros
Using only built-in tests, no customization allowed
Creating new YAML test files or custom SQL tests
A static website of lineage and documentation
A CSV file of queries
A binary log of run results
Jinja macros with comments
Selects only source files
Runs all models and docs
Filters tests to only run singular tests
Creates ephemeral snapshots
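By contrast, a singular test is just a SQL file in the tests/ directory that selects the rows that should not exist; the test fails if any rows are returned. The file and column names below are hypothetical.

```sql
-- tests/assert_no_negative_order_totals.sql (hypothetical singular test)
select
    order_id,
    total_amount
from {{ ref('fct_orders') }}
where total_amount < 0
```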
The SQL files themselves as comments
YAML files associated with those models
Jinja templates in macros
The warehouse system tables
Validating assumptions about the data
Automatically generating BI dashboards
Encrypting all table data
Running ETL pipelines outside the warehouse
Creating GUIs for the models
Storing credentials securely
Writing tests in Python
Templating SQL queries and logic in models
A reusable Jinja function defined in .sql files under the macros/ directory
A compiled binary file
A YAML-only configuration key
A Git submodule
execute_macro() function
ref_macro() function
{{ macro_name() }} syntax
A shell command
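As a sketch, a macro is defined with {% macro %} ... {% endmacro %} in a .sql file under macros/ and invoked with the {{ }} syntax; the macro name and logic here are hypothetical.

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

```sql
-- usage inside a model
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_payments') }}
```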
Store large data files
DRY up SQL logic and reuse code
Restrict user permissions
Run BI dashboards faster
In the tests/ folder
In the models/ folder
In the data warehouse
In the macros/ directory within the project
<% %>
{{ }} for expressions and {% %} for statements
<< >>
(( ))
Jinja if-else statements in the macro definition
Only SQL CASE statements
External Python scripts
Warehouse stored procedures
Only log messages to the console
Generate dynamic SQL statements based on configs
Encrypt code files at rest
Overwrite profiles at runtime
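A sketch of conditional logic inside a macro, branching on the connection's adapter type (`target.type`) to emit different SQL; the specific dialect differences shown are illustrative.

```sql
-- macros/month_trunc.sql (hypothetical)
{% macro month_trunc(column_name) %}
    {% if target.type == 'bigquery' %}
        date_trunc({{ column_name }}, month)
    {% else %}
        date_trunc('month', {{ column_name }})
    {% endif %}
{% endmacro %}
```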
Use dbt run with --macro-debug flag
Print Jinja variables to the console with {{ log(...) }}
Use {{ log("message", info=True) }} to print in logs
Pause execution in the warehouse UI
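For example, a hypothetical debugging line placed in a model or macro; with `info=True` the message is echoed to the console as well as the log file.

```sql
{{ log("building with target schema: " ~ target.schema, info=True) }}
```

Inspecting the compiled SQL that dbt writes to the target/ directory is another common way to see what a macro actually rendered.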
Hard-coding them in each project
Copying and pasting the SQL files manually
Storing them in the warehouse
Packaging them in a dbt package and referencing in packages.yml
dbt_package.yml
packages.yml
profiles.yml
docker-compose.yml
dbt docs
Only private packages
Data warehouse configurations
BI dashboards
Community-contributed dbt packages
Store personal credentials
Reuse code, macros, and models across projects
Delete logs after runs
Execute non-SQL transformations
Use a semantic version in packages.yml
Hard-code version in dbt_project.yml
Define it in the SQL models
It's not possible to version packages
A local file system path only
Docker image references
A Git URL with a specified branch or tag
S3 bucket paths
Run dbt run directly
Restart the data warehouse
Manually compile macros
Run dbt deps to update dependencies
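A sketch of a packages.yml combining a hub package pinned to a semantic version range with a package pulled from a Git URL at a tag; the Git URL is a placeholder.

```yaml
# packages.yml (at the project root)
packages:
  - package: dbt-labs/dbt_utils                # from the public package hub
    version: [">=1.1.0", "<2.0.0"]
  - git: "https://github.com/example-org/internal_dbt_package.git"   # hypothetical Git source
    revision: v0.3.0                           # a branch, tag, or commit
```

After editing this file, `dbt deps` downloads the packages into the project.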
Specifying versions and letting dbt resolve dependencies
Creating multiple packages.yml files
Editing the compiled SQL manually
Using only one package at a time
Only binary executables
Models, macros, tests, and seeds shared by the community
BI dashboards in JSON format
Only Python scripts
Slow down development intentionally
Add unnecessary complexity only
Accelerate development by leveraging pre-built logic
Break the warehouse connection
Creating multiple inconsistent versions of core data
Providing a single, trusted version of key business entities across the enterprise
Managing only transactional records
Storing historical logs of ETL processes
The CPU architecture of the database host
The origins, transformations, and movements of data throughout its lifecycle
The encryption keys used by the database
The downtime schedule of servers
Data about data (e.g., definitions, data types, rules)
Application user logs
Binary files containing source code
Only relevant to physical disk storage
Automatically deleting old data
Providing a searchable inventory of available data assets and their metadata
Limiting user access to a single table
Encrypting the entire database schema
Monitoring and ensuring data quality, consistency, and compliance
Developing the front-end applications
Managing only database upgrades
Scheduling hardware maintenance
Only numeric data is stored
Data meets defined standards of completeness, accuracy, and consistency
All data is in XML format
Data is always encrypted
Constantly changing transactional data
Managing sets of stable, standard values (like country codes) used across systems
Backing up database indexes only
Archiving old log files
They store data physically
They define common business terminology, ensuring everyone uses consistent definitions
They enforce foreign key constraints
They represent encrypted backup keys
Restrict data usage entirely
Establish policies, standards, and processes to ensure effective data management
Provide free access to all data for everyone
Eliminate data modeling altogether
Having too few data sources
Reconciling inconsistent representations of the same master data from multiple systems
Lack of storage space
Inability to encrypt data
Ignoring the concept of time entirely
Managing data that changes over time and capturing historical states
Only future predictions of data values
Removing historical data from the database
Entities and static attributes only
Storing and analyzing sequences of events or actions over time
Converting relational data to key-value pairs
Eliminating the concept of time in data storage
Aligning the data model closely with the domain’s ubiquitous language and concepts
Using only surrogate keys
Storing all data in one large table for simplicity
Ignoring business requirements
Ensuring all data is normalized into 5NF
Structuring data to facilitate feature extraction and training of models
Eliminating numeric attributes
Removing all categorical attributes
Describe meaning, relationships, and classification of data concepts
Are only used in relational databases
Replace all ERDs immediately
Cannot represent hierarchical information
Centralizing all data modeling in one department
Decentralizing data ownership to domain teams and treating data as a product
Eliminating domain-specific models
Using only graph databases for all storage
Batch updates only
Real-time ingestion, storage, and processing of continuously generated data
Storing everything in CSV files offline
Using only hierarchical data structures
Only relational and hierarchical models
Multiple data models (relational, document, graph) within the same platform
Only key-value and network models
Models that cannot be queried
Physically moving all data into one database
Providing a unified, virtual view of data from multiple sources without physically integrating them
Compressing tables into binary form
Only supporting structured data
Hard-coded data definitions only
The metadata repository to dynamically generate and maintain data structures
Manual schema changes only
Non-versioned definitions of data
Encrypt node data
Classify nodes or edges into categories
Delete unwanted relationships
Perform indexing only
Represent and link data using subject-predicate-object triples on the Semantic Web
Store only numeric metrics
Replace SQL queries with MapReduce jobs
Ensure relational integrity in star schemas
Strict normalization to 3NF
A schema-on-read approach, allowing flexible ingestion of various data formats
Only CSV-formatted input
Immediate indexing of all attributes
Ignoring data quality issues
Cleaning, transforming, and structuring raw data to fit the model’s requirements
Storing data in raw binary without schema
Eliminating the need for ETL processes
Rigidity and no changes after initial design
Iterative, flexible development of models to adapt to changing requirements
Using a single type of database for all solutions
Avoiding stakeholder input
Designing schemas tied to specific on-prem hardware
Considering scalability, distribution, and eventual consistency
Ignoring data redundancy completely
Only using hierarchical models
Storing only textual descriptions of locations
Handling spatial attributes (coordinates, shapes) and enabling spatial queries
Preventing any location-based queries
Using only key-value stores for addresses
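As an illustration, in an engine with spatial support the spatial attributes can be queried directly; the table, columns, coordinates, and radius below are hypothetical, and function names vary by platform (PostGIS-style shown).

```sql
-- find stores within 1 km of a point (PostGIS-style syntax)
select store_id, store_name
from stores
where st_dwithin(
    location::geography,                          -- a point column
    st_makepoint(-73.9857, 40.7484)::geography,   -- longitude, latitude
    1000                                          -- distance in meters
);
```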
Separate transactional and analytical data strictly
Support both real-time transactional and analytical workloads on the same data
Only perform analytics once a day
Disallow any updates to fact tables
Stop all writes
Enforce structure and constraints on document data
Convert documents to CSV format
Encrypt all documents by default
Promoting a single large monolithic database
Encouraging domain-specific, decoupled data stores for each service
Eliminating the concept of domain boundaries
Forcing strict relational schemas
Random node-edge models without meaning
Graph structures with semantic metadata and ontologies for richer context
Only hierarchical data sets
Data extracted from CPU caches
Automating identification and tagging of sensitive data
Removing the need for metadata
Ensuring only numeric data is stored
Disabling policy enforcement
Storing only the latest state of an entity
Recording a series of events that lead to the current state, enabling reconstruction of past states
Removing historical logs entirely
Only working with denormalized schemas
Handling high-velocity, time-stamped sensor data with flexible schemas
Storing only static reference data
Eliminating all time-based attributes
Using only relational databases
Representing nodes and edges as vector embeddings for machine learning tasks
Removing relationships entirely
Converting documents to tables
Storing only textual labels
Create isolated data silos
Provide a unified environment for data management and governance, regardless of location
Remove metadata repositories
Restrict flexibility of data access
Splits a table into subsets of rows across multiple servers
Splits a table into subsets of columns for performance or compliance
Merges all tables into one giant table
Eliminates indexes from the schema
Remove all data entirely
Alter or mask sensitive attributes to protect privacy while maintaining utility
Store data in plaintext for performance
Involve only primary key encryption
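A small sketch of masking in SQL: the email's local part is replaced so the value stays usable for analysis without exposing the original address. Table and column names are hypothetical, and regexp function behavior differs slightly across engines.

```sql
select
    user_id,
    regexp_replace(email, '^[^@]+', 'xxxx') as masked_email   -- keeps the domain, hides the user
from users;
```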
That any node can have identical properties
That certain node properties are unique within the graph database
That relationships have no properties
That all queries run faster
Live data is always available
Representative test data is needed without exposing sensitive real data
Data sets must remain empty for testing
Only fixed numeric values are allowed