ETL / ELT Quiz (200 Questions)


Section 1: Basic ETL vs. ELT Concepts (1–20)

1. In ETL, when does the Transformation step occur?

2. ELT stands for:

3. A key difference between ETL and ELT is:

4. ETL is often chosen when:

5. ELT is often preferred with modern cloud warehouses because:

6. The main steps in ETL are:

7. One advantage of ELT over ETL is:

8. When choosing ETL vs. ELT, one key consideration is:

9. ETL was traditionally favored in legacy systems because:

10. One drawback of ETL is:

11. ELT leverages the target system's resources to:

12. Both ETL and ELT ultimately aim to:

13. ETL pipelines often run on a schedule to:

14. ELT is more flexible when requirements change because:

15. ELT often aligns well with modern data lake or warehouse strategies because:

16. One disadvantage of ELT is:

17. Deciding between ETL and ELT depends on:

18. ELT often pairs well with:

19. ETL pipelines might struggle with changing requirements because:

20. Ultimately, ETL and ELT are chosen based on:

Section 2: Data Extraction (21–40)

21. Data extraction involves:

22. Incremental extraction means:

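As a concrete reference for question 22, here is a minimal sketch of the watermark pattern behind incremental extraction. The orders table and updated_at column are hypothetical; only rows changed since the last successful run are pulled.

    import sqlite3

    def extract_incremental(conn: sqlite3.Connection, last_watermark: str):
        # Pull only rows modified since the previous run's high-water mark.
        cur = conn.execute(
            "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
            (last_watermark,),
        )
        rows = cur.fetchall()
        # The largest timestamp seen becomes the watermark for the next run.
        new_watermark = max((r[2] for r in rows), default=last_watermark)
        return rows, new_watermark
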
23. Handling schema changes in extraction involves:

24. Extracting from relational databases often uses:

25. APIs for extraction may require handling:

26. Extracting from files (CSV, JSON) involves:

27. Handling authentication during extraction might involve:

28. When a source system is slow, extraction strategies might include:

29. Ensuring data integrity during extraction might involve:

30. Source system locks and concurrency issues during extraction can be mitigated by:

31. Metadata extraction during the extraction phase involves:

32. Logging extraction events helps by:

33. Scheduling extractions often involves:

34. Dealing with unreliable sources may involve:

35. Handling different file formats (CSV, JSON, XML) requires:

36. Some ETL tools come with built-in source connectors to:

37. Minimizing network transfers during extraction might involve:

38. Validating extracted data ensures:

39. Metadata like source timestamps help by:

40. Ensuring a stable extraction process might mean:

Section 3: Data Transformation (41–60)

41. Data transformation often includes:

42. Converting date formats and normalizing data units is part of:

43. Business rules in transformations might include:

44. SQL-based transformations in the ELT approach use:

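To make question 44 concrete, a hedged sketch of an ELT-style transformation: the raw data is assumed to be loaded already, and the transformation is plain SQL executed inside the warehouse. The run_sql callable and the schema and table names are placeholders, and CREATE OR REPLACE TABLE syntax varies by warehouse.

    DAILY_REVENUE_SQL = """
    CREATE OR REPLACE TABLE analytics.daily_revenue AS
    SELECT order_date,
           SUM(amount) AS total_revenue
    FROM raw.orders
    GROUP BY order_date
    """

    def transform_in_warehouse(run_sql):
        # The warehouse engine does the heavy lifting; the ETL host only
        # submits the statement (the "pushdown" idea behind ELT).
        run_sql(DAILY_REVENUE_SQL)
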
45. Handling Slowly Changing Dimensions (SCD) in transformations means:

46. Joining multiple data sources in transformation steps is common to:

47. Applying data quality checks mid-transformation ensures:

48. Debugging transformation logic often involves:

49. Using frameworks like Spark for transformation helps with:

50. Ensuring transformations are idempotent means:

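For question 50, one common way to make a rerun-safe (idempotent) transformation is delete-then-insert scoped to the batch being processed. This is only a sketch: sales_daily and raw_sales are made-up tables, and the %s placeholder style assumes a DB-API driver such as psycopg2.

    def rebuild_day(cursor, run_date):
        # Running this twice for the same run_date leaves the target
        # table in exactly the same state as running it once.
        cursor.execute(
            "DELETE FROM sales_daily WHERE sale_date = %s", (run_date,))
        cursor.execute(
            """
            INSERT INTO sales_daily (sale_date, store_id, revenue)
            SELECT sale_date, store_id, SUM(amount)
            FROM raw_sales
            WHERE sale_date = %s
            GROUP BY sale_date, store_id
            """,
            (run_date,),
        )
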
51. Applying aggregations (e.g., sum, avg) during transformations helps to:

52. Handling character encoding issues (UTF-8 vs. ASCII) in transformations ensures:

53. Versioning transformation logic means:

54. Pushdown transformations refer to:

55. Idempotent transformations mean that, if rerun:

56. Reusability in transformations can be achieved by:

57. Handling schema evolution during transformation involves:

58. Transformation performance optimization may include:

59. Debugging transformation errors might use:

60. Once transformations are finalized:

Section 4: Data Loading (61–80)

61. Data loading involves:

62. Bulk loading can improve performance by:

63. Incremental loading (upserts) means:

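A minimal upsert sketch for question 63 using SQL MERGE, which most modern warehouses support in some form (exact syntax differs); the customer tables are illustrative.

    UPSERT_CUSTOMERS_SQL = """
    MERGE INTO dim_customer AS tgt
    USING staging_customer AS src
        ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN
        UPDATE SET name = src.name, email = src.email
    WHEN NOT MATCHED THEN
        INSERT (customer_id, name, email)
        VALUES (src.customer_id, src.name, src.email)
    """

    def upsert_customers(cursor):
        # New rows are inserted and changed rows updated; unchanged history
        # is never reloaded, which is the point of incremental loading.
        cursor.execute(UPSERT_CUSTOMERS_SQL)
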
64. Managing indexes during load may involve:

65. Timing loads off-peak hours can:

66. Transactions during load ensure:

67. Verifying load success might involve:

68. Partitioned loading improves performance by:

69. Using target-specific load utilities (e.g., COPY command in Redshift) can:

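For question 69, a hedged sketch of a target-specific bulk load using Redshift's COPY command; the bucket, table, and IAM role are hypothetical, and the options you need depend on your file format (check the Redshift COPY documentation).

    COPY_PAGE_VIEWS_SQL = """
    COPY analytics.page_views
    FROM 's3://my-etl-bucket/page_views/2024-06-01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-loader'
    FORMAT AS PARQUET
    """

    def bulk_load(cursor):
        # COPY loads many files in parallel across cluster slices, which is
        # usually far faster than issuing row-by-row INSERT statements.
        cursor.execute(COPY_PAGE_VIEWS_SQL)
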
70. Handling load failures might involve:

71. Distinguishing between full-refresh and incremental loading means:

72. Post-load validations ensure:

73. Notifications on load completion or errors help by:

74. Balancing load tasks means:

75. Late-arriving data might be handled by:

76. Atomicity of loads ensures:

77. Archiving or purging old data during load cycles is done to:

78. Ensuring consistency and atomicity of loads might require:

79. In ELT scenarios, loading raw data first allows:

80. After successful load:

Section 5: Tools and Technologies (81–100)

81. Traditional ETL tools (Informatica, Talend) often:

82. Modern ELT tools integrate with cloud warehouses like:

83. Orchestration tools (Airflow, Luigi) help by:

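To ground question 83, a minimal Airflow DAG sketch (assuming a reasonably recent Airflow 2.x install; the task bodies are stubs). The orchestrator owns scheduling, dependencies, and retries rather than the business logic itself.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from sources")

    def transform():
        print("clean and reshape")

    def load():
        print("write to the warehouse")

    with DAG(
        dag_id="daily_sales_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Declare ordering: extract, then transform, then load.
        extract_task >> transform_task >> load_task
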
84. SaaS integration platforms (Fivetran, Stitch) often:

85. Using Python scripts for ETL can be advantageous for:

86. Dockerizing ETL jobs provides:

87. CI/CD pipelines for ETL code mean:

88. On-prem ETL tools vs. cloud-native solutions differ in:

89. Data virtualization tools help by:

90. Evaluating open-source vs. commercial ETL solutions involves:

91. Performance benchmarks among ETL tools help to:

92. Leveraging Spark or Flink in transformations is useful for:

93. DataOps integrates ETL/ELT with:

94. Integration with data catalogs helps ETL/ELT by:

95. Using message queues (Kafka) in extraction helps by:

96. ETL in a microservices architecture might mean:

97. The impact of an orchestration tool's scheduling features includes:

98. Version control of ETL scripts and configs helps with:

99. Selecting a tool based on data volume and complexity means:

100. Considering team skill sets for tool selection means:

Section 6: Performance Optimization (101–120)

101. Identifying bottlenecks in ETL/ELT pipelines often involves checking:

102. Parallelization strategies might include:

103. Partitioning data for parallel processing helps by:

104. Efficient file formats like Parquet or ORC improve performance by:

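A small illustration for question 104, assuming pandas with the pyarrow engine is installed: Parquet stores data by column with compression, so readers can skip the columns they do not need.

    import pandas as pd

    df = pd.DataFrame({
        "user_id": range(1000),
        "country": ["US"] * 1000,
        "amount": [9.99] * 1000,
    })

    # Columnar and compressed on disk.
    df.to_parquet("events.parquet", index=False)

    # Column pruning: only the listed columns are read back.
    slim = pd.read_parquet("events.parquet", columns=["user_id", "amount"])
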
105. Compressing data before transfer reduces:

106. Memory management for large ETL jobs can be improved by:

107. Caching intermediate results might help if:

108. Query optimization in ELT scenarios includes:

109. Minimizing unnecessary data movement means:

110. Scheduling ETL jobs off-peak can improve performance by:

111. Monitoring runtime metrics (CPU, memory, throughput) helps:

112. Choosing incremental over full loads can improve performance by:

113. Using columnar storage in the target system helps because:

114. Monitoring runtime with trend analysis helps:

115. Retry and backoff strategies during extraction and loading help by:

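For question 115, a self-contained sketch of retry with exponential backoff and jitter; in practice you would catch only the transient error types your source or target actually raises.

    import random
    import time

    def with_retries(operation, max_attempts=5, base_delay=1.0):
        """Run a flaky extract/load call, backing off between attempts."""
        for attempt in range(1, max_attempts + 1):
            try:
                return operation()
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
                delay = base_delay * 2 ** (attempt - 1) + random.random()
                time.sleep(delay)
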
116. Eliminating unnecessary transformations means:

117. Code profiling in ETL scripts helps by:

118. Reducing unnecessary data movement (ELT vs. ETL) can improve performance by:

119. Adopting streaming ETL for continuous processing improves performance for:

120. Automating performance regression tests means:

Section 7: Data Quality and Governance (121–140)

121. Implementing validation checks at extraction ensures:

122. Data cleansing involves:

123. Standardizing reference data during transformation helps:

124. Data lineage means:

125. Auditing changes and maintaining historical versions of data allows:

126. Data quality metrics (completeness, consistency) help by:

127. Error-handling pipelines (quarantining bad records) mean:

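A minimal sketch for question 127: records that fail validation are diverted to a quarantine list (in a real pipeline, a dead-letter table or bucket) instead of failing the whole run. The field names are made up.

    def split_good_and_bad(records):
        good, quarantined = [], []
        for rec in records:
            try:
                rec["amount"] = float(rec["amount"])  # raises on bad values
                good.append(rec)
            except (KeyError, TypeError, ValueError) as err:
                quarantined.append({"record": rec, "error": str(err)})
        return good, quarantined
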
128. Role-based access controls in ETL/ELT governance ensure:

129. Metadata management helps by:

130. Ensuring consistency between source and target schemas prevents:

131. Self-service data quality checks mean:

132. SLA definitions for data timeliness and correctness ensure:

133. Logging quality metrics allows:

134. Governance frameworks (like DAMA) applied to ETL/ELT mean:

135. Aligning ETL/ELT practices with organizational policies ensures:

136. Continuous improvement cycles for data quality involve:

137. Ensuring completeness means:

138. Consistency checks ensure:

139. Accuracy checks might compare extracted data against:

140. Integrating with data governance tools means:

Section 8: Security and Compliance (141–160)

141. Encrypting data in transit ensures:

142. Using secure protocols (TLS/SSL) for data extraction from APIs prevents:

143. Masking or tokenizing PII fields in transformations ensures:

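For question 143, a sketch of one-way masking with a salted hash: joins on the token still work, but the raw value never lands in the warehouse. A production setup would more likely rely on a managed tokenization or key-management service; the salt below is purely illustrative.

    import hashlib

    def mask_email(email: str, salt: str = "replace-with-managed-secret") -> str:
        # Deterministic, irreversible token for the PII value.
        return hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()

    masked = mask_email("jane.doe@example.com")
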
144. Applying column-level encryption or hashing can:

145. Strict access controls on ETL pipelines mean:

146. Complying with regulations like GDPR may involve:

147. Auditing and logging who accessed ETL data ensures:

148. Using secrets managers for credentials instead of hardcoding prevents:

149. Minimizing data movement of sensitive records may mean:

150. Regular compliance audits and ETL process reviews ensure:

151. Using VPCs or private networking for data transfers ensures:

152. Complying with HIPAA for healthcare data might require:

153. Separation of duties in ETL/ELT management means:

154. Using temporary credentials or tokens instead of static keys enhances security by:

155. Applying the principle of least privilege to ETL system accounts means:

156. Sanitizing logs and debug info means:

157. Ensuring data disposal and retention policies are followed means:

158. Using anonymized or synthetic test data prevents:

159. Regular compliance audits might involve:

160. Storing sensitive data only in encrypted form reduces risk if:

Section 9: Real-Time and Streaming ETL (161–180)

161. Real-time ETL differs from batch ETL by:

162. Change Data Capture (CDC) techniques detect:

163. Tools like Debezium or Attunity assist with:

164. Using Kafka or Kinesis in streaming ETL means:

165. Micro-batching vs. true streaming differs by:

166. Handling out-of-order events in streaming ETL often requires:

167. Ensuring exactly-once delivery in streaming ETL means:

168. State management in streaming transformations involves:

169. Windowing functions (tumbling, sliding windows) in streaming help by:

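To illustrate question 169, a hedged PySpark Structured Streaming sketch using the built-in rate test source: events are grouped into 10-minute sliding windows, and a watermark bounds how long late data is waited for.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("windowed_counts").getOrCreate()

    events = (
        spark.readStream.format("rate").load()      # test source: timestamp, value
        .withColumnRenamed("timestamp", "event_time")
    )

    windowed = (
        events
        .withWatermark("event_time", "1 minute")    # tolerate 1 minute of lateness
        .groupBy(F.window("event_time", "10 minutes", "5 minutes"))
        .count()
    )

    query = windowed.writeStream.outputMode("update").format("console").start()
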
170. Dealing with backpressure and rate limiting ensures:

171. Transforming on the fly vs. storing raw events first is a choice between:

172. Choosing storage sinks for streaming outputs (NoSQL, data lakes) depends on:

173. Monitoring streaming pipelines for lag and throughput helps by:

174. Recovery and fault tolerance in streaming ETL might use:

175. Schema evolution in a streaming environment means:

176. Watermarks in event-time processing help by:

177. Integration with stream processing frameworks (Flink, Spark Structured Streaming) means:

178. Real-time alerting on data quality in streams allows:

179. Balancing latency vs. completeness means:

180. Continuous integration and deployment for streaming pipelines means:

Section 10: Testing, Monitoring, and Deployment (181–200)

181. Unit tests for individual transformation logic mean:

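For question 181, a small pytest-style sketch: because the transformation is a pure function, it can be unit-tested without touching any source or target system. The rule being tested (uppercasing country codes and dropping null amounts) is invented for the example.

    def clean_rows(rows):
        """Toy transformation: normalize country codes, drop null amounts."""
        return [
            {**r, "country": r["country"].upper()}
            for r in rows
            if r.get("amount") is not None
        ]

    def test_clean_rows_uppercases_and_filters():
        raw = [
            {"country": "us", "amount": 10},
            {"country": "de", "amount": None},
        ]
        cleaned = clean_rows(raw)
        assert cleaned == [{"country": "US", "amount": 10}]
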
182. Integration tests across the entire ETL pipeline ensure:

183. Using mock sources and targets in tests allows:

184. Regression testing ensures:

185. Load testing ETL pipelines helps by:

186. Synthetic test data generation helps test:

187. Integrating tests with CI/CD means:

188. Automated error detection and retries in pipelines mean:

189. Monitoring tools (e.g., Airflow UI) provide:

190. Alerting on job failures or SLA breaches means:

191. Logging best practices (structured logs) aid in:

192. Version control of ETL pipeline definitions means:

193. Auditing job runs (who triggered them, when) provides:

194. Scalability tests ensure:

195. Using checksums and row counts for validation in tests means:

196. Canary deployments for ETL changes mean:

197. Performance metrics (CPU, memory) in monitoring help identify:

198. Trend analysis in monitoring dashboards means:

199. Security scans on ETL code (linters, secret detection) ensure:

200. Continuous improvement loops from monitoring data mean:
