Cover....1
Copyright....4
Table of Contents....5
Preface....9
Why We Wrote This Book....10
Who This Book Is For....11
How This Book Is Organized....12
Conventions Used in This Book....13
Using Code Examples....14
O’Reilly Online Learning....15
How to Contact Us....15
Acknowledgments....16
Chapter 1. Analytics Engineering....17
Databases and Their Impact on Analytics Engineering....19
Cloud Computing and Its Impact on Analytics Engineering....21
The Data Analytics Lifecycle....24
The New Role of Analytics Engineer....27
Responsibilities of an Analytics Engineer....28
Enabling Analytics in a Data Mesh....29
Data Products....30
dbt as a Data Mesh Enabler....31
The Heart of Analytics Engineering....32
The Legacy Processes....33
Using SQL and Stored Procedures for ETL/ELT....34
Using ETL Tools....35
The dbt Revolution....36
Summary....38
Chapter 2. Data Modeling for Analytics....39
A Brief on Data Modeling....40
The Conceptual Phase of Modeling....41
The Logical Phase of Modeling....44
The Physical Phase of Modeling....46
The Data Normalization Process....47
Dimensional Data Modeling....51
Modeling with the Star Schema....52
Modeling with the Snowflake Schema....56
Modeling with Data Vault....58
Monolith Data Modeling....61
Building Modular Data Models....63
Enabling Modular Data Models with dbt....65
Testing Your Data Models....73
Generating Data Documentation....75
Debugging and Optimizing Data Models....76
Medallion Architecture Pattern....79
Summary....82
Chapter 3. SQL for Analytics....83
The Resiliency of SQL....84
Database Fundamentals....86
Types of Databases....88
Database Management System....91
“Speaking” with a Database....93
Creating and Managing Your Data Structures with DDL....94
Manipulating Data with DML....98
Inserting Data with INSERT....99
Selecting Data with SELECT....101
Updating Data with UPDATE....112
Deleting Data with DELETE....113
Storing Queries as Views....114
Common Table Expressions....117
Window Functions....121
SQL for Distributed Data Processing....125
Data Manipulation with DuckDB....129
Data Manipulation with Polars....133
Data Manipulation with FugueSQL....138
Bonus: Training Machine Learning Models with SQL....145
Summary....149
Chapter 4. Data Transformation with dbt....151
dbt Design Philosophy....152
dbt Data Flow....154
dbt Cloud....155
Setting Up dbt Cloud with BigQuery and GitHub....156
Using the dbt Cloud UI....169
Using the dbt Cloud IDE....179
Structure of a dbt Project....181
Jaffle Shop Database....184
YAML Files....184
Models....190
Sources....200
Tests....205
Analyses....213
Seeds....214
Documentation....216
dbt Commands and Selection Syntax....225
Jobs and Deployment....228
Summary....237
Chapter 5. dbt Advanced Topics....239
Model Materializations....239
Tables, Views, and Ephemeral Models....240
Incremental Models....243
Materialized Views....245
Snapshots....246
Dynamic SQL with Jinja....249
Using SQL Macros....252
dbt Packages....258
Installing Packages....258
Exploring the dbt_utils Package....260
Using Packages Inside Macros and Models....260
dbt Semantic Layer....262
Summary....266
Chapter 6. Building an End-to-End Analytics Engineering Use Case....269
Problem Definition: An Omnichannel Analytics Case....270
Operational Data Modeling....270
Conceptual Model....270
Logical Model....271
Physical Model....272
High-Level Data Architecture....276
Analytical Data Modeling....281
Identify the Business Processes....282
Identify Facts and Dimensions in the Dimensional Data Model....283
Identify the Attributes for Dimensions....285
Define the Granularity for Business Facts....286
Creating Our Data Warehouse with dbt....287
Tests, Documentation, and Deployment with dbt....296
Data Analytics with SQL....307
Conclusion....312
Index....313
About the Authors....322
Colophon....323
With the shift from data warehouses to data lakes, data now lands in repositories before it's been transformed, enabling engineers to model raw data into clean, well-defined datasets. dbt (data build tool) helps you take data further. This practical book shows data analysts, data engineers, BI developers, and data scientists how to create a true self-service transformation platform through the use of dynamic SQL.
Authors Rui Machado from Monstarlab and Hélder Russa from Jumia show you how to quickly deliver new data products by focusing more on value delivery and less on architectural and engineering aspects. If you know your business well and have the technical skills to model raw data into clean, well-defined datasets, you'll learn how to design and deliver data models without any technical influence.