Analytics Engineering with SQL and dbt: Building Meaningful Data Models at Scale

Analytics Engineering with SQL and dbt: Building Meaningful Data Models at Scale

Analytics Engineering with SQL and dbt: Building Meaningful Data Models at Scale
Автор: Machado Rui, Russa Helder
Дата выхода: 2024
Издательство: O’Reilly Media, Inc.
Количество страниц: 324
Размер файла: 2.9 MB
Тип файла: PDF
Добавил: codelibs
 Проверить на вирусы

Cover....1

Copyright....4

Table of Contents....5

Preface....9

Why We Wrote This Book....10

Who This Book Is For....11

How This Book Is Organized....12

Conventions Used in This Book....13

Using Code Examples....14

O’Reilly Online Learning....15

How to Contact Us....15

Acknowledgments....16

Chapter 1. Analytics Engineering....17

Databases and Their Impact on Analytics Engineering....19

Cloud Computing and Its Impact on Analytics Engineering....21

The Data Analytics Lifecycle....24

The New Role of Analytics Engineer....27

Responsibilities of an Analytics Engineer....28

Enabling Analytics in a Data Mesh....29

Data Products....30

dbt as a Data Mesh Enabler....31

The Heart of Analytics Engineering....32

The Legacy Processes....33

Using SQL and Stored Procedures for ETL/ELT....34

Using ETL Tools....35

The dbt Revolution....36

Summary....38

Chapter 2. Data Modeling for Analytics....39

A Brief on Data Modeling....40

The Conceptual Phase of Modeling....41

The Logical Phase of Modeling....44

The Physical Phase of Modeling....46

The Data Normalization Process....47

Dimensional Data Modeling....51

Modeling with the Star Schema....52

Modeling with the Snowflake Schema....56

Modeling with Data Vault....58

Monolith Data Modeling....61

Building Modular Data Models....63

Enabling Modular Data Models with dbt....65

Testing Your Data Models....73

Generating Data Documentation....75

Debugging and Optimizing Data Models....76

Medallion Architecture Pattern....79

Summary....82

Chapter 3. SQL for Analytics....83

The Resiliency of SQL....84

Database Fundamentals....86

Types of Databases....88

Database Management System....91

“Speaking” with a Database....93

Creating and Managing Your Data Structures with DDL....94

Manipulating Data with DML....98

Inserting Data with INSERT....99

Selecting Data with SELECT....101

Updating Data with UPDATE....112

Deleting Data with DELETE....113

Storing Queries as Views....114

Common Table Expressions....117

Window Functions....121

SQL for Distributed Data Processing....125

Data Manipulation with DuckDB....129

Data Manipulation with Polars....133

Data Manipulation with FugueSQL....138

Bonus: Training Machine Learning Models with SQL....145

Summary....149

Chapter 4. Data Transformation with dbt....151

dbt Design Philosophy....152

dbt Data Flow....154

dbt Cloud....155

Setting Up dbt Cloud with BigQuery and GitHub....156

Using the dbt Cloud UI....169

Using the dbt Cloud IDE....179

Structure of a dbt Project....181

Jaffle Shop Database....184

YAML Files....184

Models....190

Sources....200

Tests....205

Analyses....213

Seeds....214

Documentation....216

dbt Commands and Selection Syntax....225

Jobs and Deployment....228

Summary....237

Chapter 5. dbt Advanced Topics....239

Model Materializations....239

Tables, Views, and Ephemeral Models....240

Incremental Models....243

Materialized Views....245

Snapshots....246

Dynamic SQL with Jinja....249

Using SQL Macros....252

dbt Packages....258

Installing Packages....258

Exploring the dbt_utils Package....260

Using Packages Inside Macros and Models....260

dbt Semantic Layer....262

Summary....266

Chapter 6. Building an End-to-End Analytics Engineering Use Case....269

Problem Definition: An Omnichannel Analytics Case....270

Operational Data Modeling....270

Conceptual Model....270

Logical Model....271

Physical Model....272

High-Level Data Architecture....276

Analytical Data Modeling....281

Identify the Business Processes....282

Identify Facts and Dimensions in the Dimensional Data Model....283

Identify the Attributes for Dimensions....285

Define the Granularity for Business Facts....286

Creating Our Data Warehouse with dbt....287

Tests, Documentation, and Deployment with dbt....296

Data Analytics with SQL....307

Conclusion....312

Index....313

About the Authors....322

Colophon....323

With the shift from data warehouses to data lakes, data now lands in repositories before it's been transformed, enabling engineers to model raw data into clean, well-defined datasets. dbt (data build tool) helps you take data further. This practical book shows data analysts, data engineers, BI developers, and data scientists how to create a true self-service transformation platform through the use of dynamic SQL.

Authors Rui Machado from Monstarlab and Hélder Russa from Jumia show you how to quickly deliver new data products by focusing more on value delivery and less on architectural and engineering aspects. If you know your business well and have the technical skills to model raw data into clean, well-defined datasets, you'll learn how to design and deliver data models without any technical influence.

With this book, you'll learn:

  • What dbt is and how a dbt project is structured
  • How dbt fits into the data engineering and analytics worlds
  • How to collaborate on building data models
  • The main tools and architectures for building useful, functional data models
  • How to fit dbt into data warehousing and laking architecture
  • How to build tests for data transformations



Похожее:

Список отзывов:

Нет отзывов к книге.