DuckDB: Up and Running: Fast Data Analytics and Reporting

DuckDB: Up and Running: Fast Data Analytics and Reporting

DuckDB: Up and Running: Fast Data Analytics and Reporting
Автор: Lee Wei-Meng
Дата выхода: 2025
Издательство: O’Reilly Media, Inc.
Количество страниц: 308
Размер файла: 5,1 МБ
Тип файла: PDF
Добавил: codelibs
 Проверить на вирусы

Cover....1

Copyright....5

Table of Contents....6

Preface....10

Conventions Used in This Book....12

Using Code Examples....13

OReilly Online Learning....13

How to Contact Us....14

Acknowledgements....14

Chapter 1. Getting Started with DuckDB....16

Introduction to DuckDB....17

Why Use DuckDB?....17

High-Performance Analytical Queries....19

Versatile Integration and Ease of Use Across Multiple Programming Languages....21

Open Source....22

A Quick Look at DuckDB....22

Loading Data into DuckDB....23

Inserting a Record....24

Querying a Table....24

Performing Aggregation....25

Joining Tables....27

Reading Data from pandas....29

Why DuckDB Is More Efficient....32

Execution Speed....32

Memory Usage....35

Summary....36

Chapter 2. Importing Data into DuckDB....38

Creating DuckDB Databases....38

Loading Data from Different Data Sources and Formats....39

Working with CSV Files....39

Working with Parquet Files....49

Working with Excel Files....54

Working with MySQL....59

Summary....63

Chapter 3. A Primer on SQL....66

Using the DuckDB CLI....66

Importing Data into DuckDB....69

Dot Commands....70

Persisting the In-Memory Database on Disk....74

DuckDB SQL Primer....76

Creating a Database....77

Creating Tables....78

Viewing the Schemas of Tables....79

Dropping a Table....79

Working with Tables....80

Populating Tables with Rows....80

Updating Rows....83

Deleting Rows....83

Querying Tables....84

Joining Tables....85

Aggregating Data....91

Analytics....93

Summary....96

Chapter 4. Using DuckDB with Polars....98

Introduction to Polars....98

Creating a Polars DataFrame....99

Understanding Lazy Evaluation in Polars....108

Querying Polars DataFrames Using DuckDB....113

Using the sql() Function....113

Using the DuckDBPyRelation Object....118

Summary....122

Chapter 5. Performing EDA with DuckDB....124

Our Dataset: The 2015 Flight Delays Dataset....125

Geospatial Analysis....126

Displaying a Map....127

Displaying All Airports on the Map....129

Using the spatial Extension in DuckDB....132

Performing Descriptive Analytics....142

Finding the Airports for Each State and City....143

Aggregating the Total Number of Airports in Each State....146

Obtaining the Flight Counts for Each Pair of Origin and Destination Airports....151

Getting the Canceled Flights from Airlines....153

Getting the Flight Count for Each Day of the Week....159

Finding the Most Common Timeslot for Flight Delays....165

Finding the Airlines with the Most and Fewest Delays....168

Summary....173

Chapter 6. Using DuckDB with JSON Files....174

Primer on JSON....174

Object....175

String....175

Boolean....175

Number....176

Nested Object....176

Array....176

null....177

Loading JSON Files into DuckDB....178

Using the read_json_auto() Function....179

Using the read_json() Function....181

Using the COPY-FROM Statement....192

Exporting Tables to JSON....193

Summary....194

Chapter 7. Using DuckDB with JupySQL....196

What Is JupySQL?....197

Installing JupySQL....198

Loading the sql Extension....198

Integrating with DuckDB....199

Performing Queries....200

Storing Snippets....203

Visualization....205

Histograms....206

Box Plots....211

Pie Charts....213

Bar Plots....215

Integrating with MySQL....219

Using Environment Variables....219

Using an .ini File....222

Using keyring....224

Summary....225

Chapter 8. Accessing Remote Data Using DuckDB....226

DuckDBs httpfs Extension....226

Querying CSV and Parquet Files Remotely....227

Accessing CSV Files....227

Accessing Parquet Files....231

Querying Hugging Face Datasets....235

Using Hugging Face Datasets....236

Reading the Dataset Using hf: Paths....239

Accessing Files Within a Folder....240

Querying Multiple Files Using the Glob Syntax....243

Working with Private Hugging Face Datasets....246

Summary....258

Chapter 9. Using DuckDB in the Cloud with MotherDuck....260

Introduction to MotherDuck....261

Signing Up for MotherDuck....261

MotherDuck Plans....264

Getting Started with MotherDuck....265

Adding Tables....267

Creating Schemas....270

Sharing Databases....272

Creating a Database....278

Detaching a Database....278

Using the Databases in MotherDuck....279

Querying Your Database....279

Writing SQL Using AI....285

Using MotherDuck Through the DuckDB CLI....289

Connecting to MotherDuck....289

Querying Databases on MotherDuck....293

Creating Databases on MotherDuck....294

Performing Hybrid Queries....296

Summary....298

Index....300

About the Author....306

Colophon....306

DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool.

Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL. Understand the purpose of DuckDB and its main functions

  • Conduct data analytics tasks using DuckDB
  • Integrate DuckDB with pandas, Polars, and JupySQL
  • Use DuckDB to query your data
  • Perform spatial analytics using DuckDB's spatial extension
  • Work with a diverse range of data including Parquet, CSV, and JSON

Похожее:

Список отзывов:

Нет отзывов к книге.