Cover....1
Copyright....5
Table of Contents....6
Preface....10
Conventions Used in This Book....12
Using Code Examples....13
OReilly Online Learning....13
How to Contact Us....14
Acknowledgements....14
Chapter 1. Getting Started with DuckDB....16
Introduction to DuckDB....17
Why Use DuckDB?....17
High-Performance Analytical Queries....19
Versatile Integration and Ease of Use Across Multiple Programming Languages....21
Open Source....22
A Quick Look at DuckDB....22
Loading Data into DuckDB....23
Inserting a Record....24
Querying a Table....24
Performing Aggregation....25
Joining Tables....27
Reading Data from pandas....29
Why DuckDB Is More Efficient....32
Execution Speed....32
Memory Usage....35
Summary....36
Chapter 2. Importing Data into DuckDB....38
Creating DuckDB Databases....38
Loading Data from Different Data Sources and Formats....39
Working with CSV Files....39
Working with Parquet Files....49
Working with Excel Files....54
Working with MySQL....59
Summary....63
Chapter 3. A Primer on SQL....66
Using the DuckDB CLI....66
Importing Data into DuckDB....69
Dot Commands....70
Persisting the In-Memory Database on Disk....74
DuckDB SQL Primer....76
Creating a Database....77
Creating Tables....78
Viewing the Schemas of Tables....79
Dropping a Table....79
Working with Tables....80
Populating Tables with Rows....80
Updating Rows....83
Deleting Rows....83
Querying Tables....84
Joining Tables....85
Aggregating Data....91
Analytics....93
Summary....96
Chapter 4. Using DuckDB with Polars....98
Introduction to Polars....98
Creating a Polars DataFrame....99
Understanding Lazy Evaluation in Polars....108
Querying Polars DataFrames Using DuckDB....113
Using the sql() Function....113
Using the DuckDBPyRelation Object....118
Summary....122
Chapter 5. Performing EDA with DuckDB....124
Our Dataset: The 2015 Flight Delays Dataset....125
Geospatial Analysis....126
Displaying a Map....127
Displaying All Airports on the Map....129
Using the spatial Extension in DuckDB....132
Performing Descriptive Analytics....142
Finding the Airports for Each State and City....143
Aggregating the Total Number of Airports in Each State....146
Obtaining the Flight Counts for Each Pair of Origin and Destination Airports....151
Getting the Canceled Flights from Airlines....153
Getting the Flight Count for Each Day of the Week....159
Finding the Most Common Timeslot for Flight Delays....165
Finding the Airlines with the Most and Fewest Delays....168
Summary....173
Chapter 6. Using DuckDB with JSON Files....174
Primer on JSON....174
Object....175
String....175
Boolean....175
Number....176
Nested Object....176
Array....176
null....177
Loading JSON Files into DuckDB....178
Using the read_json_auto() Function....179
Using the read_json() Function....181
Using the COPY-FROM Statement....192
Exporting Tables to JSON....193
Summary....194
Chapter 7. Using DuckDB with JupySQL....196
What Is JupySQL?....197
Installing JupySQL....198
Loading the sql Extension....198
Integrating with DuckDB....199
Performing Queries....200
Storing Snippets....203
Visualization....205
Histograms....206
Box Plots....211
Pie Charts....213
Bar Plots....215
Integrating with MySQL....219
Using Environment Variables....219
Using an .ini File....222
Using keyring....224
Summary....225
Chapter 8. Accessing Remote Data Using DuckDB....226
DuckDBs httpfs Extension....226
Querying CSV and Parquet Files Remotely....227
Accessing CSV Files....227
Accessing Parquet Files....231
Querying Hugging Face Datasets....235
Using Hugging Face Datasets....236
Reading the Dataset Using hf: Paths....239
Accessing Files Within a Folder....240
Querying Multiple Files Using the Glob Syntax....243
Working with Private Hugging Face Datasets....246
Summary....258
Chapter 9. Using DuckDB in the Cloud with MotherDuck....260
Introduction to MotherDuck....261
Signing Up for MotherDuck....261
MotherDuck Plans....264
Getting Started with MotherDuck....265
Adding Tables....267
Creating Schemas....270
Sharing Databases....272
Creating a Database....278
Detaching a Database....278
Using the Databases in MotherDuck....279
Querying Your Database....279
Writing SQL Using AI....285
Using MotherDuck Through the DuckDB CLI....289
Connecting to MotherDuck....289
Querying Databases on MotherDuck....293
Creating Databases on MotherDuck....294
Performing Hybrid Queries....296
Summary....298
Index....300
About the Author....306
Colophon....306
DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool.
Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL. Understand the purpose of DuckDB and its main functions