DuckDB: Up and Running: Fast Data Analytics and Reporting

Author: Lee Wei-Meng
Release date: 2025
Publisher: O’Reilly Media, Inc.
Pages: 308
File size: 5.1 MB
File type: PDF

Preface....ix
1. Getting Started with DuckDB....1
Introduction to DuckDB....2
Why Use DuckDB?....2
High-Performance Analytical Queries....4
Versatile Integration and Ease of Use Across Multiple Programming Languages....6
Open Source....7
A Quick Look at DuckDB....7
Loading Data into DuckDB....8
Inserting a Record....9
Querying a Table....9
Performing Aggregation....10
Joining Tables....12
Reading Data from pandas....14
Why DuckDB Is More Efficient....17
Execution Speed....17
Memory Usage....20
Summary....21
2. Importing Data into DuckDB....23
Creating DuckDB Databases....23
Loading Data from Different Data Sources and Formats....24
Working with CSV Files....24
Working with Parquet Files....34
Working with Excel Files....39
Working with MySQL....44
Summary....48
3. A Primer on SQL....51
Using the DuckDB CLI....51
Importing Data into DuckDB....54
Dot Commands....55
Persisting the In-Memory Database on Disk....59
DuckDB SQL Primer....61
Creating a Database....62
Creating Tables....63
Viewing the Schemas of Tables....64
Dropping a Table....64
Working with Tables....65
Populating Tables with Rows....65
Updating Rows....68
Deleting Rows....68
Querying Tables....69
Joining Tables....70
Aggregating Data....76
Analytics....78
Summary....81
4. Using DuckDB with Polars....83
Introduction to Polars....83
Creating a Polars DataFrame....84
Understanding Lazy Evaluation in Polars....93
Querying Polars DataFrames Using DuckDB....98
Using the sql() Function....98
Using the DuckDBPyRelation Object....103
Summary....107
5. Performing EDA with DuckDB....109
Our Dataset: The 2015 Flight Delays Dataset....110
Geospatial Analysis....111
Displaying a Map....112
Displaying All Airports on the Map....114
Using the spatial Extension in DuckDB....117
Performing Descriptive Analytics....127
Finding the Airports for Each State and City....128
Aggregating the Total Number of Airports in Each State....131
Obtaining the Flight Counts for Each Pair of Origin and Destination Airports....136
Getting the Canceled Flights from Airlines....138
Getting the Flight Count for Each Day of the Week....144
Finding the Most Common Timeslot for Flight Delays....150
Finding the Airlines with the Most and Fewest Delays....153
Summary....158
6. Using DuckDB with JSON Files....159
Primer on JSON....159
Object....160
String....160
Boolean....160
Number....161
Nested Object....161
Array....161
null....162
Loading JSON Files into DuckDB....163
Using the read_json_auto() Function....164
Using the read_json() Function....166
Using the COPY-FROM Statement....177
Exporting Tables to JSON....178
Summary....179
7. Using DuckDB with JupySQL....181
What Is JupySQL?....182
Installing JupySQL....183
Loading the sql Extension....183
Integrating with DuckDB....184
Performing Queries....185
Storing Snippets....188
Visualization....190
Histograms....191
Box Plots....196
Pie Charts....198
Bar Plots....200
Integrating with MySQL....204
Using Environment Variables....204
Using an .ini File....207
Using keyring....209
Summary....210
8. Accessing Remote Data Using DuckDB....211
DuckDB’s httpfs Extension....211
Querying CSV and Parquet Files Remotely....212
Accessing CSV Files....212
Accessing Parquet Files....216
Querying Hugging Face Datasets....220
Using Hugging Face Datasets....221
Reading the Dataset Using hf:// Paths....224
Accessing Files Within a Folder....225
Querying Multiple Files Using the Glob Syntax....228
Working with Private Hugging Face Datasets....231
Summary....243
9. Using DuckDB in the Cloud with MotherDuck....245
Introduction to MotherDuck....246
Signing Up for MotherDuck....246
MotherDuck Plans....249
Getting Started with MotherDuck....250
Adding Tables....252
Creating Schemas....255
Sharing Databases....257
Creating a Database....263
Detaching a Database....263
Using the Databases in MotherDuck....264
Querying Your Database....264
Writing SQL Using AI....270
Using MotherDuck Through the DuckDB CLI....274
Connecting to MotherDuck....274
Querying Databases on MotherDuck....278
Creating Databases on MotherDuck....279
Performing Hybrid Queries....281
Summary....283
Index....285

DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool.

Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL.

  • Understand the purpose of DuckDB and its main functions

  • Conduct data analytics tasks using DuckDB

  • Integrate DuckDB with pandas, Polars, and JupySQL

  • Use DuckDB to query your data

  • Perform spatial analytics using DuckDB's spatial extension

  • Work with a diverse range of data including Parquet, CSV, and JSON

