Python Polars: The Definitive Guide: Transforming, Analyzing, and Visualizing Data with a Fast and Expressive DataFrame API

Authors: Jeroen Janssens, Thijs Nieuwdorp
Published: 2025
Publisher: O’Reilly Media, Inc.
Pages: 504
File size: 2.9 MB
File type: PDF
Added by: codelibs

Foreword xvii
Preface xxi
Part I. Begin
1. Introducing Polars 3
What Is This Thing Called Polars? 4
Key Features 4
Key Concepts 4
Advantages 5
Why You Should Use Polars 5
Performance 6
Usability 6
Popularity 7
Sustainability 8
Polars Compared to Other Data Processing Packages 8
Why We Focus on Python Polars 10
How This Book Is Organized 10
An ETL Showcase 11
Extract 12
Bonus: Visualizing Neighborhoods and Stations 17
Transform 21
Bonus: Visualizing Daily Trips per Borough 26
Load 28
Bonus: Becoming Faster by Being Lazy 29
Takeaways 32
2. Getting Started 33
Setting Up Your Environment 33
Downloading the Project 34
Installing uv 35
Installing the Project 35
Working with the Virtual Environment 35
Verifying Your Installation 36
Crash Course in JupyterLab 37
Keyboard Shortcuts 38
Installing Polars on Other Projects 39
All Optional Dependencies 40
Optional Dependencies for Interoperability 40
Optional Dependencies for Working with Spreadsheets 40
Optional Dependencies for Working with Databases 41
Optional Dependencies for Working with Remote Filesystems 41
Optional Dependencies for Other I/O Formats 41
Optional Dependencies for Extra Functionality 42
Installing Optional Dependencies 42
Configuring Polars 42
Temporary Configuration Using a Context Manager 43
Local Configuration Using a Decorator 46
Compiling Polars from Scratch 46
Edge Case: Very Large Datasets 47
Edge Case: Processors Lacking AVX Support 48
Takeaways 48
3. Moving from pandas to Polars 49
Animals 50
Similarities to Recognize 50
Appearances to Appreciate 51
Differences in Code 51
Differences in Display 52
Concepts to Unlearn 57
Index 57
Axes 58
Indexing and Slicing 59
Eagerness 61
Relaxedness 63
Syntax to Forget 64
Common Operations Side By Side 65
To and From pandas 69
Takeaways 70
Part II. Form
4. Data Structures and Data Types 73
Series, DataFrames, and LazyFrames 73
Data Types 75
Nested Data Types 77
Missing Values 79
Data Type Conversion 84
Takeaways 86
5. Eager and Lazy APIs 87
Eager API: DataFrame 87
Lazy API: LazyFrame 89
Performance Differences 90
Functionality Differences 91
Attributes 92
Aggregation Methods 92
Computation Methods 93
Descriptive Methods 93
GroupBy Methods 94
Exporting Methods 94
Manipulation and Selection Methods 95
Miscellaneous Methods 97
Tips and Tricks 98
Going from LazyFrame to DataFrame and Vice Versa 98
Joining a DataFrame with a LazyFrame 99
Caching Intermittent Results 100
Takeaways 101
6. Reading and Writing Data 103
Format Overview 104
Reading CSV Files 105
Parsing Missing Values Correctly 107
Reading Files with Encodings Other Than UTF-8 108
Reading Excel Spreadsheets 110
Working with Multiple Files 111
Reading Parquet 114
Reading JSON and NDJSON 115
JSON 115
NDJSON 118
Other File Formats 120
Querying Databases 121
Writing Data 123
CSV Format 123
Excel Format 124
Parquet Format 124
Other Considerations 125
Takeaways 125
Part III. Express
7. Beginning Expressions 129
Methods and Namespaces 131
Expressions by Example 131
Selecting Columns with Expressions 132
Creating New Columns with Expressions 133
Filtering Rows with Expressions 135
Aggregating with Expressions 135
Sorting Rows with Expressions 136
The Definition of an Expression 137
Properties of Expressions 139
Creating Expressions 141
From Existing Columns 142
From Literal Values 143
From Ranges 145
Other Functions to Create Expressions 146
Renaming Expressions 147
Expressions Are Idiomatic 149
Takeaways 151
8. Continuing Expressions 153
Types of Operations 154
Example A: Element-Wise Operations 155
Example B: Operations That Summarize to One 155
Example C: Operations That Summarize to One or More 156
Example D: Operations That Extend 156
Element-Wise Operations 157
Operations That Perform Mathematical Transformations 157
Operations Related to Trigonometry 159
Operations That Round and Categorize 160
Operations for Missing or Infinite Values 161
Other Operations 163
Nonreducing Series-Wise Operations 164
Operations That Accumulate 164
Operations That Fill and Shift 166
Operations Related to Duplicate Values 167
Operations That Compute Rolling Statistics 168
Operations That Sort 170
Other Operations 171
Series-Wise Operations That Summarize to One 172
Operations That Are Quantifiers 173
Operations That Compute Statistics 174
Operations That Count 176
Other Operations 178
Series-Wise Operations That Summarize to One or More 179
Operations Related to Unique Values 179
Operations That Select 180
Operations That Drop Missing Values 181
Other Operations 182
Series-Wise Operations That Extend 185
Takeaways 185
9. Combining Expressions 187
Inline Operators Versus Methods 188
Arithmetic Operations 190
Comparison Operations 191
Boolean Algebra Operations 195
Bitwise Operations 197
Using Functions 199
When, Then, Otherwise 202
Takeaways 204
Part IV. Transform
10. Selecting and Creating Columns 209
Selecting Columns 211
Introducing Selectors 212
Selecting Based on Name 213
Selecting Based on Data Type 214
Selecting Based on Position 216
Combining Selectors 218
Creating Columns 220
Related Column Operations 225
Dropping 225
Renaming 225
Stacking 226
Adding Row Indices 227
Takeaways 227
11. Filtering and Sorting Rows 229
Filtering Rows 230
Filtering Based on Expressions 230
Filtering Based on Column Names 231
Filtering Based on Constraints 232
Sorting Rows 233
Sorting Based on a Single Column 234
Sorting in Reverse 235
Sorting Based on Multiple Columns 235
Sorting Based on Expressions 236
Sorting Nested Data Types 237
Related Row Operations 239
Filtering Missing Values 239
Slicing 240
Top and Bottom 241
Sampling 241
Semi-Joins 241
Takeaways 242
12. Working with Textual, Temporal, and Nested Data Types 245
String 246
String Methods 246
String Examples 248
Categorical 252
Categorical Methods 253
Categorical Examples 253
Enum 256
Temporal 257
Temporal Methods 257
Temporal Examples 259
List 263
List Methods 263
List Examples 265
Array 267
Array Methods 267
Array Examples 268
Struct 270
Struct Methods 270
Struct Examples 271
Takeaways 274
13. Summarizing and Aggregating 275
Split, Apply, and Combine 276
GroupBy Context 276
The Descriptives 279
Advanced Methods 284
Row-Wise Aggregations 289
Window Functions in Selection Context 291
Dynamic Grouping 293
Rolling Aggregations 294
Upsampling 297
Takeaways 299
14. Joining and Concatenating 301
Joining 301
Join Strategies 302
Joining on Multiple Columns 306
Validation 306
Inexact Joining 308
Inexact Join Strategies 310
Additional Fine-Tuning 312
Use Case: Marketing Campaign Attribution 312
Vertical and Horizontal Concatenation 316
Vertical 317
Horizontal 318
Diagonal 318
Align 319
Relaxed 322
Stacking 323
Appending 324
Extending 324
Takeaways 325
15. Reshaping 327
Wide Versus Long DataFrames 327
Pivot to a Wider DataFrame 330
Unpivot to a Longer DataFrame 335
Transposing 337
Exploding 339
Partition into Multiple DataFrames 342
Takeaways 345
Part V. Advance
16. Visualizing Data 349
NYC Bike Trips 351
Built-In Plotting with Altair 353
Introducing Altair 353
Methods in the Plot Namespaces 354
Plotting DataFrames 355
Too Large to Handle 357
Plotting Series 359
pandas-Like Plotting with hvPlot 363
Introducing hvPlot 363
A First Plot 364
Methods in the hvPlot Namespace 365
pandas as Backup 366
Manual Transformations 367
Changing the Plotting Backend 368
Plotting Points on a Map 369
Composing Plots 369
Adding Interactive Widgets 371
Publication-Quality Graphics with plotnine 372
Introducing plotnine 373
Plots for Exploration 373
Plots for Communication 377
Styling DataFrames With Great Tables 381
Takeaways 386
17. Extending Polars 387
User-Defined Functions in Python 387
Applying a Function to Elements 388
Applying a Function to a Series 390
Applying a Function to Groups 391
Applying a Function to an Expression 394
Applying a Function to a DataFrame or LazyFrame 395
Registering Your Own Namespace 396
Polars Plugins in Rust 397
Prerequisites 398
The Anatomy of a Plugin Project 398
The Plugin 398
Compiling the Plugin 401
Performance Benchmark 401
Register Arguments 402
Using a Rust Crate 405
Use Case: geo 405
Takeaways 416
18. Polars Internals 417
Polars’ Architecture 417
Arrow 419
Multithreaded Computations and SIMD Operations 421
The String Data Type in Memory 422
ChunkedArrays in Series 423
Query Optimization 424
LazyFrame Scan-Level Optimizations 425
Other Optimizations 427
Checking Your Expressions 429
meta Namespace Overview 429
meta Namespace Examples 430
Profiling Polars 432
Tests in Polars 434
Comparing DataFrames and Series 435
Common Antipatterns 437
Using Brackets for Column Selection 437
Misusing Collect 437
Using Python Code in your Polars Queries 438
Takeaways 439
Appendix: Accelerating Polars with the GPU 441
Index 461

Unlock the power of Polars, a Python package for transforming, analyzing, and visualizing data. In this hands-on guide, Jeroen Janssens and Thijs Nieuwdorp walk you through every feature of Polars, showing you how to use it for real-world tasks like data wrangling, exploratory data analysis, building pipelines, and more.

Whether you're a seasoned data professional or new to data science, you'll quickly master Polars' expressive API and its underlying concepts. You don't need to have experience with pandas, but if you do, this book will help you make a seamless transition. The many practical examples and real-world datasets are available on GitHub, so you can easily follow along.

  • Process data from CSV, Parquet, spreadsheets, databases, and the cloud

  • Get a solid understanding of Expressions, the building blocks of every query

  • Handle complex data types, including text, time, and nested structures

  • Use both eager and lazy APIs, and know when to use each

  • Visualize your data with Altair, hvPlot, plotnine, and Great Tables

  • Extend Polars with your own Python functions and Rust plugins

  • Leverage GPU acceleration to boost performance even further

