Foreword....xvii
Preface....xxi
Part I. Begin
1. Introducing Polars....3
What Is This Thing Called Polars?....4
Key Features....4
Key Concepts....4
Advantages....5
Why You Should Use Polars....5
Performance....6
Usability....6
Popularity....7
Sustainability....8
Polars Compared to Other Data Processing Packages....8
Why We Focus on Python Polars....10
How This Book Is Organized....10
An ETL Showcase....11
Extract....12
Bonus: Visualizing Neighborhoods and Stations....17
Transform....21
Bonus: Visualizing Daily Trips per Borough....26
Load....28
Bonus: Becoming Faster by Being Lazy....29
Takeaways....32
2. Getting Started....33
Setting Up Your Environment....33
Downloading the Project....34
Installing uv....35
Installing the Project....35
Working with the Virtual Environment....35
Verifying Your Installation....36
Crash Course in JupyterLab....37
Keyboard Shortcuts....38
Installing Polars on Other Projects....39
All Optional Dependencies....40
Optional Dependencies for Interoperability....40
Optional Dependencies for Working with Spreadsheets....40
Optional Dependencies for Working with Databases....41
Optional Dependencies for Working with Remote Filesystems....41
Optional Dependencies for Other I/O Formats....41
Optional Dependencies for Extra Functionality....42
Installing Optional Dependencies....42
Configuring Polars....42
Temporary Configuration Using a Context Manager....43
Local Configuration Using a Decorator....46
Compiling Polars from Scratch....46
Edge Case: Very Large Datasets....47
Edge Case: Processors Lacking AVX Support....48
Takeaways....48
3. Moving from pandas to Polars....49
Animals....50
Similarities to Recognize....50
Appearances to Appreciate....51
Differences in Code....51
Differences in Display....52
Concepts to Unlearn....57
Index....57
Axes....58
Indexing and Slicing....59
Eagerness....61
Relaxedness....63
Syntax to Forget....64
Common Operations Side By Side....65
To and From pandas....69
Takeaways....70
Part II. Form
4. Data Structures and Data Types....73
Series, DataFrames, and LazyFrames....73
Data Types....75
Nested Data Types....77
Missing Values....79
Data Type Conversion....84
Takeaways....86
5. Eager and Lazy APIs....87
Eager API: DataFrame....87
Lazy API: LazyFrame....89
Performance Differences....90
Functionality Differences....91
Attributes....92
Aggregation Methods....92
Computation Methods....93
Descriptive Methods....93
GroupBy Methods....94
Exporting Methods....94
Manipulation and Selection Methods....95
Miscellaneous Methods....97
Tips and Tricks....98
Going from LazyFrame to DataFrame and Vice Versa....98
Joining a DataFrame with a LazyFrame....99
Caching Intermittent Results....100
Takeaways....101
6. Reading and Writing Data....103
Format Overview....104
Reading CSV Files....105
Parsing Missing Values Correctly....107
Reading Files with Encodings Other Than UTF-8....108
Reading Excel Spreadsheets....110
Working with Multiple Files....111
Reading Parquet....114
Reading JSON and NDJSON....115
JSON....115
NDJSON....118
Other File Formats....120
Querying Databases....121
Writing Data....123
CSV Format....123
Excel Format....124
Parquet Format....124
Other Considerations....125
Takeaways....125
Part III. Express
7. Beginning Expressions....129
Methods and Namespaces....131
Expressions by Example....131
Selecting Columns with Expressions....132
Creating New Columns with Expressions....133
Filtering Rows with Expressions....135
Aggregating with Expressions....135
Sorting Rows with Expressions....136
The Definition of an Expression....137
Properties of Expressions....139
Creating Expressions....141
From Existing Columns....142
From Literal Values....143
From Ranges....145
Other Functions to Create Expressions....146
Renaming Expressions....147
Expressions Are Idiomatic....149
Takeaways....151
8. Continuing Expressions....153
Types of Operations....154
Example A: Element-Wise Operations....155
Example B: Operations That Summarize to One....155
Example C: Operations That Summarize to One or More....156
Example D: Operations That Extend....156
Element-Wise Operations....157
Operations That Perform Mathematical Transformations....157
Operations Related to Trigonometry....159
Operations That Round and Categorize....160
Operations for Missing or Infinite Values....161
Other Operations....163
Nonreducing Series-Wise Operations....164
Operations That Accumulate....164
Operations That Fill and Shift....166
Operations Related to Duplicate Values....167
Operations That Compute Rolling Statistics....168
Operations That Sort....170
Other Operations....171
Series-Wise Operations That Summarize to One....172
Operations That Are Quantifiers....173
Operations That Compute Statistics....174
Operations That Count....176
Other Operations....178
Series-Wise Operations That Summarize to One or More....179
Operations Related to Unique Values....179
Operations That Select....180
Operations That Drop Missing Values....181
Other Operations....182
Series-Wise Operations That Extend....185
Takeaways....185
9. Combining Expressions....187
Inline Operators Versus Methods....188
Arithmetic Operations....190
Comparison Operations....191
Boolean Algebra Operations....195
Bitwise Operations....197
Using Functions....199
When, Then, Otherwise....202
Takeaways....204
Part IV. Transform
10. Selecting and Creating Columns....209
Selecting Columns....211
Introducing Selectors....212
Selecting Based on Name....213
Selecting Based on Data Type....214
Selecting Based on Position....216
Combining Selectors....218
Creating Columns....220
Related Column Operations....225
Dropping....225
Renaming....225
Stacking....226
Adding Row Indices....227
Takeaways....227
11. Filtering and Sorting Rows....229
Filtering Rows....230
Filtering Based on Expressions....230
Filtering Based on Column Names....231
Filtering Based on Constraints....232
Sorting Rows....233
Sorting Based on a Single Column....234
Sorting in Reverse....235
Sorting Based on Multiple Columns....235
Sorting Based on Expressions....236
Sorting Nested Data Types....237
Related Row Operations....239
Filtering Missing Values....239
Slicing....240
Top and Bottom....241
Sampling....241
Semi-Joins....241
Takeaways....242
12. Working with Textual, Temporal, and Nested Data Types....245
String....246
String Methods....246
String Examples....248
Categorical....252
Categorical Methods....253
Categorical Examples....253
Enum....256
Temporal....257
Temporal Methods....257
Temporal Examples....259
List....263
List Methods....263
List Examples....265
Array....267
Array Methods....267
Array Examples....268
Struct....270
Struct Methods....270
Struct Examples....271
Takeaways....274
13. Summarizing and Aggregating....275
Split, Apply, and Combine....276
GroupBy Context....276
The Descriptives....279
Advanced Methods....284
Row-Wise Aggregations....289
Window Functions in Selection Context....291
Dynamic Grouping....293
Rolling Aggregations....294
Upsampling....297
Takeaways....299
14. Joining and Concatenating....301
Joining....301
Join Strategies....302
Joining on Multiple Columns....306
Validation....306
Inexact Joining....308
Inexact Join Strategies....310
Additional Fine-Tuning....312
Use Case: Marketing Campaign Attribution....312
Vertical and Horizontal Concatenation....316
Vertical....317
Horizontal....318
Diagonal....318
Align....319
Relaxed....322
Stacking....323
Appending....324
Extending....324
Takeaways....325
15. Reshaping....327
Wide Versus Long DataFrames....327
Pivot to a Wider DataFrame....330
Unpivot to a Longer DataFrame....335
Transposing....337
Exploding....339
Partition into Multiple DataFrames....342
Takeaways....345
Part V. Advance
16. Visualizing Data....349
NYC Bike Trips....351
Built-In Plotting with Altair....353
Introducing Altair....353
Methods in the Plot Namespaces....354
Plotting DataFrames....355
Too Large to Handle....357
Plotting Series....359
pandas-Like Plotting with hvPlot....363
Introducing hvPlot....363
A First Plot....364
Methods in the hvPlot Namespace....365
pandas as Backup....366
Manual Transformations....367
Changing the Plotting Backend....368
Plotting Points on a Map....369
Composing Plots....369
Adding Interactive Widgets....371
Publication-Quality Graphics with plotnine....372
Introducing plotnine....373
Plots for Exploration....373
Plots for Communication....377
Styling DataFrames With Great Tables....381
Takeaways....386
17. Extending Polars....387
User-Defined Functions in Python....387
Applying a Function to Elements....388
Applying a Function to a Series....390
Applying a Function to Groups....391
Applying a Function to an Expression....394
Applying a Function to a DataFrame or LazyFrame....395
Registering Your Own Namespace....396
Polars Plugins in Rust....397
Prerequisites....398
The Anatomy of a Plugin Project....398
The Plugin....398
Compiling the Plugin....401
Performance Benchmark....401
Register Arguments....402
Using a Rust Crate....405
Use Case: geo....405
Takeaways....416
18. Polars Internals....417
Polars’ Architecture....417
Arrow....419
Multithreaded Computations and SIMD Operations....421
The String Data Type in Memory....422
ChunkedArrays in Series....423
Query Optimization....424
LazyFrame Scan-Level Optimizations....425
Other Optimizations....427
Checking Your Expressions....429
meta Namespace Overview....429
meta Namespace Examples....430
Profiling Polars....432
Tests in Polars....434
Comparing DataFrames and Series....435
Common Antipatterns....437
Using Brackets for Column Selection....437
Misusing Collect....437
Using Python Code in your Polars Queries....438
Takeaways....439
Appendix: Accelerating Polars with the GPU....441
Index....461
Unlock the power of Polars, a Python package for transforming, analyzing, and visualizing data. In this hands-on guide, Jeroen Janssens and Thijs Nieuwdorp walk you through every feature of Polars, showing you how to use it for real-world tasks like data wrangling, exploratory data analysis, building pipelines, and more.
Whether you're a seasoned data professional or new to data science, you'll quickly master Polars' expressive API and its underlying concepts. You don't need to have experience with pandas, but if you do, this book will help you make a seamless transition. The many practical examples and real-world datasets are available on GitHub, so you can easily follow along.
Process data from CSV, Parquet, spreadsheets, databases, and the cloud
Get a solid understanding of Expressions, the building blocks of every query
Handle complex data types, including text, time, and nested structures
Use both eager and lazy APIs, and know when to use each
Visualize your data with Altair, hvPlot, plotnine, and Great Tables
Extend Polars with your own Python functions and Rust plugins
Leverage GPU acceleration to boost performance even further