Foreword
Preface
Who This Book Is For
Who This Book Is Not For
What You’ll Learn
Python 3
License
How to Make an Attribution
Using Code Examples
Errata and Feedback
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Understanding Performant Python
The Fundamental Computer System
Computing Units
Memory Units
Communications Layers
Idealized Computing Versus the Python Virtual Machine
Idealized Computing
Python’s Virtual Machine
So Why Use Python?
How to Be a Highly Performant Programmer
Good Working Practices
Optimizing for the Team Rather than the Code Block
The Remote Performant Programmer
Some Thoughts on Good Notebook Practice
Your Work
The Future of Python
Where Did the GIL Go?
Does Python Have a JIT?
Wrap-Up
2. Profiling to Find Bottlenecks
Profiling Efficiently
Introducing the Julia Set
Calculating the Full Julia Set
Simple Approaches to Timing—print and a Decorator
Simple Timing Using the Unix time Command
Using the cProfile Module
Visualizing cProfile Output with SnakeViz
Using line_profiler for Line-by-Line Measurements
Using memory_profiler to Diagnose Memory Usage
Combining CPU and Memory Profiling with Scalene
Introspecting an Existing Process with PySpy
VizTracer for an Interactive Time-Based Call Stack
Bytecode: Under the Hood
Using the dis Module to Examine CPython Bytecode
Digging into Bytecode Specialization with Specialist
Different Approaches, Different Complexity
Unit Testing During Optimization to Maintain Correctness
No-op @profile Decorator
Strategies to Profile Your Code Successfully
Wrap-Up
3. Lists and Tuples
A More Efficient Search
Lists Versus Tuples
Lists as Dynamic Arrays
Tuples as Static Arrays
Wrap-Up
4. Dictionaries and Sets
How Do Dictionaries and Sets Work?
Inserting and Retrieving
Deletion
Resizing
Hash Functions and Entropy
Wrap-Up
5. Iterators and Generators
Iterators for Infinite Series
Lazy Generator Evaluation
Wrap-Up
6. Matrix and Vector Computation
Introduction to the Problem
Aren’t Python Lists Good Enough?
Problems with Allocating Too Much
Memory Fragmentation
Understanding perf
Making Decisions with perf’s Output
Enter numpy
Applying numpy to the Diffusion Problem
Memory Allocations and In-Place Operations
Selective Optimizations: Finding What Needs to Be Fixed
numexpr: Making In-Place Operations Faster and Easier
Graphics Processing Units (GPUs)
Dynamic Graphs: PyTorch
GPU Speed and Numerical Precision
GPU-Specific Operations
Basic GPU Profiling
Performance Considerations of GPUs
When to Use GPUs
Deep Learning Performance Considerations
A Cautionary Tale: Verify “Optimizations” (scipy)
Lessons from Matrix Optimizations
Wrap-Up
7. Pandas, Dask, and Polars
Pandas
Pandas’s Internal Model
Arrow and NumPy
Applying a Function to Many Rows of Data
Numba to Compile NumPy for Pandas
Building from Partial Results Rather than Concatenating
There’s More Than One (and Possibly a Faster) Way to Do a Job
Advice for Effective Pandas Development
Dask for Distributed Data Structures and DataFrames
Diagnostics
Parallel Pandas with Dask
Parallelized apply with Swifter on Dask
Polars for Fast DataFrames
Wrap-Up
8. Compiling to C
What Sort of Speed Gains Are Possible?
JIT Versus AOT Compilers
Why Does Type Information Help the Code Run Faster?
Using a C Compiler
Reviewing the Julia Set Example
Cython
Compiling a Pure Python Version Using Cython
pyximport
Cython Annotations to Analyze a Block of Code
Adding Some Type Annotations
Cython and numpy
Parallelizing the Solution with OpenMP on One Machine
Numba
PyPy
Garbage Collection Differences
Running PyPy and Installing Modules
A Summary of Speed Improvements
When to Use Each Technology
Foreign Function Interfaces
ctypes
cffi
f2py
CPython Extensions: C
CPython Extensions: Rust
Wrap-Up
9. Asynchronous I/O
Introduction to Asynchronous Programming
How Does async/await Work?
Serial Web Crawler
Asynchronous Web Crawler
Shared CPU–I/O Workload
Serial CPU Workload
Batched CPU Workload
Fully Asynchronous CPU Workload
Wrap-Up
10. The multiprocessing Module
An Overview of the multiprocessing Module
Estimating Pi Using the Monte Carlo Method
Estimating Pi Using Processes and Threads
Using Python Objects
Replacing multiprocessing with Joblib
Random Numbers in Parallel Systems
Using numpy
Finding Prime Numbers
Queues of Work
Asynchronously Adding Jobs to the Queue
Verifying Primes Using Interprocess Communication
Serial Solution
Naive Pool Solution
A Less Naive Pool Solution
Using manager.Value as a Flag
Using Redis as a Flag
Using RawValue as a Flag
Using mmap as a Flag
Using mmap as a Flag Redux
Sharing numpy Data with multiprocessing
Synchronizing File and Variable Access
File Locking
Locking a Value
Wrap-Up
11. Clusters and Job Queues
Benefits of Clustering
Drawbacks of Clustering
$462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
Skype’s 24-Hour Global Outage
Common Cluster Designs
How to Start a Clustered Solution
Ways to Avoid Pain When Using Clusters
Two Clustering Solutions
Using IPython Parallel to Support Research
Message Brokering for Cluster Efficiency
Other Clustering Tools to Look At
Docker
Docker’s Performance
Advantages of Docker
Wrap-Up
12. Using Less RAM
Objects for Primitives Are Expensive
The array Module Stores Many Primitive Objects Cheaply
Using Less RAM in NumPy with NumExpr
Understanding the RAM Used in a Collection
Bytes Versus Unicode
Efficiently Storing Lots of Text in RAM
Trying These Approaches on 11 Million Tokens
Modeling More Text with scikit-learn’s FeatureHasher
Introducing DictVectorizer and FeatureHasher
Comparing DictVectorizer and FeatureHasher on a Real Problem
SciPy’s Sparse Matrices
Tips for Using Less RAM
Probabilistic Data Structures
Very Approximate Counting with a 1-Byte Morris Counter
K-Minimum Values
Bloom Filters
LogLog Counter
Real-World Example
Wrap-Up
13. Lessons from the Field
Developing a High Performance Machine Learning Algorithm
High Performance Computing in Journalism
Lessons from the Field of Cyber Reinsurance
Python in Quant Finance
Maintain Flexibility to Achieve High Performance
Streamlining Feature Engineering Pipelines with Feature-engine (2020)
Highly Performant Data Science Teams (2020)
Numba (2020)
Optimizing Versus Thinking (2020)
Making Deep Learning Fly with RadimRehurek.com (2014)
Large-Scale Social Media Analysis at Smesh (2014)
Index
About the Authors
About the Authors....981
Your Python code may run correctly, but what if you need it to run faster? This practical book shows you how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. By explaining the fundamental theory behind design choices, this expanded edition of High Performance Python helps experienced Python programmers gain a deeper understanding of Python's implementation.
How do you take advantage of multicore architectures or compilation? Or build a system that scales beyond RAM limits or makes use of a GPU? Authors Micha Gorelick and Ian Ozsvald reveal concrete solutions to many issues and include war stories from companies that use high-performance Python for GenAI data extraction, productionized machine learning, and more.