The Art of Writing Efficient Programs....2
Contributors....3
About the author....3
About the reviewer....5
Preface....25
Who is this book for?....27
What this book covers....28
To get the most out of this book....31
Download the example code files....33
Download the color images....33
Conventions used....33
Get in touch....34
Share Your Thoughts....35
Section 1 – Performance Fundamentals....36
Chapter 1: Introduction to Performance and Concurrency....37
Why focus on performance?....38
Why performance matters....42
What is performance?....46
Performance as throughput....47
Performance as power consumption....49
Performance for real-time applications....50
Performance as dependent on context....52
Evaluating, estimating, and predicting performance....53
Learning about high performance....55
Summary....59
Questions....59
Chapter 2: Performance Measurements....61
Technical requirements....62
Performance measurements by example....63
Performance benchmarking....75
C++ chrono timers....76
High-resolution timers....77
Performance profiling....85
The perf profiler....87
Detailed profiling with perf....91
The Google Performance profiler....96
Profiling with call graphs....99
Optimization and inlining....104
Practical profiling....107
Micro-benchmarking....110
Basics of micro-benchmarking....111
Micro-benchmarking and compiler optimizations....115
Google Benchmark....121
Micro-benchmarks are lies....127
Summary....133
Questions....135
Chapter 3: CPU Architecture, Resources, and Performance....136
Technical requirements....136
The performance begins with the CPU....137
Probing performance with micro-benchmarks....140
Visualizing instruction-level parallelism....151
Data dependencies and pipelining....155
Pipelining and branches....164
Branch prediction....170
Profiling for branch mispredictions....175
Speculative execution....179
Optimization of complex conditions....181
Branchless computing....189
Loop unrolling....190
Branchless selection....191
Branchless computing examples....193
Summary....200
Questions....202
Chapter 4: Memory Architecture and Performance....203
Technical requirements....203
The performance begins with the CPU but does not end there....204
Measuring memory access speed....208
Memory architecture....210
Measuring memory and cache speeds....213
The speed of memory: the numbers....219
The speed of random memory access....219
The speed of sequential memory access....225
Memory performance optimizations in hardware....228
Optimizing memory performance....234
Memory-efficient data structures....234
Profiling memory performance....240
Optimizing algorithms for memory performance....244
The ghost in the machine....254
What is Spectre?....255
Spectre by example....259
Spectre, unleashed....265
Summary....271
Questions....272
Chapter 5: Threads, Memory, and Concurrency....274
Technical requirements....274
Understanding threads and concurrency....275
What is a thread?....276
Symmetric multi-threading....277
Threads and memory....279
Memory-bound programs and concurrency....285
Understanding the cost of memory synchronization....288
Why data sharing is expensive....296
Learning about concurrency and order....307
The need for order....308
Memory order and memory barriers....311
Memory order in C++....322
Memory model....326
Summary....333
Questions....334
Section 2 – Advanced Concurrency....336
Chapter 6: Concurrency and Performance....337
Technical requirements....338
What is needed to use concurrency effectively?....338
Locks, alternatives, and their performance....341
Lock-based, lock-free, and wait-free programs....345
Different locks for different problems....349
Lock-based versus lock-free, what is the real difference?....356
Building blocks for concurrent programming....360
The basics of concurrent data structures....362
Counters and accumulators....367
Publishing protocol....377
Smart pointers for concurrent programming....381
Summary....392
Questions....392
Chapter 7: Data Structures for Concurrency....394
Technical requirements....394
What is a thread-safe data structure?....395
The best kind of thread safety....395
The real thread safety....399
The thread-safe stack....400
Interface design for thread safety....401
Performance of mutex-guarded data structures....405
Performance requirements for different uses....408
Stack performance in detail....415
Performance estimates for synchronization schemes....420
Lock-free stack....425
The thread-safe queue....437
Lock-free queue....439
Non-sequentially consistent data structures....449
Memory management for concurrent data structures....454
The thread-safe list....458
Lock-free list....463
Summary....475
Questions....477
Chapter 8: Concurrency in C++....478
Technical requirements....478
Concurrency support in C++11....479
Concurrency support in C++17....481
Concurrency support in C++20....489
The foundations of coroutines....489
Coroutine C++ syntax....498
Coroutine examples....500
Summary....511
Questions....512
Section 3 – Designing and Coding High-Performance Programs....513
Chapter 9: High-Performance C++....514
Technical requirements....514
What is the efficiency of a programming language?....515
Unnecessary copying....517
Copying and argument passing....518
Copying as an implementation technique....521
Copying to store data....523
Copying of return values....525
Using pointers to avoid copying....531
How to avoid unnecessary copying....533
Inefficient memory management....535
Unnecessary memory allocations....536
Memory management in concurrent programs....542
Avoiding memory fragmentation....544
Optimization of conditional execution....550
Summary....554
Questions....555
Chapter 10: Compiler Optimizations in C++....557
Technical requirements....557
Compilers optimizing code....558
Basics of compiler optimizations....558
Function inlining....563
What does the compiler really know?....573
Lifting knowledge from runtime to compile time....584
Summary....588
Questions....589
Chapter 11: Undefined Behavior and Performance....590
Technical requirements....591
What is undefined behavior?....591
Why have undefined behavior?....597
Undefined behavior and C++ optimization....599
Using undefined behavior for efficient design....613
Summary....620
Questions....622
Chapter 12: Design for Performance....623
Technical requirements....624
Interaction between the design and performance....624
Design for performance....626
The minimum information principle....627
The maximum information principle....630
API design considerations....642
API design for concurrency....642
Copying and sending data....651
Design for optimal data access....655
Performance trade-offs....660
Interface design....661
Component design....663
Errors and undefined behavior....665
Making informed design decisions....668
Summary....672
Questions....673
Assessments....674
Chapter 1:....675
Chapter 2:....676
Chapter 3:....678
Chapter 4:....680
Chapter 5:....682
Chapter 6:....684
Chapter 7:....686
Chapter 8:....688
Chapter 9:....690
Chapter 10:....692
Chapter 11:....694
Chapter 12:....696
Why subscribe?....697
Other Books You May Enjoy....698
Packt is searching for authors like you....700
Share Your Thoughts....700
The great free lunch of "performance taking care of itself" is over. Until recently, programs got faster by themselves as CPUs were upgraded, but that doesn't happen anymore. The clock frequency of new processors has almost peaked, and while new architectures provide small improvements to existing programs, this only helps slightly. To write efficient software, you now have to know how to program by making good use of the available computing resources, and this book will teach you how to do that.
The Art of Efficient Programming covers all the major aspects of writing efficient programs, such as using CPU resources and memory efficiently, avoiding unnecessary computations, measuring performance, and how to put concurrency and multithreading to good use. You'll also learn about compiler optimizations and how to use the programming language (C++) more efficiently. Finally, you'll understand how design decisions impact performance.
By the end of this book, you'll not only have enough knowledge of processors and compilers to write efficient programs, but you'll also be able to understand which techniques to use and what to measure while improving performance. At its core, this book is about learning how to learn.
This book is for experienced developers and programmers who work on performance-critical projects and want to learn new techniques to improve the performance of their code. Programmers in algorithmic trading, gaming, bioinformatics, computational genomics, or computational fluid dynamics communities will get the most out of the examples in this book, but the techniques are fairly universal. Although this book uses the C++ language, the concepts demonstrated in the book can be easily transferred or applied to other compiled languages such as C, Java, Rust, Go, and more.