Cover....1
Title Page....5
Copyright....6
Dedication....7
About the Author....9
About the Technical Reviewer....9
Brief Contents....11
Contents in Detail....13
Acknowledgments ....25
Introduction....27
Who This Book Is For....28
Why Write a C Compiler?....29
Compilation from 10,000 Feet....29
What You’ll Build....30
How to Use This Book....34
The Test Suite....34
Extra Credit Features....34
Some Advice on Choosing an Implementation Language....35
System Requirements....36
Installing GCC and GDB on Linux....36
Installing the Command Line Developer Tools on macOS....37
Running on Apple Silicon....37
Validating Your Setup....37
Additional Resources....38
Let’s Go!....39
Part I: The Basics....41
1. A Minimal Compiler....43
The Four Compiler Passes....43
Hello, Assembly!....44
The Compiler Driver....47
The Lexer....48
The Parser....50
An Example Abstract Syntax Tree....51
The AST Definition....53
The Formal Grammar....54
Recursive Descent Parsing....55
Assembly Generation....57
Code Emission....59
Summary....61
Additional Resources....61
2. Unary Operators....65
Negation and Bitwise Complement in Assembly....66
The Stack....67
The Lexer....71
The Parser....73
TACKY: A New Intermediate Representation....75
Defining TACKY....76
Generating TACKY....77
Generating Names for Temporary Variables....78
Updating the Compiler Driver....78
Assembly Generation....79
Converting TACKY to Assembly....79
Replacing Pseudoregisters....82
Fixing Up Instructions....82
Code Emission....83
Summary....85
Additional Resources....85
3. Binary Operators....87
The Lexer....88
The Parser....88
The Trouble with Recursive Descent Parsing....90
The Adequate Solution: Refactoring the Grammar....91
The Better Solution: Precedence Climbing....91
Precedence Climbing in Action....95
TACKY Generation....98
Assembly Generation....100
Doing Arithmetic in Assembly....100
Converting Binary Operations to Assembly....101
Replacing Pseudoregisters....104
Fixing Up the idiv, add, sub, and imul Instructions....104
Code Emission....105
Extra Credit: Bitwise Operators....107
Summary....107
Additional Resources....108
4. Logical and Relational Operators....111
Short-Circuiting Operators....112
The Lexer....112
The Parser....113
TACKY Generation....115
Adding Jumps, Copies, and Comparisons to the TACKY IR....115
Converting Short-Circuiting Operators to TACKY....116
Generating Labels....117
Comparisons and Jumps in Assembly....118
Comparisons and Status Flags....118
Conditional Set Instructions....122
Jump Instructions....123
Assembly Generation....125
Replacing Pseudoregisters....127
Fixing Up the cmp Instruction....128
Code Emission....128
Summary....130
Additional Resources....131
5. Local Variables....133
Variables, Declarations, and Assignment....134
The Lexer....137
The Parser....137
The Updated AST and Grammar....137
An Improved Precedence Climbing Algorithm....141
Semantic Analysis....143
Variable Resolution....145
The --validate Option....149
TACKY Generation....149
Variable and Assignment Expressions....150
Declarations, Statements, and Function Bodies....150
Functions with No return Statement....151
Extra Credit: Compound Assignment, Increment, and Decrement....153
Summary....154
6. If Statements and Conditional Expressions....157
The Lexer....158
The Parser....158
Parsing if Statements....158
Parsing Conditional Expressions....161
Variable Resolution....165
TACKY Generation....166
Converting if Statements to TACKY....166
Converting Conditional Expressions to TACKY....167
Extra Credit: Labeled Statements and goto....168
Summary....168
7. Compound Statements....171
The Scoop on Scopes....172
The Parser....175
Variable Resolution....176
Resolving Variables in Multiple Scopes....177
Updating the Variable Resolution Pseudocode....178
TACKY Generation....180
Summary....180
8. Loops....183
Loops and How to Escape Them....184
The Lexer....188
The Parser....189
Semantic Analysis....191
Extending Variable Resolution....191
Loop Labeling....192
Implementing Loop Labeling....194
TACKY Generation....195
break and continue Statements....195
do Loops....196
while Loops....197
for Loops....197
Extra Credit: switch Statements....199
Summary....199
9. Functions....201
Declaring, Defining, and Calling Functions....202
Declarations and Definitions....202
Function Calls....205
Identifier Linkage....206
Compiling Libraries....209
The Lexer....210
The Parser....210
Semantic Analysis....214
Extending Identifier Resolution....215
Writing the Type Checker....218
TACKY Generation....222
Assembly Generation....223
Understanding Calling Conventions....224
Calling Functions with the System V ABI....225
Converting Function Calls and Definitions to Assembly....234
Replacing Pseudoregisters....240
Allocating Stack Space During Instruction Fix-Up....240
Code Emission....241
Calling Library Functions....243
Summary....245
10. File Scope Variable Declarations and Storage-Class Specifiers....247
All About Declarations....248
Scope....248
Linkage....249
Storage Duration....252
Definitions vs. Declarations....254
Error Cases....257
Linkage and Storage Duration in Assembly....260
The Lexer....263
The Parser....264
Parsing Type and Storage-Class Specifiers....265
Distinguishing Between Function and Variable Declarations....266
Semantic Analysis....267
Identifier Resolution: Resolving External Variables....267
Type Checking: Tracking Static Functions and Variables....269
TACKY Generation....274
Assembly Generation....275
Generating Assembly for Variable Definitions....276
Replacing Pseudoregisters According to Their Storage Duration....277
Fixing Up Instructions....277
Code Emission....278
Summary....280
Part II: Types Beyond Int....281
11. Long Integers....283
Long Integers in Assembly....284
Type Conversions....284
Static Long Variables....286
The Lexer....287
The Parser....287
Semantic Analysis....291
Adding Type Information to the AST....292
Type Checking Expressions....293
Type Checking return Statements....296
Type Checking Declarations and Updating the Symbol Table....297
TACKY Generation....298
Tracking the Types of Temporary Variables....300
Generating Extra Return Instructions....301
Assembly Generation....301
Tracking Assembly Types in the Backend Symbol Table....306
Replacing Longword and Quadword Pseudoregisters....307
Fixing Up Instructions....307
Code Emission....309
Summary....311
12. Unsigned Integers....313
Type Conversions, Again....314
Converting Between Signed and Unsigned Types of the Same Size....314
Converting unsigned int to a Larger Type....314
Converting signed int to a Larger Type....315
Converting from Larger to Smaller Types....315
The Lexer....315
The Parser....316
The Type Checker....319
TACKY Generation....321
Unsigned Integer Operations in Assembly....323
Unsigned Comparisons....323
Unsigned Division....326
Zero Extension....326
Assembly Generation....327
Replacing Pseudoregisters....329
Fixing Up the Div and MovZeroExtend Instructions....330
Code Emission....330
Summary....332
13. Floating-Point Numbers....335
IEEE 754, What Is It Good For?....336
The IEEE 754 Double-Precision Format....337
Rounding Behavior....339
Rounding Modes....339
Rounding Constants....340
Rounding Type Conversions....340
Rounding Arithmetic Operations....341
Linking Shared Libraries....341
The Lexer....342
Recognizing Floating-Point Constant Tokens....342
Matching the End of a Constant....343
The Parser....345
The Type Checker....348
TACKY Generation....349
Floating-Point Operations in Assembly....350
Working with SSE Instructions....351
Using Floating-Point Values in the System V Calling Convention....352
Doing Arithmetic with SSE Instructions....355
Comparing Floating-Point Numbers....357
Converting Between Floating-Point and Integer Types....357
Assembly Generation....364
Floating-Point Constants....366
Unary Instructions, Binary Instructions, and Conditional Jumps....367
Type Conversions....368
Function Calls....369
Return Instructions....373
The Complete Conversion from TACKY to Assembly....373
Pseudoregister Replacement....376
Instruction Fix-Up....376
Code Emission....378
Formatting Floating-Point Numbers....378
Labeling Floating-Point Constants....379
Storing Constants in the Read-Only Data Section....379
Initializing Static Variables to 0.0 or –0.0....380
Putting It All Together....380
Extra Credit: NaN....382
Summary....383
Additional Resources....383
14. Pointers....387
Objects and Values....388
Operations on Pointers....389
Address and Dereference Operations....389
Null Pointers and Type Conversions....391
Pointer Comparisons....392
& Operations on Dereferenced Pointers....393
The Lexer....393
The Parser....394
Parsing Declarations....396
Parsing Type Names....401
Putting It All Together....403
Semantic Analysis....404
Type Checking Pointer Expressions....405
Tracking Static Pointer Initializers in the Symbol Table....409
TACKY Generation....410
Pointer Operations in TACKY....411
A Strategy for TACKY Conversion....412
Assembly Generation....415
Replacing Pseudoregisters with Memory Operands....418
Fixing Up the lea and push Instructions....418
Code Emission....419
Summary....420
15. Arrays and Pointer Arithmetic....423
Arrays and Pointer Arithmetic....424
Array Declarations and Initializers....424
Memory Layout of Arrays....425
Array-to-Pointer Decay....426
Pointer Arithmetic to Access Array Elements....427
Even More Pointer Arithmetic....429
Array Types in Function Declarations....430
Things We Aren’t Implementing....431
The Lexer....432
The Parser....432
Parsing Array Declarators....434
Parsing Abstract Array Declarators....435
Parsing Compound Initializers....436
Parsing Subscript Expressions....436
The Type Checker....438
Converting Arrays to Pointers....438
Validating Lvalues....439
Type Checking Pointer Arithmetic....440
Type Checking Subscript Expressions....441
Type Checking Cast Expressions....442
Type Checking Function Declarations....442
Type Checking Compound Initializers....443
Initializing Static Arrays....444
Initializing Scalar Variables with ZeroInit....445
TACKY Generation....446
Pointer Arithmetic....447
Subscripting....448
Compound Initializers....450
Tentative Array Definitions....451
Assembly Generation....451
Converting TACKY to Assembly....454
Replacing PseudoMem Operands....457
Fixing Up Instructions....458
Code Emission....458
Summary....459
16. Characters and Strings....463
Character Traits....464
String Literals....465
Working with Strings in Assembly....466
The Lexer....469
The Parser....471
Parsing Type Specifiers....473
Parsing Character Constants....473
Parsing String Literals....473
Putting It All Together....473
The Type Checker....475
Characters....475
String Literals in Expressions....476
String Literals Initializing Non-static Variables....477
String Literals Initializing Static Variables....477
TACKY Generation....480
String Literals as Array Initializers....480
String Literals in Expressions....481
Top-Level Constants in TACKY....482
Assembly Generation....483
Operations on Characters....483
Top-Level Constants....486
The Complete Conversion from TACKY to Assembly....486
Pseudo-Operand Replacement....488
Instruction Fix-Up....489
Code Emission....489
Hello Again, World!....491
Summary....494
17. Supporting Dynamic Memory Allocation....497
The void Type....498
Memory Management with void *....500
Complete and Incomplete Types....501
The sizeof Operator....502
The Lexer....503
The Parser....503
The Type Checker....507
Conversions to and from void *....507
Functions with void Return Types....509
Scalar and Non-scalar Types....510
Restrictions on Incomplete Types....511
Extra Restrictions on void....513
Conditional Expressions with void Operands....516
Existing Validation for Arithmetic Expressions and Comparisons....516
sizeof Expressions....517
TACKY Generation....518
Functions with void Return Types....519
Casts to void....519
Conditional Expressions with void Operands....519
sizeof Expressions....520
The Latest and Greatest TACKY IR....521
Assembly Generation....522
Summary....523
18. Structures....525
Declaring Structure Types....526
Structure Member Declarations....528
Tag and Member Namespaces....529
Structure Type Declarations We Aren’t Implementing....530
Operating on Structures....531
Structure Layout in Memory....532
The Lexer....534
The Parser....534
Semantic Analysis....538
Resolving Structure Tags....538
Type Checking Structures....540
TACKY Generation....552
Implementing the Member Access Operators....553
Converting Compound Initializers to TACKY....557
Structures in the System V Calling Convention....559
Classifying Structures....559
Passing Parameters of Structure Type....562
Returning Structures....565
Assembly Generation....568
Extending the Assembly AST....569
Copying Structures....571
Using Structures in Function Calls....572
Putting It All Together....586
Replacing Pseudo-operands....590
Code Emission....591
Extra Credit: Unions....592
Summary....593
Additional Resources....593
Part III: Optimizations....595
19. Optimizing Tacky Programs....597
Safety and Observable Behavior....598
Four TACKY Optimizations....600
Constant Folding....601
Unreachable Code Elimination....601
Copy Propagation....603
Dead Store Elimination....604
With Our Powers Combined....606
Testing the Optimization Passes....609
Wiring Up the Optimization Stage....610
Constant Folding....613
Constant Folding for Part I TACKY Programs....613
Supporting Part II TACKY Programs....614
Control-Flow Graphs....616
Defining the Control-Flow Graph....617
Creating Basic Blocks....618
Adding Edges to the Control-Flow Graph....619
Converting a Control-Flow Graph to a List of Instructions....620
Making Your Control-Flow Graph Code Reusable....620
Unreachable Code Elimination....621
Eliminating Unreachable Blocks....621
Removing Useless Jumps....622
Removing Useless Labels....623
Removing Empty Blocks....623
A Little Bit About Data-Flow Analysis....624
Copy Propagation....625
Reaching Copies Analysis....629
Rewriting TACKY Instructions....638
Supporting Part II TACKY Programs....639
Dead Store Elimination....643
Liveness Analysis....644
Removing Dead Stores....648
Supporting Part II TACKY Programs....648
Summary....650
Additional Resources....650
20. Register Allocation....653
Register Allocation in Action....654
Take One: Put Everything on the Stack....655
Take Two: Register Allocation....656
Take Three: Register Allocation with Coalescing....658
Updating the Compiler Pipeline....659
Extending the Assembly AST....660
Converting TACKY to Assembly....661
Register Allocation by Graph Coloring....662
Detecting Interference....666
Spilling Registers....667
The Basic Register Allocator....670
Handling Multiple Types During Register Allocation....671
Defining the Interference Graph....671
Building the Interference Graph....671
Calculating Spill Costs....678
Coloring the Interference Graph....678
Building the Register Map and Rewriting the Function Body....686
Instruction Fix-Up with Callee-Saved Registers....688
Code Emission....689
Register Coalescing....691
Updating the Interference Graph....693
Conservative Coalescing....696
Implementing Register Coalescing....703
Summary....708
Additional Resources....709
Next Steps....711
Add Some Missing Features....711
Handle Undefined Behavior Safely....712
Write More TACKY Optimizations....712
Support Another Target Architecture....712
Contribute to an Open Source Programming Language Project....713
That’s a Wrap!....713
A. Debugging Assembly Code With GDB or LLDB....715
The Program....716
Debugging with GDB....717
Configuring the GDB UI....718
Starting and Stopping the Program....719
Printing Expressions....722
Examining Memory....724
Setting Conditional Breakpoints....725
Getting Help....726
Debugging with LLDB....727
Starting and Stopping the Program....728
Displaying Assembly Code....731
Printing Expressions....732
Examining Memory....734
Setting Conditional Breakpoints....735
Getting Help....737
B. Assembly Generation and Code Emission Tables....739
Part I....739
Converting TACKY to Assembly....740
Code Emission....741
Part II....744
Converting TACKY to Assembly....744
Code Emission....751
Part III....755
References....765
Index....771
Compilers are at the heart of everything programmers do, yet even experienced developers find them intimidating. For those eager to truly grasp how compilers work, Writing a C Compiler dispels the mystery. This book guides you through a fun and engaging project where you’ll learn what it takes to compile a real-world programming language to actual assembly code.
Writing a C Compiler will take you step by step through the process of building your own compiler for a significant subset of C—no prior experience with compiler construction or assembly code needed. Once you’ve built a working compiler for the simplest C program, you’ll add new features chapter by chapter. The algorithms in the book are all in pseudocode, so you can implement your compiler in whatever language you like. Along the way, you’ll explore key concepts like:
Compilers aren’t terrifying beasts—and with help from this hands-on, accessible guide, you might even turn them into your friends for life.