Writing a C compiler: Build a Real Programming Language from Scratch

Writing a C compiler: Build a Real Programming Language from Scratch

Writing a C compiler: Build a Real Programming Language from Scratch
Автор: Sandler Nora
Дата выхода: 2024
Издательство: No Starch Press, Inc.
Количество страниц: 904
Размер файла: 7.2 MB
Тип файла: PDF
Добавил: codelibs
 Проверить на вирусы

Cover....1


 Title Page....5


 Copyright....6


 Dedication....7


 About the Author....9


 About the Technical Reviewer....9


 Brief Contents....11


 Contents in Detail....13


 Acknowledgments ....25


 Introduction....27


 Who This Book Is For....28


 Why Write a C Compiler?....29


 Compilation from 10,000 Feet....29


 What You’ll Build....30


 How to Use This Book....34


 The Test Suite....34


 Extra Credit Features....34


 Some Advice on Choosing an Implementation Language....35


 System Requirements....36


 Installing GCC and GDB on Linux....36


 Installing the Command Line Developer Tools on macOS....37


 Running on Apple Silicon....37


 Validating Your Setup....37


 Additional Resources....38


 Let’s Go!....39


 Part I: The Basics....41


 1. A Minimal Compiler....43


 The Four Compiler Passes....43


 Hello, Assembly!....44


 The Compiler Driver....47


 The Lexer....48


 The Parser....50


 An Example Abstract Syntax Tree....51


 The AST Definition....53


 The Formal Grammar....54


 Recursive Descent Parsing....55


 Assembly Generation....57


 Code Emission....59


 Summary....61


 Additional Resources....61


 2. Unary Operators....65


 Negation and Bitwise Complement in Assembly....66


 The Stack....67


 The Lexer....71


 The Parser....73


 TACKY: A New Intermediate Representation....75


 Defining TACKY....76


 Generating TACKY....77


 Generating Names for Temporary Variables....78


 Updating the Compiler Driver....78


 Assembly Generation....79


 Converting TACKY to Assembly....79


 Replacing Pseudoregisters....82


 Fixing Up Instructions....82


 Code Emission....83


 Summary....85


 Additional Resources....85


 3. Binary Operators....87


 The Lexer....88


 The Parser....88


 The Trouble with Recursive Descent Parsing....90


 The Adequate Solution: Refactoring the Grammar....91


 The Better Solution: Precedence Climbing....91


 Precedence Climbing in Action....95


 TACKY Generation....98


 Assembly Generation....100


 Doing Arithmetic in Assembly....100


 Converting Binary Operations to Assembly....101


 Replacing Pseudoregisters....104


 Fixing Up the idiv, add, sub, and imul Instructions....104


 Code Emission....105


 Extra Credit: Bitwise Operators....107


 Summary....107


 Additional Resources....108


 4. Logical and Relational Operators....111


 Short-Circuiting Operators....112


 The Lexer....112


 The Parser....113


 TACKY Generation....115


 Adding Jumps, Copies, and Comparisons to the TACKY IR....115


 Converting Short-Circuiting Operators to TACKY....116


 Generating Labels....117


 Comparisons and Jumps in Assembly....118


 Comparisons and Status Flags....118


 Conditional Set Instructions....122


 Jump Instructions....123


 Assembly Generation....125


 Replacing Pseudoregisters....127


 Fixing Up the cmp Instruction....128


 Code Emission....128


 Summary....130


 Additional Resources....131


 5. Local Variables....133


 Variables, Declarations, and Assignment....134


 The Lexer....137


 The Parser....137


 The Updated AST and Grammar....137


 An Improved Precedence Climbing Algorithm....141


 Semantic Analysis....143


 Variable Resolution....145


 The --validate Option....149


 TACKY Generation....149


 Variable and Assignment Expressions....150


 Declarations, Statements, and Function Bodies....150


 Functions with No return Statement....151


 Extra Credit: Compound Assignment, Increment, and Decrement....153


 Summary....154


 6. If Statements and Conditional Expressions....157


 The Lexer....158


 The Parser....158


 Parsing if Statements....158


 Parsing Conditional Expressions....161


 Variable Resolution....165


 TACKY Generation....166


 Converting if Statements to TACKY....166


 Converting Conditional Expressions to TACKY....167


 Extra Credit: Labeled Statements and goto....168


 Summary....168


 7. Compound Statements....171


 The Scoop on Scopes....172


 The Parser....175


 Variable Resolution....176


 Resolving Variables in Multiple Scopes....177


 Updating the Variable Resolution Pseudocode....178


 TACKY Generation....180


 Summary....180


 8. Loops....183


 Loops and How to Escape Them....184


 The Lexer....188


 The Parser....189


 Semantic Analysis....191


 Extending Variable Resolution....191


 Loop Labeling....192


 Implementing Loop Labeling....194


 TACKY Generation....195


 break and continue Statements....195


 do Loops....196


 while Loops....197


 for Loops....197


 Extra Credit: switch Statements....199


 Summary....199


 9. Functions....201


 Declaring, Defining, and Calling Functions....202


 Declarations and Definitions....202


 Function Calls....205


 Identifier Linkage....206


 Compiling Libraries....209


 The Lexer....210


 The Parser....210


 Semantic Analysis....214


 Extending Identifier Resolution....215


 Writing the Type Checker....218


 TACKY Generation....222


 Assembly Generation....223


 Understanding Calling Conventions....224


 Calling Functions with the System V ABI....225


 Converting Function Calls and Definitions to Assembly....234


 Replacing Pseudoregisters....240


 Allocating Stack Space During Instruction Fix-Up....240


 Code Emission....241


 Calling Library Functions....243


 Summary....245


 10. File Scope Variable Declarations and Storage-Class Specifiers....247


 All About Declarations....248


 Scope....248


 Linkage....249


 Storage Duration....252


 Definitions vs. Declarations....254


 Error Cases....257


 Linkage and Storage Duration in Assembly....260


 The Lexer....263


 The Parser....264


 Parsing Type and Storage-Class Specifiers....265


 Distinguishing Between Function and Variable Declarations....266


 Semantic Analysis....267


 Identifier Resolution: Resolving External Variables....267


 Type Checking: Tracking Static Functions and Variables....269


 TACKY Generation....274


 Assembly Generation....275


 Generating Assembly for Variable Definitions....276


 Replacing Pseudoregisters According to Their Storage Duration....277


 Fixing Up Instructions....277


 Code Emission....278


 Summary....280


 Part II: Types Beyond Int....281


 11. Long Integers....283


 Long Integers in Assembly....284


 Type Conversions....284


 Static Long Variables....286


 The Lexer....287


 The Parser....287


 Semantic Analysis....291


 Adding Type Information to the AST....292


 Type Checking Expressions....293


 Type Checking return Statements....296


 Type Checking Declarations and Updating the Symbol Table....297


 TACKY Generation....298


 Tracking the Types of Temporary Variables....300


 Generating Extra Return Instructions....301


 Assembly Generation....301


 Tracking Assembly Types in the Backend Symbol Table....306


 Replacing Longword and Quadword Pseudoregisters....307


 Fixing Up Instructions....307


 Code Emission....309


 Summary....311


 12. Unsigned Integers....313


 Type Conversions, Again....314


 Converting Between Signed and Unsigned Types of the Same Size....314


 Converting unsigned int to a Larger Type....314


 Converting signed int to a Larger Type....315


 Converting from Larger to Smaller Types....315


 The Lexer....315


 The Parser....316


 The Type Checker....319


 TACKY Generation....321


 Unsigned Integer Operations in Assembly....323


 Unsigned Comparisons....323


 Unsigned Division....326


 Zero Extension....326


 Assembly Generation....327


 Replacing Pseudoregisters....329


 Fixing Up the Div and MovZeroExtend Instructions....330


 Code Emission....330


 Summary....332


 13. Floating-Point Numbers....335


 IEEE 754, What Is It Good For?....336


 The IEEE 754 Double-Precision Format....337


 Rounding Behavior....339


 Rounding Modes....339


 Rounding Constants....340


 Rounding Type Conversions....340


 Rounding Arithmetic Operations....341


 Linking Shared Libraries....341


 The Lexer....342


 Recognizing Floating-Point Constant Tokens....342


 Matching the End of a Constant....343


 The Parser....345


 The Type Checker....348


 TACKY Generation....349


 Floating-Point Operations in Assembly....350


 Working with SSE Instructions....351


 Using Floating-Point Values in the System V Calling Convention....352


 Doing Arithmetic with SSE Instructions....355


 Comparing Floating-Point Numbers....357


 Converting Between Floating-Point and Integer Types....357


 Assembly Generation....364


 Floating-Point Constants....366


 Unary Instructions, Binary Instructions, and Conditional Jumps....367


 Type Conversions....368


 Function Calls....369


 Return Instructions....373


 The Complete Conversion from TACKY to Assembly....373


 Pseudoregister Replacement....376


 Instruction Fix-Up....376


 Code Emission....378


 Formatting Floating-Point Numbers....378


 Labeling Floating-Point Constants....379


 Storing Constants in the Read-Only Data Section....379


 Initializing Static Variables to 0.0 or –0.0....380


 Putting It All Together....380


 Extra Credit: NaN....382


 Summary....383


 Additional Resources....383


 14. Pointers....387


 Objects and Values....388


 Operations on Pointers....389


 Address and Dereference Operations....389


 Null Pointers and Type Conversions....391


 Pointer Comparisons....392


 & Operations on Dereferenced Pointers....393


 The Lexer....393


 The Parser....394


 Parsing Declarations....396


 Parsing Type Names....401


 Putting It All Together....403


 Semantic Analysis....404


 Type Checking Pointer Expressions....405


 Tracking Static Pointer Initializers in the Symbol Table....409


 TACKY Generation....410


 Pointer Operations in TACKY....411


 A Strategy for TACKY Conversion....412


 Assembly Generation....415


 Replacing Pseudoregisters with Memory Operands....418


 Fixing Up the lea and push Instructions....418


 Code Emission....419


 Summary....420


 15. Arrays and Pointer Arithmetic....423


 Arrays and Pointer Arithmetic....424


 Array Declarations and Initializers....424


 Memory Layout of Arrays....425


 Array-to-Pointer Decay....426


 Pointer Arithmetic to Access Array Elements....427


 Even More Pointer Arithmetic....429


 Array Types in Function Declarations....430


 Things We Aren’t Implementing....431


 The Lexer....432


 The Parser....432


 Parsing Array Declarators....434


 Parsing Abstract Array Declarators....435


 Parsing Compound Initializers....436


 Parsing Subscript Expressions....436


 The Type Checker....438


 Converting Arrays to Pointers....438


 Validating Lvalues....439


 Type Checking Pointer Arithmetic....440


 Type Checking Subscript Expressions....441


 Type Checking Cast Expressions....442


 Type Checking Function Declarations....442


 Type Checking Compound Initializers....443


 Initializing Static Arrays....444


 Initializing Scalar Variables with ZeroInit....445


 TACKY Generation....446


 Pointer Arithmetic....447


 Subscripting....448


 Compound Initializers....450


 Tentative Array Definitions....451


 Assembly Generation....451


 Converting TACKY to Assembly....454


 Replacing PseudoMem Operands....457


 Fixing Up Instructions....458


 Code Emission....458


 Summary....459


 16. Characters and Strings....463


 Character Traits....464


 String Literals....465


 Working with Strings in Assembly....466


 The Lexer....469


 The Parser....471


 Parsing Type Specifiers....473


 Parsing Character Constants....473


 Parsing String Literals....473


 Putting It All Together....473


 The Type Checker....475


 Characters....475


 String Literals in Expressions....476


 String Literals Initializing Non-static Variables....477


 String Literals Initializing Static Variables....477


 TACKY Generation....480


 String Literals as Array Initializers....480


 String Literals in Expressions....481


 Top-Level Constants in TACKY....482


 Assembly Generation....483


 Operations on Characters....483


 Top-Level Constants....486


 The Complete Conversion from TACKY to Assembly....486


 Pseudo-Operand Replacement....488


 Instruction Fix-Up....489


 Code Emission....489


 Hello Again, World!....491


 Summary....494


 17. Supporting Dynamic Memory Allocation....497


 The void Type....498


 Memory Management with void *....500


 Complete and Incomplete Types....501


 The sizeof Operator....502


 The Lexer....503


 The Parser....503


 The Type Checker....507


 Conversions to and from void *....507


 Functions with void Return Types....509


 Scalar and Non-scalar Types....510


 Restrictions on Incomplete Types....511


 Extra Restrictions on void....513


 Conditional Expressions with void Operands....516


 Existing Validation for Arithmetic Expressions and Comparisons....516


 sizeof Expressions....517


 TACKY Generation....518


 Functions with void Return Types....519


 Casts to void....519


 Conditional Expressions with void Operands....519


 sizeof Expressions....520


 The Latest and Greatest TACKY IR....521


 Assembly Generation....522


 Summary....523


 18. Structures....525


 Declaring Structure Types....526


 Structure Member Declarations....528


 Tag and Member Namespaces....529


 Structure Type Declarations We Aren’t Implementing....530


 Operating on Structures....531


 Structure Layout in Memory....532


 The Lexer....534


 The Parser....534


 Semantic Analysis....538


 Resolving Structure Tags....538


 Type Checking Structures....540


 TACKY Generation....552


 Implementing the Member Access Operators....553


 Converting Compound Initializers to TACKY....557


 Structures in the System V Calling Convention....559


 Classifying Structures....559


 Passing Parameters of Structure Type....562


 Returning Structures....565


 Assembly Generation....568


 Extending the Assembly AST....569


 Copying Structures....571


 Using Structures in Function Calls....572


 Putting It All Together....586


 Replacing Pseudo-operands....590


 Code Emission....591


 Extra Credit: Unions....592


 Summary....593


 Additional Resources....593


 Part III: Optimizations....595


 19. Optimizing Tacky Programs....597


 Safety and Observable Behavior....598


 Four TACKY Optimizations....600


 Constant Folding....601


 Unreachable Code Elimination....601


 Copy Propagation....603


 Dead Store Elimination....604


 With Our Powers Combined....606


 Testing the Optimization Passes....609


 Wiring Up the Optimization Stage....610


 Constant Folding....613


 Constant Folding for Part I TACKY Programs....613


 Supporting Part II TACKY Programs....614


 Control-Flow Graphs....616


 Defining the Control-Flow Graph....617


 Creating Basic Blocks....618


 Adding Edges to the Control-Flow Graph....619


 Converting a Control-Flow Graph to a List of Instructions....620


 Making Your Control-Flow Graph Code Reusable....620


 Unreachable Code Elimination....621


 Eliminating Unreachable Blocks....621


 Removing Useless Jumps....622


 Removing Useless Labels....623


 Removing Empty Blocks....623


 A Little Bit About Data-Flow Analysis....624


 Copy Propagation....625


 Reaching Copies Analysis....629


 Rewriting TACKY Instructions....638


 Supporting Part II TACKY Programs....639


 Dead Store Elimination....643


 Liveness Analysis....644


 Removing Dead Stores....648


 Supporting Part II TACKY Programs....648


 Summary....650


 Additional Resources....650


 20. Register Allocation....653


 Register Allocation in Action....654


 Take One: Put Everything on the Stack....655


 Take Two: Register Allocation....656


 Take Three: Register Allocation with Coalescing....658


 Updating the Compiler Pipeline....659


 Extending the Assembly AST....660


 Converting TACKY to Assembly....661


 Register Allocation by Graph Coloring....662


 Detecting Interference....666


 Spilling Registers....667


 The Basic Register Allocator....670


 Handling Multiple Types During Register Allocation....671


 Defining the Interference Graph....671


 Building the Interference Graph....671


 Calculating Spill Costs....678


 Coloring the Interference Graph....678


 Building the Register Map and Rewriting the Function Body....686


 Instruction Fix-Up with Callee-Saved Registers....688


 Code Emission....689


 Register Coalescing....691


 Updating the Interference Graph....693


 Conservative Coalescing....696


 Implementing Register Coalescing....703


 Summary....708


 Additional Resources....709


 Next Steps....711


 Add Some Missing Features....711


 Handle Undefined Behavior Safely....712


 Write More TACKY Optimizations....712


 Support Another Target Architecture....712


 Contribute to an Open Source Programming Language Project....713


 That’s a Wrap!....713


 A. Debugging Assembly Code With GDB or LLDB....715


 The Program....716


 Debugging with GDB....717


 Configuring the GDB UI....718


 Starting and Stopping the Program....719


 Printing Expressions....722


 Examining Memory....724


 Setting Conditional Breakpoints....725


 Getting Help....726


 Debugging with LLDB....727


 Starting and Stopping the Program....728


 Displaying Assembly Code....731


 Printing Expressions....732


 Examining Memory....734


 Setting Conditional Breakpoints....735


 Getting Help....737


 B. Assembly Generation and Code Emission Tables....739


 Part I....739


 Converting TACKY to Assembly....740


 Code Emission....741


 Part II....744


 Converting TACKY to Assembly....744


 Code Emission....751


 Part III....755


 References....765


 Index....771

Compilers are at the heart of everything programmers do, yet even experienced developers find them intimidating. For those eager to truly grasp how compilers work, Writing a C Compiler dispels the mystery. This book guides you through a fun and engaging project where you’ll learn what it takes to compile a real-world programming language to actual assembly code.

Writing a C Compiler will take you step by step through the process of building your own compiler for a significant subset of C—no prior experience with compiler construction or assembly code needed. Once you’ve built a working compiler for the simplest C program, you’ll add new features chapter by chapter. The algorithms in the book are all in pseudocode, so you can implement your compiler in whatever language you like. Along the way, you’ll explore key concepts like:

  • Lexing and parsing: Learn how to write a lexer and recursive descent parser that transform C code into an abstract syntax tree.
  • Program analysis: Discover how to analyze a program to understand its behavior and detect errors.
  • Code generation: Learn how to translate C language constructs like arithmetic operations, function calls, and control-flow statements into x64 assembly code.
  • Optimization techniques: Improve performance with methods like constant folding, dead store elimination, and register allocation.

Compilers aren’t terrifying beasts—and with help from this hands-on, accessible guide, you might even turn them into your friends for life.


Похожее:

Список отзывов:

Нет отзывов к книге.