Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, 2nd Edition

Authors: Martin Kleppmann, Chris Riccomini
Release date: 2026
Publisher: O’Reilly Media, Inc.
Pages: 673
File size: 4.8 MB
File type: PDF
Added by: Aleks-5

Cover

Copyright

Table of Contents

Preface

Who Should Read This Book?

What’s New in the Second Edition?

References and Further Reading

Conventions Used in This Book

O’Reilly Online Learning

How to Contact Us

Acknowledgments

Chapter 1. Trade-Offs in Data Systems Architecture

Operational Versus Analytical Systems

Characterizing Transaction Processing and Analytics

Data Warehousing

Systems of Record and Derived Data

Cloud Versus Self-Hosting

Pros and Cons of Cloud Services

Cloud Native System Architecture

Operations in the Cloud Era

Distributed Versus Single-Node Systems

Problems with Distributed Systems

Microservices and Serverless

Cloud Computing Versus Supercomputing

Data Systems, Law, and Society

Summary

Chapter 2. Defining Nonfunctional Requirements

Case Study: Social Network Home Timelines

Representing Users, Posts, and Follows

Materializing and Updating Timelines

Describing Performance

Latency and Response Time

Average, Median, and Percentiles

Use of Response Time Metrics

Reliability and Fault Tolerance

Fault Tolerance

Hardware and Software Faults

Humans and Reliability

Scalability

Understanding Load

Shared-Memory, Shared-Disk, and Shared-Nothing Architectures

Principles for Scalability

Maintainability

Operability: Making Life Easy for Operations

Simplicity: Managing Complexity

Evolvability: Making Change Easy

Summary

Chapter 3. Data Models and Query Languages

Relational Versus Document Models

The Object-Relational Mismatch

Normalization, Denormalization, and Joins

Many-to-One and Many-to-Many Relationships

Stars and Snowflakes: Schemas for Analytics

When to Use Which Model

Graph-Like Data Models

Property Graphs

The Cypher Query Language

Graph Queries in SQL

Triple Stores and SPARQL

Datalog: Recursive Relational Queries

GraphQL

Event Sourcing and CQRS

DataFrames, Matrices, and Arrays

Summary

Chapter 4. Storage and Retrieval

Storage and Indexing for OLTP

Log-Structured Storage

B-Trees

Comparing B-Trees and LSM-Trees

Multicolumn and Secondary Indexes

Storing Values Within the Index

Keeping Everything in Memory

Data Storage for Analytics

Cloud Data Warehouses

Column-Oriented Storage

Query Execution: Compilation and Vectorization

Materialized Views and Data Cubes

Multidimensional and Full-Text Indexes

Full-Text Search

Vector Embeddings

Summary

Chapter 5. Encoding and Evolution

Formats for Encoding Data

Language-Specific Formats

JSON, XML, and Binary Variants

Protocol Buffers

Avro

The Merits of Schemas

Modes of Dataflow

Dataflow Through Databases

Dataflow Through Services: REST and RPC

Durable Execution and Workflows

Event-Driven Architectures

Summary

Chapter 6. Replication

Single-Leader Replication

Synchronous Versus Asynchronous Replication

Setting Up New Followers

Handling Node Outages

Implementation of Replication Logs

Problems with Replication Lag

Solutions for Replication Lag

Multi-Leader Replication

Geographically Distributed Operation

Sync Engines and Local-First Software

Dealing with Conflicting Writes

Leaderless Replication

Writing to the Database When a Node Is Down

Single-Leader Versus Leaderless Replication Performance

Multi-Region Operation

Detecting Concurrent Writes

Summary

Chapter 7. Sharding

Pros and Cons of Sharding

Sharding for Multitenancy

Sharding of Key-Value Data

Sharding by Key Range

Sharding by Hash of Key

Skewed Workloads and Relieving Hot Spots

Operations: Automatic Versus Manual Rebalancing

Request Routing

Sharding and Secondary Indexes

Local Secondary Indexes

Global Secondary Indexes

Summary

Chapter 8. Transactions

What Exactly Is a Transaction?

The Meaning of ACID

Single-Object and Multi-Object Operations

Weak Isolation Levels

Read Committed

Snapshot Isolation and Repeatable Read

Preventing Lost Updates

Write Skew and Phantoms

Serializability

Actual Serial Execution

Two-Phase Locking

Serializable Snapshot Isolation

Distributed Transactions

Two-Phase Commit

Distributed Transactions Across Different Systems

Database-Internal Distributed Transactions

Exactly-Once Message Processing Revisited

Summary

Chapter 9. The Trouble with Distributed Systems

Faults and Partial Failures

Unreliable Networks

The Limitations of TCP

Network Faults in Practice

Fault Detection

Timeouts and Unbounded Delays

Synchronous Versus Asynchronous Networks

Unreliable Clocks

Monotonic Versus Time-of-Day Clocks

Clock Synchronization and Accuracy

Relying on Synchronized Clocks

Process Pauses

Knowledge, Truth, and Lies

The Majority Rules

Distributed Locks and Leases

Byzantine Faults

System Model and Reality

Formal Methods and Randomized Testing

Summary

Chapter 10. Consistency and Consensus

Linearizability

What Makes a System Linearizable?

Relying on Linearizability

Implementing Linearizable Systems

The Cost of Linearizability

ID Generators and Logical Clocks

Logical Clocks

Linearizable ID Generators

Consensus

The Many Faces of Consensus

Consensus in Practice

Coordination Services

Summary

Chapter 11. Batch Processing

Batch Processing with Unix Tools

Simple Log Analysis

Chain of Commands Versus Custom Program

Sorting Versus In-Memory Aggregation

Batch Processing in Distributed Systems

Distributed Filesystems

Object Stores

Distributed Job Orchestration

Batch Processing Models

MapReduce

Dataflow Engines

Shuffling Data

Joins and Grouping

Query Languages

DataFrames

Batch Use Cases

Extract–Transform–Load

Analytics

Machine Learning

Serving Derived Data

Summary

Chapter 12. Stream Processing

Transmitting Event Streams

Messaging Systems

Log-Based Message Brokers

Databases and Streams

Keeping Systems in Sync

Change Data Capture

State, Streams, and Immutability

Processing Streams

Uses of Stream Processing

Reasoning About Time

Stream Joins

Fault Tolerance

Summary

Chapter 13. A Philosophy of Streaming Systems

Data Integration

Combining Specialized Tools by Deriving Data

Batch and Stream Processing

Unbundling Databases

Composing Data Storage Technologies

Designing Applications Around Dataflow

Observing Derived State

Aiming for Correctness

The End-to-End Argument for Databases

Enforcing Constraints

Timeliness and Integrity

Trust, but Verify

Summary

Chapter 14. Doing the Right Thing

Predictive Analytics

Bias and Discrimination

Responsibility and Accountability

Feedback Loops

Privacy and Tracking

Surveillance

Consent and Freedom of Choice

Privacy and Use of Data

Data as Assets and Power

Remembering the Industrial Revolution

Legislation and Self-Regulation

Summary

Glossary

Index

About the Authors

Colophon

Data is at the center of many challenges in system design today. Difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be resolved. In addition, there's an overwhelming variety of systems, including relational databases, NoSQL datastores, data warehouses, and data lakes. There are cloud services, on-premises services, and embedded databases. What are the right choices for your application? How do you make sense of all these buzzwords?

In this second edition, authors Martin Kleppmann and Chris Riccomini build on the foundation laid in the acclaimed first edition, integrating new technologies and emerging trends. You'll be guided through the maze of decisions and trade-offs involved in building a modern data system, learn how to choose the right tools for your needs, and understand the fundamentals of distributed systems.

  • Peer under the hood of the systems you already use, and learn to use them more effectively
  • Make informed decisions by identifying the strengths and weaknesses of different tools
  • Learn how major cloud services are designed for scalability, fault tolerance, and consistency
  • Understand the core principles upon which modern databases are built
