Acing the System Design Interview

Name: Acing the System Design Interview
Author: Tan Zhiyong

Release date: 2024

Publisher: Manning Publications Co.

Number of pages: 473

File size: 12.1 MB

File type: PDF

foreword xvii
preface xxi
acknowledgments xxiii
about this book xxv
about the author xxviii
about the cover illustration xxix
Part 1 1
1 A walkthrough of system design concepts 3
1.1 It is a discussion about tradeoffs 4
1.2 Should you read this book? 4
1.3 Overview of this book 5
1.4 Prelude—A brief discussion of scaling the various services of a system 6
The beginning—A small initial deployment of our app 6
Scaling with GeoDNS 7

Adding a caching service 8
Content Distribution Network (CDN) 9

A brief discussion of horizontal scalability and cluster management, continuous integration (CI) and continuous deployment (CD) 10

Functional partitioning and centralization of cross-cutting concerns 13

Batch and streaming extract, transform, and load (ETL) 17

Other common services 18

Cloud vs. bare metal 19

Serverless—Function as a Service (FaaS) 22
Conclusion—Scaling backend services 23
2 A typical system design interview flow 24
2.1 Clarify requirements and discuss tradeoffs 26
2.2 Draft the API specification 28
Common API endpoints 28
2.3 Connections and processing between users and data 28
2.4 Design the data model 29
Example of the disadvantages of multiple services sharing databases 30

A possible technique to prevent concurrent user update conflicts 31
2.5 Logging, monitoring, and alerting 34
The importance of monitoring 34

Observability 34
Responding to alerts 36

Application-level logging tools 37
Streaming and batch audit of data quality 39

Anomaly detection to detect data anomalies 39

Silent errors and auditing 40

Further reading on observability 40
2.6 Search bar 40
Introduction 40

Search bar implementation with Elasticsearch 41

Elasticsearch index and ingestion 42

Using Elasticsearch in place of SQL 43

Implementing search in our services 44

Further reading on search 44
2.7 Other discussions 44
Maintaining and extending the application 44

Supporting other types of users 45

Alternative architectural decisions 45
Usability and feedback 45

Edge cases and new constraints 46

Cloud native concepts 47
2.8 Post-interview reflection and assessment 47
Write your reflection as soon as possible after the interview 47
Writing your assessment 49

Details you didn’t mention 49
Interview feedback 50
2.9 Interviewing the company 51
3 Non-functional requirements 54
3.1 Scalability 56
Stateless and stateful services 57

Basic load balancer concepts 57
3.2 Availability 59
3.3 Fault-tolerance 60
Replication and redundancy 60

Forward error correction (FEC) and error correction code (ECC) 61

Circuit breaker 61

Exponential backoff and retry 62

Caching responses of other services 62

Checkpointing 62
Dead letter queue 62

Logging and periodic auditing 63
Bulkhead 63

Fallback pattern 64
3.4 Performance/latency and throughput 65
3.5 Consistency 66
Full mesh 67

Coordination service 68

Distributed cache 69

Gossip protocol 70

Random Leader Selection 70
3.6 Accuracy 70
3.7 Complexity and maintainability 71
Continuous deployment (CD) 72
3.8 Cost 72
3.9 Security 73
3.10 Privacy 73
External vs. internal services 74
3.11 Cloud native 75
3.12 Further reading 75
4 Scaling databases 77
4.1 Brief prelude on storage services 77
4.2 When to use vs. avoid databases 79
4.3 Replication 79
Distributing replicas 80

Single-leader replication 80
Multi-leader replication 84

Leaderless replication 85
HDFS replication 85

Further reading 87
4.4 Scaling storage capacity with sharded databases 87
Sharded RDBMS 88
4.5 Aggregating events 88
Single-tier aggregation 89

Multi-tier aggregation 89
Partitioning 90

Handling a large key space 91
Replication and fault-tolerance 92
4.6 Batch and streaming ETL 93
A simple batch ETL pipeline 93

Messaging terminology 95
Kafka vs. RabbitMQ 96

Lambda architecture 98
4.7 Denormalization 98
4.8 Caching 99
Read strategies 100

Write strategies 101
4.9 Caching as a separate service 103
4.10 Examples of different kinds of data to cache and how to cache them 103
4.11 Cache invalidation 104
Browser cache invalidation 105

Cache invalidation in caching services 105
4.12 Cache warming 106
4.13 Further reading 107
Caching references 107
5 Distributed transactions 109
5.1 Event Driven Architecture (EDA) 110
5.2 Event sourcing 111
5.3 Change Data Capture (CDC) 112
5.4 Comparison of event sourcing and CDC 113
5.5 Transaction supervisor 114
5.6 Saga 115
Choreography 115

Orchestration 117

Comparison 119
5.7 Other transaction types 120
5.8 Further reading 120
6 Common services for functional partitioning 122
6.1 Common functionalities of various services 123
Security 123

Error-checking 124

Performance and availability 124

Logging and analytics 124
6.2 Service mesh / sidecar pattern 125
6.3 Metadata service 126
6.4 Service discovery 127
6.5 Functional partitioning and various frameworks 128
Basic system design of an app 128

Purposes of a web server app 129

Web and mobile frameworks 130
6.6 Library vs. service 134
Language-specific vs. technology-agnostic 135

Predictability of latency 136

Predictability and reproducibility of behavior 136

Scaling considerations for libraries 136
Other considerations 137
6.7 Common API paradigms 137
The Open Systems Interconnection (OSI) model 137
REST 138

RPC (Remote Procedure Call) 140
GraphQL 141

WebSocket 142

Comparison 142
Part 2 145
7 Design Craigslist 147
7.1 User stories and requirements 148
7.2 API 149
7.3 SQL database schema 150
7.4 Initial high-level architecture 150
7.5 A monolith architecture 151
7.6 Using a SQL database and object store 153
7.7 Migrations are troublesome 153
7.8 Writing and reading posts 156
7.9 Functional partitioning 158
7.10 Caching 159
7.11 CDN 160
7.12 Scaling reads with a SQL cluster 160
7.13 Scaling write throughput 160
7.14 Email service 161
7.15 Search 162
7.16 Removing old posts 162
7.17 Monitoring and alerting 163
7.18 Summary of our architecture discussion so far 163
7.19 Other possible discussion topics 164
Reporting posts 164

Graceful degradation 164
Complexity 164

Item categories/tags 166

Analytics and recommendations 166

A/B testing 167

Subscriptions and saved searches 167

Allow duplicate requests to the search service 168

Avoid duplicate requests to the search service 168

Rate limiting 169

Large number of posts 169

Local regulations 169
8 Design a rate-limiting service 171
8.1 Alternatives to a rate-limiting service, and why they are infeasible 172
8.2 When not to do rate limiting 174
8.3 Functional requirements 174
8.4 Non-functional requirements 175
Scalability 175

Performance 175

Complexity 175
Security and privacy 176

Availability and fault-tolerance 176

Accuracy 176

Consistency 176
8.5 Discuss user stories and required service components 177
8.6 High-level architecture 177
8.7 Stateful approach/sharding 180
8.8 Storing all counts in every host 182
High-level architecture 182

Synchronizing counts 185
8.9 Rate-limiting algorithms 187
Token bucket 188

Leaky bucket 189

Fixed window counter 190

Sliding window log 192

Sliding window counter 193
8.10 Employing a sidecar pattern 193
8.11 Logging, monitoring, and alerting 193
8.12 Providing functionality in a client library 194
8.13 Further reading 195
9 Design a notification/alerting service 196
9.1 Functional requirements 196
Not for uptime monitoring 197

Users and data 197
Recipient channels 198

Templates 198

Trigger conditions 199

Manage subscribers, sender groups, and recipient groups 199

User features 199

Analytics 200
9.2 Non-functional requirements 200
9.3 Initial high-level architecture 200
9.4 Object store: Configuring and sending notifications 205
9.5 Notification templates 207
Notification template service 207

Additional features 209
9.6 Scheduled notifications 210
9.7 Notification addressee groups 212
9.8 Unsubscribe requests 215
9.9 Handling failed deliveries 216
9.10 Client-side considerations regarding duplicate notifications 218
9.11 Priority 218
9.12 Search 219
9.13 Monitoring and alerting 219
9.14 Availability monitoring and alerting on the notification/alerting service 220
9.15 Other possible discussion topics 220
9.16 Final notes 221
10 Design a database batch auditing service 223
10.1 Why is auditing necessary? 224
10.2 Defining a validation with a conditional statement on a SQL query’s result 226
10.3 A simple SQL batch auditing service 229
An audit script 229

An audit service 230
10.4 Requirements 232
10.5 High-level architecture 233
Running a batch auditing job 234

Handling alerts 235
10.6 Constraints on database queries 237
Limit query execution time 238

Check the query strings before submission 238

Users should be trained early 239
10.7 Prevent too many simultaneous queries 239
10.8 Other users of database schema metadata 240
10.9 Auditing a data pipeline 241
10.10 Logging, monitoring, and alerting 242
10.11 Other possible types of audits 242
Cross data center consistency audits 242

Compare upstream and downstream data 243
10.12 Other possible discussion topics 243
10.13 References 243
11 Autocomplete/typeahead 245
11.1 Possible uses of autocomplete 246
11.2 Search vs. autocomplete 246
11.3 Functional requirements 248
Scope of our autocomplete service 248

Some UX (user experience) details 248

Considering search history 249
Content moderation and fairness 250
11.4 Non-functional requirements 250
11.5 Planning the high-level architecture 251
11.6 Weighted trie approach and initial high-level architecture 252
11.7 Detailed implementation 253
Each step should be an independent task 255

Fetch relevant logs from Elasticsearch to HDFS 255

Split the search strings into words, and other simple operations 255

Filter out inappropriate words 256

Fuzzy matching and spelling correction 258

Count the words 259

Filter for appropriate words 259

Managing new popular unknown words 259
Generate and deliver the weighted trie 259
11.8 Sampling approach 260
11.9 Handling storage requirements 261
11.10 Handling phrases instead of single words 263
Maximum length of autocomplete suggestions 263
Preventing inappropriate suggestions 263
11.11 Logging, monitoring, and alerting 264
11.12 Other considerations and further discussion 264
12 Design Flickr 266
12.1 User stories and functional requirements 267
12.2 Non-functional requirements 267
12.3 High-level architecture 269
12.4 SQL schema 270
12.5 Organizing directories and files on the CDN 271
12.6 Uploading a photo 272
Generate thumbnails on the client 272

Generate thumbnails on the backend 276

Implementing both server-side and client-side generation 281
12.7 Downloading images and data 282
Downloading pages of thumbnails 282
12.8 Monitoring and alerting 283
12.9 Some other services 283
Premium features 283

Payments and taxes service 283
Censorship/content moderation 283

Advertising 284
Personalization 284
12.10 Other possible discussions 284
13 Design a Content Distribution Network (CDN) 287
13.1 Advantages and disadvantages of a CDN 288
Advantages of using a CDN 288

Disadvantages of using a CDN 289

Example of an unexpected problem from using a CDN to serve images 290
13.2 Requirements 291
13.3 CDN authentication and authorization 291
Steps in CDN authentication and authorization 292
Key rotation 294
13.4 High-level architecture 294
13.5 Storage service 295
In-cluster 296

Out-cluster 296

Evaluation 296
13.6 Common operations 297
Reads–Downloads 297

Writes–Directory creation, file upload, and file deletion 301
13.7 Cache invalidation 306
13.8 Logging, monitoring, and alerting 306
13.9 Other possible discussions on downloading media files 306
14 Design a text messaging app 308
14.1 Requirements 309
14.2 Initial thoughts 310
14.3 Initial high-level design 310
14.4 Connection service 312
Making connections 312

Sender blocking 312
14.5 Sender service 316
Sending a message 316

Other discussions 319
14.6 Message service 320
14.7 Message sending service 321
Introduction 321

High-level architecture 322

Steps in sending a message 324

Some questions 325

Improving availability 325
14.8 Search 326
14.9 Logging, monitoring, and alerting 326
14.10 Other possible discussion points 327
15 Design Airbnb 329
15.1 Requirements 330
15.2 Design decisions 333
Replication 334

Data models for room availability 334
Handling overlapping bookings 335

Randomize search results 335

Lock rooms during booking flow 335
15.3 High-level architecture 335
15.4 Functional partitioning 337
15.5 Create or update a listing 337
15.6 Approval service 339
15.7 Booking service 345
15.8 Availability service 349
15.9 Logging, monitoring, and alerting 350
15.10 Other possible discussion points 351
Handling regulations 352
16 Design a news feed 354
16.1 Requirements 355
16.2 High-level architecture 356
16.3 Prepare feed in advance 360
16.4 Validation and content moderation 364
Changing posts on users’ devices 365

Tagging posts 365
Moderation service 367
16.5 Logging, monitoring, and alerting 368
Serving images as well as text 368

High-level architecture 369
16.6 Other possible discussion points 372
17 Design a dashboard of top 10 products on Amazon by sales volume 374
17.1 Requirements 375
17.2 Initial thoughts 376
17.3 Initial high-level architecture 377
17.4 Aggregation service 378
Aggregating by product ID 379

Matching host IDs and product IDs 379

Storing timestamps 380

Aggregation process on a host 380
17.5 Batch pipeline 381
17.6 Streaming pipeline 383
Hash table and max-heap with a single host 383
Horizontal scaling to multiple hosts and multi-tier aggregation 385
17.7 Approximation 386
Count-min sketch 388
17.8 Dashboard with Lambda architecture 390
17.9 Kappa architecture approach 390
Lambda vs. Kappa architecture 391

Kappa architecture for our dashboard 392
17.10 Logging, monitoring, and alerting 393
17.11 Other possible discussion points 393
17.12 References 394
A Monoliths vs. microservices 395
A.1 Disadvantages of monoliths 395
A.2 Advantages of monoliths 396
A.3 Advantages of services 396
Agile and rapid development and scaling of product requirements and business functionalities 397

Modularity and replaceability 397

Failure isolation and fault-tolerance 397
Ownership and organizational structure 398
A.4 Disadvantages of services 398
Duplicate components 398

Development and maintenance costs of additional components 399

Distributed transactions 400

Referential integrity 400

Coordinating feature development and deployments that span multiple services 400

Interfaces 401
A.5 References 402
B OAuth 2.0 authorization and OpenID Connect authentication 403
B.1 Authorization vs. authentication 403
B.2 Prelude: Simple login, cookie-based authentication 404
B.3 Single sign-on (SSO) 404
B.4 Disadvantages of simple login 404
Complexity and lack of maintainability 405

No partial authorization 405
B.5 OAuth 2.0 flow 406
OAuth 2.0 terminology 407

Initial client setup 407
Back channel and front channel 409
B.6 Other OAuth 2.0 flows 410
B.7 OpenID Connect authentication 411
C C4 Model 413
D Two-phase commit (2PC) 418
index 422

The system design interview is one of the hardest challenges you’ll face in the software engineering hiring process. This practical book gives you the insights, the skills, and the hands-on practice you need to ace the toughest system design interview questions and land the job and salary you want.

In Acing the System Design Interview, you will master a structured approach to presenting system design ideas such as:

Scaling applications to support heavy traffic
Distributed transaction techniques to ensure data consistency
Services for functional partitioning such as API gateway and service mesh
Common API paradigms including REST, RPC, and GraphQL
Caching strategies, including their tradeoffs
Logging, monitoring, and alerting concepts that are critical in any system design
Communication skills that demonstrate your engineering maturity

Don’t be daunted by the complex, open-ended nature of system design interviews! In this in-depth guide, author Zhiyong Tan shares what he’s learned on both sides of the interview table. You’ll dive deep into the common technical topics that arise during interviews and learn how to apply them to reason through designs for different kinds of systems.

About the technology

The system design interview is daunting even for seasoned software engineers. Fortunately, with a little careful prep work you can turn those open-ended questions and whiteboard sessions into your competitive advantage! In this powerful book, Zhiyong Tan reveals practical interview techniques and insights about system design that have earned developers job offers from Amazon, Apple, ByteDance, PayPal, and Uber.

About the book

Acing the System Design Interview is a masterclass in how to confidently nail your next interview. Following these easy-to-remember techniques, you’ll learn to quickly assess a question, identify an advantageous approach, and then communicate your ideas clearly to an interviewer. As you work through this book, you’ll gain not only the skills to successfully interview, but also to do the actual work of great system design.

What's inside

Insights on scaling, transactions, logging, and more
Practice questions for core system design concepts
How to demonstrate your engineering maturity
Great questions to ask your interviewer

About the reader

For software engineers, software architects, and engineering managers looking to advance their careers.
