Machine Learning System Design
brief contents
contents
preface
acknowledgments
about this book
Who should read this book?
How this book is organized: A roadmap
liveBook discussion forum
about the authors
about the cover illustration
Part 1 Preparations
1 Essentials of machine learning system design
1.1 ML system design: What are you?
1.1.1 Why ML system design is so important
1.1.2 Roots of ML system design
1.2 How this book is structured
1.3 When principles of ML system design can be helpful
Summary
2 Is there a problem?
2.1 Problem space vs. solution space
2.2 Finding the problem
2.2.1 How we can approximate a solution through an ML system
2.3 Risks, limitations, and possible consequences
2.4 Costs of a mistake
Summary
3 Preliminary research
3.1 What problems can inspire you?
3.2 Build or buy: Open source-based or proprietary tech
3.2.1 Build or buy
3.2.2 Open source-based or proprietary tech
3.3 Problem decompositioning
3.4 Choosing the right degree of innovation
3.4.1 What solutions can be useful?
3.4.2 Working on the solution space: Practical example
Summary
4 Design document
4.1 Common myths surrounding the design document
4.1.1 Myth #1. Design documents work only for big companies but not startups
4.1.2 Myth #2. Design documents are efficient only for complex projects
4.1.3 Myth #3. Every design document should be based on a template
4.1.4 Myth #4. Every design document should lead to a deployed system
4.2 Goals and antigoals
4.3 Design document structure
4.4 Reviewing a design document
4.4.1 Design document review example
4.5 A design doc is a living thing
Summary
Part 2 Early stage
5 Loss functions and metrics
5.1 Losses
5.1.1 Loss tricks for deep learning models
5.2 Metrics
5.2.1 Consistency metrics
5.2.2 Offline and online metrics, proxy metrics, and hierarchy of metrics
5.3 Design document: Adding losses and metrics
5.3.1 Metrics and loss functions for Supermegaretail
5.3.2 Metrics and loss functions for PhotoStock Inc.
5.3.3 Wrap up
Summary
6 Gathering datasets
6.1 Data sources
6.2 Cooking the dataset
6.2.1 ETL
6.2.2 Filtering
6.2.3 Feature engineering
6.2.4 Labeling
6.3 Data and metadata
6.4 How much data is enough?
6.5 Chicken-or-egg problem
6.6 Properties of a healthy data pipeline
6.7 Design document: Dataset
6.7.1 Dataset for Supermegaretail
6.7.2 Dataset for PhotoStock Inc.
Summary
7 Validation schemas
7.1 Reliable evaluation
7.2 Standard schemas
7.2.1 Holdout sets
7.2.2 Cross-validation
7.2.3 The choice of K
7.2.4 Time-series validation
7.3 Nontrivial schemas
7.3.1 Nested validation
7.3.2 Adversarial validation
7.3.3 Quantifying dataset leakage exploitation
7.4 Split updating procedure
7.5 Design document: Choosing validation schemas
7.5.1 Validation schemas for Supermegaretail
7.5.2 Validation schemas for PhotoStock Inc.
Summary
8 Baseline solution
8.1 Baseline: What are you?
8.2 Constant baselines
8.2.1 Why do we need constant baselines?
8.3 Model baselines and feature baselines
8.4 Variety of deep learning baselines
8.5 Baseline comparison
8.6 Design document: Baselines
8.6.1 Baselines for Supermegaretail
8.6.2 Baselines for PhotoStock Inc.
Summary
Part 3 Intermediate steps
9 Error analysis
9.1 Learning curve analysis
9.1.1 Overfitting and underfitting
9.1.2 Loss curve
9.1.3 Interpreting loss curves
9.1.4 Model-wise learning curve
9.1.5 Sample-wise learning curve
9.1.6 Double descent
9.2 Residual analysis
9.2.1 Goals of residual analysis
9.2.2 Model assumptions
9.2.3 Residual distribution
9.2.4 Fairness of residuals
9.2.5 Underprediction and overprediction
9.2.6 Elasticity curves
9.3 Finding commonalities in residuals
9.3.1 Worst/best-case analysis
9.3.2 Adversarial validation
9.3.3 Variety of group analysis
9.3.4 Corner-case analysis
9.4 Design document: Error analysis
9.4.1 Error analysis for Supermegaretail
9.4.2 Error analysis for PhotoStock Inc.
Summary
10 Training pipelines
10.1 Training pipeline: What are you?
10.1.1 Training pipeline vs. inference pipeline
10.2 Tools and platforms
10.3 Scalability
10.4 Configurability
10.5 Testing
10.5.1 Property-based testing
10.6 Design document: Training pipelines
10.6.1 Training pipeline for Supermegaretail
10.6.2 Training pipeline for PhotoStock Inc.
Summary
11 Features and feature engineering
11.1 Feature engineering: What are you?
11.1.1 Criteria of good and bad features
11.1.2 Feature generation 101
11.1.3 Model predictions as a feature
11.2 Feature importance analysis
11.2.1 Classification of methods
11.2.2 Accuracy–interpretability tradeoff
11.2.3 Feature importance in deep learning
11.3 Feature selection
11.3.1 Feature generation vs. feature selection
11.3.2 Goals and possible drawbacks
11.3.3 Feature selection method overview
11.4 Feature store
11.4.1 Feature store: Pros and cons
11.4.2 Desired properties of a feature store
11.4.3 Feature catalog
11.5 Design document: Feature engineering
11.5.1 Features for Supermegaretail
11.5.2 Features for PhotoStock Inc.
Summary
12 Measuring and reporting results
12.1 Measuring results
12.1.1 Model performance
12.1.2 Transition to business metrics
12.1.3 Simulated environment
12.1.4 Human evaluation
12.2 A/B testing
12.2.1 Experiment design
12.2.2 Splitting strategy
12.2.3 Selecting metrics
12.2.4 Statistical criteria
12.2.5 Simulated experiments
12.2.6 When A/B testing is not possible
12.3 Reporting results
12.3.1 Control and auxiliary metrics
12.3.2 Uplift monitoring
12.3.3 When to finish the experiment
12.3.4 What to report
12.3.5 Debrief document
12.4 Design document: Measuring and reporting
12.4.1 Measuring and reporting for Supermegaretail
12.4.2 Measuring and reporting for PhotoStock Inc.
Summary
Part 4 Integration and growth
13 Integration
13.1 API design
13.1.1 API practices
13.2 Release cycle
13.3 Operating the system
13.3.1 Tech-related connections
13.3.2 Non-tech-related connections
13.4 Overrides and fallbacks
13.5 Design document: Integration
13.5.1 Integration for Supermegaretail
13.5.2 Integration for PhotoStock Inc.
Summary
14 Monitoring and reliability
14.1 Why monitoring is important
14.1.1 Incoming data
14.1.2 Model
14.1.3 Model output
14.1.4 Postprocessing/decision-making
14.2 Software system health
14.3 Data quality and integrity
14.3.1 Processing problems
14.3.2 Data source corruption
14.3.3 Cascade/upstream models
14.3.4 Schema change
14.3.5 Training-serving skew
14.3.6 How to monitor and react
14.4 Model quality and relevance
14.4.1 Data drift
14.4.2 Concept drift
14.4.3 How to monitor
14.4.4 How to react
14.5 Design document: Monitoring
14.5.1 Monitoring for Supermegaretail
14.5.2 Monitoring for PhotoStock Inc.
Summary
15 Serving and inference optimization
15.1 Serving and inference: Challenges
15.2 Tradeoffs and patterns
15.2.1 Tradeoffs
15.2.2 Patterns
15.3 Tools and frameworks
15.3.1 Choosing a framework
15.3.2 Serverless inference
15.4 Optimizing inference pipelines
15.4.1 Starting with profiling
15.4.2 The best optimizing is minimum optimizing
15.5 Design document: Serving and inference
15.5.1 Serving and inference for Supermegaretail
15.5.2 Serving and inference for PhotoStock Inc.
Summary
16 Ownership and maintenance
16.1 Accountability
16.2 Bus factor
16.2.1 Why is being too efficient not beneficial?
16.2.2 Why is being too redundant not beneficial?
16.2.3 When and how to use the bus factor
16.3 Documentation
16.4 Complexity
16.5 Maintenance and ownership: Supermegaretail and PhotoStock Inc.
Summary
index
Machine Learning System Design - back
From information gathering to release and maintenance, Machine Learning System Design guides you step-by-step through every stage of the machine learning process. Inside, you’ll find a reliable framework for building, maintaining, and improving machine learning systems at any scale or complexity.
Authors Valerii Babushkin and Arseny Kravchenko have filled this unique handbook with campfire stories and personal tips from their own extensive careers. You’ll learn directly from their experience as you consider every facet of a machine learning system, from requirements gathering and data sourcing to deployment and management of the finished system.
Designing and delivering a machine learning system is an intricate multistep process that requires many skills and roles. Whether you’re an engineer adding machine learning to an existing application or designing an ML system from the ground up, you need to navigate massive datasets and streams, lock down testing and deployment requirements, and master the unique complexities of putting ML models into production. That’s where this book comes in.
Machine Learning System Design shows you how to design and deploy a machine learning project from start to finish. You’ll follow a step-by-step framework for designing, implementing, releasing, and maintaining ML systems. As you go, requirement checklists and real-world examples help you prepare to deliver and optimize your own ML systems. You’ll especially love the campfire stories, personal tips, and ML system design interview tips.
For readers who know the basics of software engineering and machine learning. Examples in Python.