Foreword xv
Preface xix
1. Introduction to Machine Learning Production Systems 1
What Is Production Machine Learning? 1
Benefits of Machine Learning Pipelines 3
Focus on Developing New Models, Not on Maintaining Existing Models 3
Prevention of Bugs 3
Creation of Records for Debugging and Reproducing Results 4
Standardization 4
The Business Case for ML Pipelines 4
When to Use Machine Learning Pipelines 5
Steps in a Machine Learning Pipeline 5
Data Ingestion and Data Versioning 6
Data Validation 6
Feature Engineering 6
Model Training and Model Tuning 7
Model Analysis 7
Model Deployment 8
Looking Ahead 8
2. Collecting, Labeling, and Validating Data 9
Important Considerations in Data Collection 9
Responsible Data Collection 10
Labeling Data: Data Changes and Drift in Production ML 11
Labeling Data: Direct Labeling and Human Labeling 13
Validating Data: Detecting Data Issues 14
Validating Data: TensorFlow Data Validation 14
Skew Detection with TFDV 15
Types of Skew 16
Example: Spotting Imbalanced Datasets with TensorFlow Data Validation 17
Conclusion 19
3. Feature Engineering and Feature Selection 21
Introduction to Feature Engineering 21
Preprocessing Operations 23
Feature Engineering Techniques 24
Normalizing and Standardizing 24
Bucketizing 25
Feature Crosses 26
Dimensionality and Embeddings 26
Visualization 26
Feature Transformation at Scale 27
Choose a Framework That Scales Well 27
Avoid Training–Serving Skew 28
Consider Instance-Level Versus Full-Pass Transformations 28
Using TensorFlow Transform 29
Analyzers 31
Code Example 32
Feature Selection 32
Feature Spaces 33
Feature Selection Overview 33
Filter Methods 34
Wrapper Methods 35
Embedded Methods 37
Feature and Example Selection for LLMs and GenAI 38
Example: Using TF Transform to Tokenize Text 38
Benefits of Using TF Transform 41
Alternatives to TF Transform 42
Conclusion 42
4. Data Journey and Data Storage 43
Data Journey 43
ML Metadata 44
Using a Schema 45
Schema Development 46
Schema Environments 46
Changes Across Datasets 47
Enterprise Data Storage 48
Feature Stores 48
Data Warehouses 50
Data Lakes 51
Conclusion 51
5. Advanced Labeling, Augmentation, and Data Preprocessing 53
Advanced Labeling 54
Semi-Supervised Labeling 54
Active Learning 56
Weak Supervision 59
Advanced Labeling Review 60
Data Augmentation 61
Example: CIFAR-10 62
Other Augmentation Techniques 62
Data Augmentation Review 62
Preprocessing Time Series Data: An Example 63
Windowing 64
Sampling 65
Conclusion 66
6. Model Resource Management Techniques 67
Dimensionality Reduction: Dimensionality Effect on Performance 67
Example: Word Embedding Using Keras 68
Curse of Dimensionality 72
Adding Dimensions Increases Feature Space Volume 73
Dimensionality Reduction 74
Quantization and Pruning 78
Mobile, IoT, Edge, and Similar Use Cases 78
Quantization 78
Optimizing Your TensorFlow Model with TF Lite 84
Optimization Options 85
Pruning 86
Knowledge Distillation 89
Teacher and Student Networks 89
Knowledge Distillation Techniques 90
TMKD: Distilling Knowledge for a Q&A Task 93
Increasing Robustness by Distilling EfficientNets 95
Conclusion 96
7. High-Performance Modeling 97
Distributed Training 97
Data Parallelism 98
Efficient Input Pipelines 101
Input Pipeline Basics 101
Input Pipeline Patterns: Improving Efficiency 102
Optimizing Your Input Pipeline with TensorFlow Data 103
Training Large Models: The Rise of Giant Neural Nets and Parallelism 105
Potential Solutions and Their Shortcomings 106
Pipeline Parallelism to the Rescue? 107
Conclusion 109
8. Model Analysis 111
Analyzing Model Performance 111
Black-Box Evaluation 112
Performance Metrics and Optimization Objectives 112
Advanced Model Analysis 113
TensorFlow Model Analysis 113
The Learning Interpretability Tool 119
Advanced Model Debugging 120
Benchmark Models 121
Sensitivity Analysis 121
Residual Analysis 125
Model Remediation 126
Discrimination Remediation 127
Fairness 127
Fairness Evaluation 128
Fairness Considerations 130
Continuous Evaluation and Monitoring 130
Conclusion 131
9. Interpretability 133
Explainable AI 133
Model Interpretation Methods 136
Method Categories 136
Intrinsically Interpretable Models 139
Model-Agnostic Methods 144
Local Interpretable Model-Agnostic Explanations 148
Shapley Values 149
The SHAP Library 151
Testing Concept Activation Vectors 153
AI Explanations 154
Example: Exploring Model Sensitivity with SHAP 156
Regression Models 156
Natural Language Processing Models 158
Conclusion 159
10. Neural Architecture Search 161
Hyperparameter Tuning 161
Introduction to AutoML 163
Key Components of NAS 163
Search Spaces 164
Search Strategies 166
Performance Estimation Strategies 168
AutoML in the Cloud 169
Amazon SageMaker Autopilot 169
Microsoft Azure Automated Machine Learning 170
Google Cloud AutoML 171
Using AutoML 172
Generative AI and AutoML 172
Conclusion 172
11. Introduction to Model Serving 173
Model Training 173
Model Prediction 174
Latency 174
Throughput 174
Cost 175
Resources and Requirements for Serving Models 175
Cost and Complexity 175
Accelerators 176
Feeding the Beast 177
Model Deployments 177
Data Center Deployments 178
Mobile and Distributed Deployments 178
Model Servers 179
Managed Services 180
Conclusion 181
12. Model Serving Patterns 183
Batch Inference 183
Batch Throughput 184
Batch Inference Use Cases 185
ETL for Distributed Batch and Stream Processing Systems 186
Introduction to Real-Time Inference 186
Synchronous Delivery of Real-Time Predictions 188
Asynchronous Delivery of Real-Time Predictions 188
Optimizing Real-Time Inference 188
Real-Time Inference Use Cases 189
Serving Model Ensembles 190
Ensemble Topologies 190
Example Ensemble 190
Ensemble Serving Considerations 190
Model Routers: Ensembles in GenAI 191
Data Preprocessing and Postprocessing in Real Time 191
Training Transformations Versus Serving Transformations 193
Windowing 193
Options for Preprocessing 194
Enter TensorFlow Transform 196
Postprocessing 197
Inference at the Edge and at the Browser 198
Challenges 199
Model Deployments via Containers 200
Training on the Device 200
Federated Learning 201
Runtime Interoperability 201
Inference in Web Browsers 202
Conclusion 202
13. Model Serving Infrastructure 203
Model Servers 204
TensorFlow Serving 204
NVIDIA Triton Inference Server 206
TorchServe 207
Building Scalable Infrastructure 208
Containerization 210
Traditional Deployment Era 210
Virtualized Deployment Era 211
Container Deployment Era 211
The Docker Containerization Framework 211
Container Orchestration 213
Reliability and Availability Through Redundancy 216
Observability 217
High Availability 218
Automated Deployments 219
Hardware Accelerators 219
GPUs 220
TPUs 220
Conclusion 221
14. Model Serving Examples 223
Example: Deploying TensorFlow Models with TensorFlow Serving 223
Exporting Keras Models for TF Serving 223
Setting Up TF Serving with Docker 224
Basic Configuration of TF Serving 224
Making Model Prediction Requests with REST 225
Making Model Prediction Requests with gRPC 227
Getting Predictions from Classification and Regression Models 228
Using Payloads 229
Getting Model Metadata from TF Serving 229
Making Batch Inference Requests 230
Example: Profiling TF Serving Inferences with TF Profiler 232
Prerequisites 232
TensorBoard Setup 233
Model Profile 234
Example: Basic TorchServe Setup 238
Installing the TorchServe Dependencies 238
Exporting Your Model for TorchServe 238
Setting Up TorchServe 239
Making Model Prediction Requests 242
Making Batch Inference Requests 242
Conclusion 243
15. Model Management and Delivery 245
Experiment Tracking 245
Experimenting in Notebooks 246
Experimenting Overall 247
Tools for Experiment Tracking and Versioning 248
Introduction to MLOps 252
Data Scientists Versus Software Engineers 252
ML Engineers 252
ML in Products and Services 253
MLOps 253
MLOps Methodology 255
MLOps Level 0 255
MLOps Level 1 257
MLOps Level 2 260
Components of an Orchestrated Workflow 263
Three Types of Custom Components 265
Python Function–Based Components 265
Container-Based Components 266
Fully Custom Components 267
TFX Deep Dive 270
TFX SDK 270
Intermediate Representation 271
Runtime 271
Implementing an ML Pipeline Using TFX Components 271
Advanced Features of TFX 273
Managing Model Versions 275
Approaches to Versioning Models 275
Model Lineage 277
Model Registries 277
Continuous Integration and Continuous Deployment 278
Continuous Integration 278
Continuous Delivery 280
Progressive Delivery 280
Blue/Green Deployment 281
Canary Deployment 281
Live Experimentation 282
Conclusion 284
16. Model Monitoring and Logging 285
The Importance of Monitoring 286
Observability in Machine Learning 287
What Should You Monitor? 288
Custom Alerting in TFX 289
Logging 290
Distributed Tracing 292
Monitoring for Model Decay 293
Data Drift and Concept Drift 294
Model Decay Detection 295
Supervised Monitoring Techniques 296
Unsupervised Monitoring Techniques 297
Mitigating Model Decay 298
Retraining Your Model 299
When to Retrain 299
Automated Retraining 300
Conclusion 300
17. Privacy and Legal Requirements 301
Why Is Data Privacy Important? 302
What Data Needs to Be Kept Private? 302
Harms 303
Only Collect What You Need 303
GenAI Data Scraped from the Web and Other Sources 304
Legal Requirements 304
The GDPR and the CCPA 304
The GDPR’s Right to Be Forgotten 305
Pseudonymization and Anonymization 306
Differential Privacy 307
Local and Global DP 308
Epsilon-Delta DP 308
Applying Differential Privacy to ML 309
TensorFlow Privacy Example 310
Federated Learning 312
Encrypted ML 313
Conclusion 314
18. Orchestrating Machine Learning Pipelines 315
An Introduction to Pipeline Orchestration 315
Why Pipeline Orchestration? 315
Directed Acyclic Graphs 316
Pipeline Orchestration with TFX 317
Interactive TFX Pipelines 317
Converting Your Interactive Pipeline for Production 319
Orchestrating TFX Pipelines with Apache Beam 319
Orchestrating TFX Pipelines with Kubeflow Pipelines 321
Introduction to Kubeflow Pipelines 321
Installation and Initial Setup 323
Accessing Kubeflow Pipelines 324
The Workflow from TFX to Kubeflow 325
OpFunc Functions 328
Orchestrating Kubeflow Pipelines 330
Google Cloud Vertex Pipelines 333
Setting Up Google Cloud and Vertex Pipelines 333
Setting Up a Google Cloud Service Account 337
Orchestrating Pipelines with Vertex Pipelines 340
Executing Vertex Pipelines 342
Choosing Your Orchestrator 344
Interactive TFX 344
Apache Beam 344
Kubeflow Pipelines 344
Google Cloud Vertex Pipelines 345
Alternatives to TFX 345
Conclusion 345
19. Advanced TFX 347
Advanced Pipeline Practices 347
Configure Your Components 347
Import Artifacts 348
Use Resolver Node 349
Execute a Conditional Pipeline 350
Export TF Lite Models 351
Warm-Starting Model Training 352
Use Exit Handlers 353
Trigger Messages from TFX 354
Custom TFX Components: Architecture and Use Cases 356
Architecture of TFX Components 356
Use Cases of Custom Components 357
Using Function-Based Custom Components 357
Writing a Custom Component from Scratch 358
Defining Component Specifications 360
Defining Component Channels 361
Writing the Custom Executor 361
Writing the Custom Driver 364
Assembling the Custom Component 365
Using Our Basic Custom Component 366
Implementation Review 367
Reusing Existing Components 367
Creating Container-Based Custom Components 370
Which Custom Component Is Right for You? 372
TFX-Addons 373
Conclusion 374
20. ML Pipelines for Computer Vision Problems 375
Our Data 376
Our Model 376
Custom Ingestion Component 377
Data Preprocessing 378
Exporting the Model 379
Our Pipeline 380
Data Ingestion 380
Data Preprocessing 381
Model Training 382
Model Evaluation 382
Model Export 384
Putting It All Together 384
Executing on Apache Beam 385
Executing on Vertex Pipelines 386
Model Deployment with TensorFlow Serving 387
Conclusion 389
21. ML Pipelines for Natural Language Processing 391
Our Data 392
Our Model 392
Ingestion Component 393
Data Preprocessing 394
Putting the Pipeline Together 397
Executing the Pipeline 397
Model Deployment with Google Cloud Vertex 398
Registering Your ML Model 398
Creating a New Model Endpoint 400
Deploying Your ML Model 400
Requesting Predictions from the Deployed Model 402
Cleaning Up Your Deployed Model 403
Conclusion 404
22. Generative AI 405
Generative Models 406
GenAI Model Types 406
Agents and Copilots 407
Pretraining 407
Pretraining Datasets 408
Embeddings 408
Self-Supervised Training with Masks 409
Fine-Tuning 410
Fine-Tuning Versus Transfer Learning 410
Fine-Tuning Datasets 411
Fine-Tuning Considerations for Production 411
Fine-Tuning Versus Model APIs 412
Parameter-Efficient Fine-Tuning 412
LoRA 412
S-LoRA 413
Human Alignment 413
Reinforcement Learning from Human Feedback 413
Reinforcement Learning from AI Feedback 414
Direct Preference Optimization 414
Prompting 415
Chaining 416
Retrieval Augmented Generation 416
ReAct 417
Evaluation 417
Evaluation Techniques 417
Benchmarking Across Models 418
LMOps 418
GenAI Attacks 419
Jailbreaks 419
Prompt Injection 420
Responsible GenAI 420
Design for Responsibility 420
Conduct Adversarial Testing 421
Constitutional AI 421
Conclusion 422
23. The Future of Machine Learning Production Systems and Next Steps 423
Let’s Think in Terms of ML Systems, Not ML Models 423
Bringing ML Systems Closer to Domain Experts 424
Privacy Has Never Been More Important 424
Conclusion 424
Index 427
Using machine learning for products, services, and critical business processes is quite different from using ML in an academic or research setting—especially for recent ML graduates and those moving from research to a commercial environment. Whether you currently work to create products and services that use ML, or would like to in the future, this practical book gives you a broad view of the entire field.
Authors Robert Crowe, Hannes Hapke, Emily Caveness, and Di Zhu help you identify topics that you can dive into more deeply, along with reference materials and tutorials that teach you the details. You'll learn the state of the art of machine learning engineering across a wide range of topics, including modeling, deployment, and MLOps, covering both the basics and the advanced aspects of the production ML lifecycle.
This book provides four in-depth sections that cover all aspects of machine learning engineering: