Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Author: Chip Huyen

Release date: 2022

Publisher: O’Reilly Media, Inc.

Number of pages: 339

File size: 3.8 MB

File type: PDF

1. Machine Learning Systems in Production

When to Use Machine Learning

Machine Learning Use Cases

Understanding Machine Learning Systems

Mind vs. Data

Machine Learning in Research vs. in Production

Machine Learning Systems vs. Traditional Software

Designing ML Systems in Production

Requirements for ML Systems

Iterative Process

Summary

2. Data Engineering Fundamentals

Data Sources

Data Formats

JSON

Row-major vs. Column-major Format

Text vs. Binary Format

Data Models

Relational Model

NoSQL

Structured vs. Unstructured Data

Data Storage Engines and Processing

Transactional and Analytical Processing

ETL: Extract, Transform, and Load

Modes of Dataflow

Data Passing Through Databases

Data Passing Through Services

Data Passing Through Real-time Transport

Batch Processing vs. Stream Processing

Summary

3. Training Data

Sampling

Non-Probability Sampling

Simple Random Sampling

Stratified Sampling

Weighted Sampling

Importance Sampling

Reservoir Sampling

Labeling

Hand Labels

Handling the Lack of Hand Labels

Class Imbalance

Challenges of Class Imbalance

Handling Class Imbalance

Data Augmentation

Simple Label-Preserving Transformations

Perturbation

Data Synthesis

Summary

4. Feature Engineering

Learned Features vs. Engineered Features

Common Feature Engineering Operations

Handling Missing Values

Scaling

Discretization

Encoding Categorical Features

Feature Crossing

Discrete and Continuous Positional Embeddings

Data Leakage

Common Causes for Data Leakage

Detecting Data Leakage

Engineering Good Features

Feature Importance

Feature Generalization

Summary

5. Model Development

Framing ML Problems

Types of ML Tasks

Objective Functions

Model Development and Training

Evaluating ML Models

Ensembles

Experiment Tracking and Versioning

Distributed Training

AutoML

Model Offline Evaluation

Baselines

Evaluation Methods

Summary

6. Model Deployment

Machine Learning Deployment Myths

Batch Prediction vs. Online Prediction

From Batch Prediction to Online Prediction

Unifying Batch Pipeline and Streaming Pipeline

Model Compression

Low-rank Factorization

Knowledge Distillation

Pruning

Quantization

ML on the Cloud and on the Edge

Compiling and Optimizing Models for Edge Devices

ML in Browsers

Summary

7. Why Machine Learning Systems Fail in Production

Natural Labels and Feedback Loop

Causes of ML System Failures

Production Data Differing from Training Data

Edge Cases

Degenerate Feedback Loop

Data Distribution Shifts

Types of Data Distribution Shifts

General Data Distribution Shifts

Handling Data Distribution Shifts

Summary

About the Author

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:

Engineering data and choosing the right metrics to solve a business problem
Automating the process for continually developing, evaluating, deploying, and updating models
Developing a monitoring system to quickly detect and address issues your models might encounter in production
Architecting an ML platform that serves across use cases
Developing responsible ML systems
