The Pragmatic Programmer for Machine Learning: Engineering Analytics and Data Science Solutions

Authors: Malvestio Mauro, Scutari Marco

Publication date: 2023

Publisher: CRC Press, an imprint of Taylor & Francis Group, LLC

Number of pages: 357

File size: 2.1 MB

File type: PDF

Cover

Half Title

Series Page

Title Page

Dedication

Contents

Preface

1. What Is This Book About?

1.1. Machine Learning

1.2. Data Science

1.3. Software Engineering

1.4. How Do They Go Together?

I. Foundations of Scientific Computing

2. Hardware Architectures

2.1. Types of Hardware

2.1.1. Compute

2.1.2. Memory

2.1.3. Connections

2.2. Making Hardware Live Up to Expectations

2.3. Local and Remote Hardware

2.4. Choosing the Right Hardware for the Job

3. Variable Types and Data Structures

3.1. Variable Types

3.1.1. Integers

3.1.2. Floating Point

3.1.3. Strings

3.2. Data Structures

3.2.1. Vectors and Lists

3.2.2. Representing Data with Data Frames

3.2.3. Dense and Sparse Matrices

3.3. Choosing the Right Variable Types for the Job

3.4. Choosing the Right Data Structures for the Job

4. Analysis of Algorithms

4.1. Writing Pseudocode

4.2. Computational Complexity and Big-O Notation

4.3. Big-O Notation and Benchmarking

4.4. Algorithm Analysis for Machine Learning

4.5. Some Examples of Algorithm Analysis

4.5.1. Estimating Linear Regression Models

4.5.2. Sparse Matrices Representation

4.5.3. Uniform Simulations of Directed Acyclic Graphs

4.6. Big-O Notation and Real-World Performance

II. Best Practices for Machine Learning Pipelines

5. Designing and Structuring Pipelines

5.1. Data as Code

5.2. Technical Debt

5.2.1. At the Data Level

5.2.2. At the Model Level

5.2.3. At the Architecture (Design) Level

5.2.4. At the Code Level

5.3. Machine Learning Pipeline

5.3.1. Project Scoping

5.3.2. Producing a Baseline Implementation

5.3.3. Data Ingestion and Preparation

5.3.4. Model Training, Evaluation and Validation

5.3.5. Deployment, Serving and Inference

5.3.6. Monitoring, Logging and Reporting

6. Writing Machine Learning Code

6.1. Choosing Languages and Libraries

6.2. Naming Things

6.3. Coding Styles and Coding Standards

6.4. Filesystem Structure

6.5. Effective Versioning

6.6. Code Review

6.7. Refactoring

6.8. Reworking Academic Code: An Example

7. Packaging and Deploying Pipelines

7.1. Model Packaging

7.1.1. Standalone Packaging

7.1.2. Programming Language Package Managers

7.1.3. Virtual Machines

7.1.4. Containers

7.2. Model Deployment: Strategies

7.3. Model Deployment: Infrastructure

7.4. Model Deployment: Monitoring and Logging

7.5. What Can Possibly Go Wrong?

7.6. Rolling Back

8. Documenting Pipelines

8.1. Comments

8.2. Documenting Public Interfaces

8.3. Documenting Architecture and Design

8.4. Documenting Algorithms and Business Cases

8.5. Illustrating Practical Use Cases

9. Troubleshooting and Testing Pipelines

9.1. Data Are the Problem

9.1.1. Large Data

9.1.2. Heterogeneous Data

9.1.3. Dynamic Data

9.2. Models Are the Problem

9.2.1. Large Models

9.2.2. Black-Box Models

9.2.3. Costly Models

9.2.4. Many Models

9.3. Common Signs That Something Is Up

9.4. Tests Are the Solution

9.4.1. What Do We Want to Achieve?

9.4.2. What Should We Test?

9.4.3. Offline and Online Data

9.4.4. Testing Local and Testing Global

9.4.5. Conceptual and Implementation Errors

9.4.6. Code Coverage and Test Prioritisation

III. Tools and Technologies

10. Tools for Developing Pipelines

10.1. Data Exploration and Experiment Tracking

10.2. Code Development

10.2.1. Code Editors and IDEs

10.2.2. Notebooks

10.2.3. Accessing Data and Documentation

10.3. Build, Test and Documentation Tools

11. Tools to Manage Pipelines in Production

11.1. Infrastructure Management

11.2. Machine Learning Software Management

11.3. Dashboards, Visualisation and Reporting

IV. A Case Study

12. Recommending Recommendations: A Recommender System Using Natural Language Understanding

12.1. The Domain Problem

12.2. The Machine Learning Model

12.3. The Infrastructure

12.4. The Architecture of the Pipeline

12.4.1. Data Ingestion and Data Preparation

12.4.2. Data Tracking and Versioning

12.4.3. Training and Experiment Tracking

12.4.4. Model Packaging

12.4.5. Deployment and Inference

Bibliography

Index

Machine learning has redefined the way we work with data and is increasingly becoming an indispensable part of everyday life. The Pragmatic Programmer for Machine Learning: Engineering Analytics and Data Science Solutions discusses how modern software engineering practices are part of this revolution both conceptually and in practical applications.

Comprising a broad overview of how to design machine learning pipelines as well as the state-of-the-art tools we use to make them, this book provides a multi-disciplinary view of how traditional software engineering can be adapted to and integrated with the workflows of domain experts and probabilistic models.

From choosing the right hardware to designing effective pipeline architectures and adopting software development best practices, this guide will appeal to machine learning and data science specialists, whilst also laying out key high-level principles in a way that is approachable for students of computer science and aspiring programmers.