Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide....2
Table of Contents....5
Preface....23
Acknowledgements....24
About the Author....25
Frequently Asked Questions (FAQ)....26
Why PyTorch?....26
Why This Book?....27
Who Should Read This Book?....28
What Do I Need to Know?....29
How to Read This Book....29
What’s Next?....32
Setup Guide....33
Official Repository....33
Environment....33
Google Colab....33
Binder....34
Local Installation....35
1. Anaconda....36
2. Conda (Virtual) Environments....36
3. PyTorch....38
4. TensorBoard....40
5. GraphViz and Torchviz (optional)....41
6. Git....42
7. Jupyter....44
Moving On....44
Part I: Fundamentals....46
Chapter 0: Visualizing Gradient Descent....47
Spoilers....47
Jupyter Notebook....47
Imports....48
Visualizing Gradient Descent....48
Model....49
Data Generation....50
Synthetic Data Generation....50
Train-Validation-Test Split....52
Step 0 - Random Initialization....53
Step 1 - Compute Model’s Predictions....54
Step 2 - Compute the Loss....55
Loss Surface....57
Cross-Sections....61
Step 3 - Compute the Gradients....62
Visualizing Gradients....64
Backpropagation....65
Step 4 - Update the Parameters....66
Learning Rate....68
Low Learning Rate....69
High Learning Rate....71
Very High Learning Rate....72
"Bad" Feature....73
Scaling / Standardizing / Normalizing....76
Step 5 - Rinse and Repeat!....80
The Path of Gradient Descent....81
Recap....83
Chapter 1: A Simple Regression Problem....85
Spoilers....85
Jupyter Notebook....85
Imports....86
A Simple Regression Problem....86
Data Generation....87
Synthetic Data Generation....87
Gradient Descent....88
Step 0 - Random Initialization....89
Step 1 - Compute Model’s Predictions....89
Step 2 - Compute the Loss....89
Step 3 - Compute the Gradients....90
Step 4 - Update the Parameters....91
Step 5 - Rinse and Repeat!....92
Linear Regression in NumPy....92
PyTorch....96
Tensor....96
Loading Data, Devices, and CUDA....101
Creating Parameters....106
Autograd....110
backward....110
grad....112
zero_....113
Updating Parameters....114
no_grad....117
Dynamic Computation Graph....117
Optimizer....121
step / zero_grad....122
Loss....124
Model....128
Parameters....130
state_dict....131
Device....132
Forward Pass....132
train....134
Nested Models....134
Sequential Models....137
Layers....138
Putting It All Together....140
Data Preparation....141
Model Configuration....142
Model Training....143
Recap....146
Chapter 2: Rethinking the Training Loop....148
Spoilers....148
Jupyter Notebook....148
Imports....148
Rethinking the Training Loop....149
Training Step....155
Dataset....159
TensorDataset....161
DataLoader....161
Mini-Batch Inner Loop....167
Random Split....170
Evaluation....172
Plotting Losses....176
TensorBoard....177
Running It Inside a Notebook....177
Running It Separately (Local Installation)....179
Running It Separately (Binder)....180
SummaryWriter....180
add_graph....182
add_scalars....183
Saving and Loading Models....189
Model State....189
Saving....189
Resuming Training....190
Deploying / Making Predictions....193
Setting the Model’s Mode....194
Putting It All Together....195
Recap....198
Chapter 2.1: Going Classy....200
Spoilers....200
Jupyter Notebook....200
Imports....200
Going Classy....201
The Class....201
The Constructor....202
Arguments....202
Placeholders....203
Variables....205
Functions....205
Training Methods....212
Saving and Loading Models....216
Visualization Methods....217
The Full Code....218
Classy Pipeline....219
Model Training....222
Making Predictions....224
Checkpointing....224
Resuming Training....225
Putting It All Together....227
Recap....229
Chapter 3: A Simple Classification Problem....231
Spoilers....231
Jupyter Notebook....231
Imports....231
A Simple Classification Problem....232
Data Generation....233
Data Preparation....234
Model....235
Logits....236
Probabilities....237
Odds Ratio....237
Log Odds Ratio....239
From Logits to Probabilities....240
Sigmoid....242
Logistic Regression....243
Loss....246
BCELoss....248
BCEWithLogitsLoss....250
Imbalanced Dataset....253
Model Configuration....256
Model Training....257
Decision Boundary....261
Classification Threshold....266
Confusion Matrix....268
Metrics....270
True and False Positive Rates....270
Precision and Recall....273
Accuracy....274
Trade-offs and Curves....275
Low Threshold....275
High Threshold....277
ROC and PR Curves....278
The Precision Quirk....280
Best and Worst Curves....281
Comparing Models....282
Putting It All Together....284
Recap....286
Part II: Computer Vision....289
Chapter 4: Classifying Images....290
Spoilers....290
Jupyter Notebook....290
Imports....290
Classifying Images....291
Data Generation....292
Shape (NCHW vs NHWC)....296
Torchvision....299
Datasets....299
Models....299
Transforms....299
Transforms on Images....303
Transforms on Tensor....303
Normalize Transform....304
Composing Transforms....306
Data Preparation....308
Dataset Transforms....308
SubsetRandomSampler....310
Data Augmentation Transforms....313
WeightedRandomSampler....314
Seeds and more (seeds)....318
Putting It Together....320
Pixels as Features....321
Shallow Model....323
Notation....324
Model Configuration....325
Model Training....326
Deep-ish Model....326
Model Configuration....329
Model Training....329
Show Me the Math!....331
Show Me the Code!....333
Weights as Pixels....336
Activation Functions....337
Sigmoid....337
Hyperbolic Tangent (TanH)....339
Rectified Linear Unit (ReLU)....340
Leaky ReLU....342
Parametric ReLU (PReLU)....344
Deep Model....345
Model Configuration....346
Model Training....347
Show Me the Math Again!....349
Putting It All Together....351
Recap....355
Bonus Chapter: Feature Space....356
Two-Dimensional Feature Space....356
Transformations....357
A Two-Dimensional Model....358
Decision Boundary, Activation Style!....360
More Functions, More Boundaries....363
More Layers, More Boundaries....365
More Dimensions, More Boundaries....366
Recap....368
Chapter 5: Convolutions....369
Spoilers....369
Jupyter Notebook....369
Imports....369
Convolutions....370
Filter / Kernel....370
Convolving....372
Moving Around....373
Shape....376
Convolving in PyTorch....377
Striding....381
Padding....383
A REAL Filter....387
Pooling....389
Flattening....391
Dimensions....392
Typical Architecture....392
LeNet-5....393
A Multiclass Classification Problem....396
Data Generation....396
Data Preparation....397
Loss....400
Logits....400
Softmax....400
LogSoftmax....403
Negative Log-Likelihood Loss....403
Cross-Entropy Loss....407
Classification Losses Showdown!....409
Model Configuration....409
Model Training....412
Visualizing Filters and More!....413
Visualizing Filters....416
Hooks....419
Visualizing Feature Maps....428
Visualizing Classifier Layers....431
Accuracy....432
Loader Apply....434
Putting It All Together....435
Recap....439
Chapter 6: Rock, Paper, Scissors....441
Spoilers....441
Jupyter Notebook....441
Imports....441
Rock, Paper, Scissors…....442
Rock Paper Scissors Dataset....443
Data Preparation....444
ImageFolder....444
Standardization....445
The Real Datasets....449
Three-Channel Convolutions....450
Fancier Model....453
Dropout....456
Two-Dimensional Dropout....462
Model Configuration....463
Optimizer....463
Learning Rate....464
Model Training....464
Accuracy....465
Regularizing Effect....465
Visualizing Filters....467
Learning Rates....469
Finding LR....471
Adaptive Learning Rate....479
Moving Average (MA)....479
EWMA....480
EWMA Meets Gradients....486
Adam....487
Visualizing Adapted Gradients....488
Stochastic Gradient Descent (SGD)....495
Momentum....496
Nesterov....499
Flavors of SGD....500
Learning Rate Schedulers....503
Epoch Schedulers....504
Validation Loss Scheduler....505
Schedulers in StepByStep — Part I....507
Mini-Batch Schedulers....510
Schedulers in StepByStep — Part II....512
Scheduler Paths....514
Adaptive vs Cycling....517
Putting It All Together....517
Recap....520
Chapter 7: Transfer Learning....523
Spoilers....523
Jupyter Notebook....523
Imports....524
Transfer Learning....524
ImageNet....525
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)....525
ILSVRC-2012....526
AlexNet (SuperVision Team)....526
ILSVRC-2014....526
VGG....527
Inception (GoogLeNet Team)....527
ILSVRC-2015....527
ResNet (MSRA Team)....528
Comparing Architectures....528
Transfer Learning in Practice....530
Pre-Trained Model....530
Adaptive Pooling....532
Loading Weights....533
Model Freezing....534
Top of the Model....535
Model Configuration....538
Data Preparation....538
Model Training....540
Generating a Dataset of Features....541
Top Model....544
Auxiliary Classifiers (Side-Heads)....546
1x1 Convolutions....549
Inception Modules....552
Batch Normalization....557
Running Statistics....560
Evaluation Phase....566
Momentum....567
BatchNorm2d....569
Other Normalizations....570
Small Summary....570
Residual Connections....571
Learning the Identity....571
The Power of Shortcuts....575
Residual Blocks....576
Putting It All Together....579
Fine-Tuning....580
Feature Extraction....581
Recap....583
Extra Chapter: Vanishing and Exploding Gradients....586
Spoilers....586
Jupyter Notebook....586
Imports....586
Vanishing and Exploding Gradients....587
Vanishing Gradients....587
Ball Dataset and Block Model....588
Weights, Activations, and Gradients....590
Initialization Schemes....592
Batch Normalization....595
Exploding Gradients....596
Data Generation & Preparation....596
Model Configuration & Training....597
Gradient Clipping....599
Value Clipping....600
Norm Clipping (or Gradient Scaling)....601
Model Configuration & Training....605
Clipping with Hooks....608
Recap....609
Part III: Sequences....611
Chapter 8: Sequences....612
Spoilers....612
Jupyter Notebook....612
Imports....613
Sequences....613
Data Generation....614
Recurrent Neural Networks (RNNs)....616
RNN Cell....619
RNN Layer....626
Shapes....629
Stacked RNN....632
Bidirectional RNN....636
Square Model....640
Data Generation....640
Data Preparation....641
Model Configuration....641
Model Training....643
Visualizing the Model....644
Transformed Inputs....644
Hidden States....645
The Journey of a Hidden State....647
Can We Do Better?....649
Gated Recurrent Units (GRUs)....650
GRU Cell....651
GRU Layer....659
Square Model II — The Quickening....660
Model Configuration & Training....661
Visualizing the Model....662
Hidden States....662
The Journey of a Gated Hidden State....663
Can We Do Better?....665
Long Short-Term Memory (LSTM)....665
LSTM Cell....666
LSTM Layer....674
Square Model III — The Sorcerer....675
Model Configuration & Training....676
Visualizing the Hidden States....677
Variable-Length Sequences....678
Padding....679
Packing....682
Unpacking (to padded)....686
Packing (from padded)....688
Variable-Length Dataset....689
Data Preparation....689
Collate Function....691
Square Model IV — Packed....692
Model Configuration & Training....694
1D Convolutions....695
Shapes....696
Multiple Features or Channels....697
Dilation....699
Data Preparation....701
Model Configuration & Training....701
Visualizing the Model....703
Putting It All Together....704
Fixed-Length Dataset....704
Variable-Length Dataset....705
There Can Be Only ONE … Model....706
Model Configuration & Training....707
Recap....708
Chapter 9 — Part I: Sequence-to-Sequence....711
Spoilers....711
Jupyter Notebook....711
Imports....711
Sequence-to-Sequence....712
Data Generation....712
Encoder-Decoder Architecture....714
Encoder....714
Decoder....716
Teacher Forcing....721
Encoder + Decoder....723
Data Preparation....726
Model Configuration & Training....728
Visualizing Predictions....729
Can We Do Better?....729
Attention....730
"Values"....733
"Keys" and "Queries"....733
Computing the Context Vector....735
Scoring Method....739
Attention Scores....741
Scaled Dot Product....742
Attention Mechanism....748
Source Mask....751
Decoder....753
Encoder + Decoder + Attention....755
Model Configuration & Training....757
Visualizing Predictions....758
Visualizing Attention....759
Multi-Headed Attention....760
Chapter 9 — Part II: Sequence-to-Sequence....765
Spoilers....765
Self-Attention....765
Encoder....766
Cross-Attention....771
Decoder....773
Subsequent Inputs and Teacher Forcing....775
Attention Scores....776
Target Mask (Training)....777
Target Mask (Evaluation/Prediction)....779
Encoder + Decoder + Self-Attention....783
Model Configuration & Training....787
Visualizing Predictions....788
Sequential No More....789
Positional Encoding (PE)....790
Encoder + Decoder + PE....803
Model Configuration & Training....805
Visualizing Predictions....806
Visualizing Attention....807
Putting It All Together....809
Data Preparation....809
Model Assembly....810
Encoder + Decoder + Positional Encoding....812
Self-Attention "Layers"....813
Attention Heads....815
Model Configuration & Training....817
Recap....818
Chapter 10: Transform and Roll Out....821
Spoilers....821
Jupyter Notebook....821
Imports....821
Transform and Roll Out....822
Narrow Attention....822
Chunking....823
Multi-Headed Attention....826
Stacking Encoders and Decoders....832
Wrapping "Sub-Layers"....833
Transformer Encoder....836
Transformer Decoder....841
Layer Normalization....846
Batch vs Layer....851
Our Seq2Seq Problem....853
Projections or Embeddings....854
The Transformer....856
Data Preparation....859
Model Configuration & Training....860
Visualizing Predictions....863
The PyTorch Transformer....863
Model Configuration & Training....869
Visualizing Predictions....870
Vision Transformer....871
Data Generation & Preparation....871
Patches....874
Rearranging....874
Embeddings....876
Special Classifier Token....878
The Model....882
Model Configuration & Training....884
Putting It All Together....886
Data Preparation....886
Model Assembly....886
1. Encoder-Decoder....888
2. Encoder....891
3. Decoder....892
4. Positional Encoding....893
5. Encoder "Layer"....894
6. Decoder "Layer"....895
7. "Sub-Layer" Wrapper....896
8. Multi-Headed Attention....898
Model Configuration & Training....900
Recap....901
Part IV: Natural Language Processing....904
Chapter 11: Down the Yellow Brick Rabbit Hole....905
Spoilers....905
Jupyter Notebook....905
Additional Setup....906
Imports....906
"Down the Yellow Brick Rabbit Hole"....908
Building a Dataset....908
Sentence Tokenization....910
HuggingFace’s Dataset....916
Loading a Dataset....917
Attributes....918
Methods....919
Word Tokenization....921
Vocabulary....925
HuggingFace’s Tokenizer....931
Before Word Embeddings....939
One-Hot Encoding (OHE)....939
Bag-of-Words (BoW)....940
Language Models....941
N-grams....943
Continuous Bag-of-Words (CBoW)....944
Word Embeddings....944
Word2Vec....944
What Is an Embedding Anyway?....949
Pre-trained Word2Vec....952
Global Vectors (GloVe)....953
Using Word Embeddings....956
Vocabulary Coverage....956
Tokenizer....959
Special Tokens' Embeddings....960
Model I — GloVe + Classifier....962
Data Preparation....962
Pre-trained PyTorch Embeddings....964
Model Configuration & Training....966
Model II — GloVe + Transformer....967
Visualizing Attention....970
Contextual Word Embeddings....973
ELMo....974
BERT....982
Document Embeddings....984
Model III — Preprocessed Embeddings....987
Data Preparation....987
Model Configuration & Training....989
BERT....990
Tokenization....993
Input Embeddings....995
Pre-training Tasks....1000
Masked Language Model (MLM)....1000
Next Sentence Prediction (NSP)....1003
Outputs....1004
Model IV — Classifying Using BERT....1009
Data Preparation....1011
Model Configuration & Training....1013
Fine-Tuning with HuggingFace....1014
Sequence Classification (or Regression)....1014
Tokenized Dataset....1017
Trainer....1019
Predictions....1024
Pipelines....1026
More Pipelines....1027
GPT-2....1029
Putting It All Together....1033
Data Preparation....1033
"Packed" Dataset....1034
Model Configuration & Training....1037
Generating Text....1039
Recap....1041
Thank You!....1043
In 2019, I published a PyTorch tutorial on Towards Data Science, and I was amazed by the reaction from the readers! Their feedback motivated me to write this book to help beginners start their journey into Deep Learning and PyTorch. I hope you enjoy reading this book as much as I enjoyed writing it.
UPDATE (July 19th, 2022): The Spanish version of Part I, Fundamentals, was published today: https://leanpub.com/pytorch_ES
UPDATE (February 23rd, 2022): The paperback edition is available now (the book had to be split into 3 volumes for printing). For more details, please check pytorchstepbystep.com.
UPDATE (February 13th, 2022): The latest revised edition (v1.1.1) was published today to address small changes to Chapters 9 and 10 that weren't included in the previous revision.
UPDATE (January 23rd, 2022): The revised edition (v1.1) was published today - better graphics, improved formatting, larger page size (thus reducing page count from 1187 to 1045 pages - no content was removed!). If you already bought the book, you can download the new version at any time!
If you're looking for a book where you can learn about Deep Learning and PyTorch without having to spend hours deciphering cryptic text and code, and that's easy and enjoyable to read, this is it :-)
The book covers everything from the basics of gradient descent all the way up to fine-tuning large NLP models (BERT and GPT-2) using HuggingFace. It is divided into four parts: Fundamentals, Computer Vision, Sequences, and Natural Language Processing.