Deep Learning with Python, Third Edition
Praise for the Second Edition
brief contents
contents
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
about the authors
about the cover illustration
1 What is deep learning?
1.1 Artificial intelligence, machine learning, and deep learning
1.2 Artificial intelligence
1.3 Machine learning
1.4 Learning rules and representations from data
1.5 The “deep” in “deep learning”
1.6 Understanding how deep learning works, in three figures
1.7 What makes deep learning different
1.8 The age of generative AI
1.9 What deep learning has achieved so far
1.10 Beware of the short-term hype
1.11 Summer can turn to winter
1.12 The promise of AI
2 The mathematical building blocks of neural networks
2.1 A first look at a neural network
2.2 Data representations for neural networks
2.2.1 Scalars (rank-0 tensors)
2.2.2 Vectors (rank-1 tensors)
2.2.3 Matrices (rank-2 tensors)
2.2.4 Rank-3 tensors and higher-rank tensors
2.2.5 Key attributes
2.2.6 Manipulating tensors in NumPy
2.2.7 The notion of data batches
2.2.8 Real-world examples of data tensors
2.3 The gears of neural networks: Tensor operations
2.3.1 Element-wise operations
2.3.2 Broadcasting
2.3.3 Tensor product
2.3.4 Tensor reshaping
2.3.5 Geometric interpretation of tensor operations
2.3.6 A geometric interpretation of deep learning
2.4 The engine of neural networks: Gradient-based optimization
2.4.1 What’s a derivative?
2.4.2 Derivative of a tensor operation: The gradient
2.4.3 Stochastic gradient descent
2.4.4 Chaining derivatives: The Backpropagation algorithm
2.5 Looking back at our first example
2.5.1 Reimplementing our first example from scratch
2.5.2 Running one training step
2.5.3 The full training loop
2.5.4 Evaluating the model
3 Introduction to TensorFlow, PyTorch, JAX, and Keras
3.1 A brief history of deep learning frameworks
3.2 How these frameworks relate to each other
3.3 Introduction to TensorFlow
3.3.1 First steps with TensorFlow
3.3.2 An end-to-end example: A linear classifier in pure TensorFlow
3.3.3 What makes the TensorFlow approach unique
3.4 Introduction to PyTorch
3.4.1 First steps with PyTorch
3.4.2 An end-to-end example: A linear classifier in pure PyTorch
3.4.3 What makes the PyTorch approach unique
3.5 Introduction to JAX
3.5.1 First steps with JAX
3.5.2 Tensors in JAX
3.5.3 Random number generation in JAX
3.5.4 An end-to-end example: A linear classifier in pure JAX
3.5.5 What makes the JAX approach unique
3.6 Introduction to Keras
3.6.1 First steps with Keras
3.6.2 Layers: The building blocks of deep learning
3.6.3 From layers to models
3.6.4 The “compile” step: Configuring the learning process
3.6.5 Picking a loss function
3.6.6 Understanding the fit method
3.6.7 Monitoring loss and metrics on validation data
3.6.8 Inference: Using a model after training
4 Classification and regression
4.1 Classifying movie reviews: A binary classification example
4.1.1 The IMDb dataset
4.1.2 Preparing the data
4.1.3 Building your model
4.1.4 Validating your approach
4.1.5 Using a trained model to generate predictions on new data
4.1.6 Further experiments
4.1.7 Wrapping up
4.2 Classifying newswires: A multiclass classification example
4.2.1 The Reuters dataset
4.2.2 Preparing the data
4.2.3 Building your model
4.2.4 Validating your approach
4.2.5 Generating predictions on new data
4.2.6 A different way to handle the labels and the loss
4.2.7 The importance of having sufficiently large intermediate layers
4.2.8 Further experiments
4.2.9 Wrapping up
4.3 Predicting house prices: A regression example
4.3.1 The California Housing Price dataset
4.3.2 Preparing the data
4.3.3 Building your model
4.3.4 Validating your approach using K-fold validation
4.3.5 Generating predictions on new data
4.3.6 Wrapping up
5 Fundamentals of machine learning
5.1 Generalization: The goal of machine learning
5.1.1 Underfitting and overfitting
5.1.2 The nature of generalization in deep learning
5.2 Evaluating machine-learning models
5.2.1 Training, validation, and test sets
5.2.2 Beating a common-sense baseline
5.2.3 Things to keep in mind about model evaluation
5.3 Improving model fit
5.3.1 Tuning key gradient descent parameters
5.3.2 Using better architecture priors
5.3.3 Increasing model capacity
5.4 Improving generalization
5.4.1 Dataset curation
5.4.2 Feature engineering
5.4.3 Using early stopping
5.4.4 Regularizing your model
6 The universal workflow of machine learning
6.1 Defining the task
6.1.1 Framing the problem
6.1.2 Collecting a dataset
6.1.3 Understanding your data
6.1.4 Choosing a measure of success
6.2 Developing a model
6.2.1 Preparing the data
6.2.2 Choosing an evaluation protocol
6.2.3 Beating a baseline
6.2.4 Scaling up: Developing a model that overfits
6.2.5 Regularizing and tuning your model
6.3 Deploying your model
6.3.1 Explaining your work to stakeholders and setting expectations
6.3.2 Shipping an inference model
6.3.3 Monitoring your model in the wild
6.3.4 Maintaining your model
7 A deep dive on Keras
7.1 A spectrum of workflows
7.2 Different ways to build Keras models
7.2.1 The Sequential model
7.2.2 The Functional API
7.2.3 Subclassing the Model class
7.2.4 Mixing and matching different components
7.2.5 Remember: Use the right tool for the job
7.3 Using built-in training and evaluation loops
7.3.1 Writing your own metrics
7.3.2 Using callbacks
7.3.3 Writing your own callbacks
7.3.4 Monitoring and visualization with TensorBoard
7.4 Writing your own training and evaluation loops
7.4.1 Training vs. inference
7.4.2 Writing custom training step functions
7.4.3 Low-level usage of metrics
7.4.4 Using fit() with a custom training loop
7.4.5 Handling metrics in a custom train_step()
8 Image classification
8.1 Introduction to convnets
8.1.1 The convolution operation
8.1.2 The max-pooling operation
8.2 Training a convnet from scratch on a small dataset
8.2.1 The relevance of deep learning for small-data problems
8.2.2 Downloading the data
8.2.3 Building your model
8.2.4 Data preprocessing
8.2.5 Using data augmentation
8.3 Using a pretrained model
8.3.1 Feature extraction with a pretrained model
8.3.2 Fine-tuning a pretrained model
9 ConvNet architecture patterns
9.1 Modularity, hierarchy, and reuse
9.2 Residual connections
9.3 Batch normalization
9.4 Depthwise separable convolutions
9.5 Putting it together: A mini Xception-like model
9.6 Beyond convolution: Vision Transformers
10 Interpreting what ConvNets learn
10.1 Visualizing intermediate activations
10.2 Visualizing ConvNet filters
10.2.1 Gradient ascent in TensorFlow
10.2.2 Gradient ascent in PyTorch
10.2.3 Gradient ascent in JAX
10.2.4 The filter visualization loop
10.3 Visualizing heatmaps of class activation
10.3.1 Getting the gradient of the top class: TensorFlow version
10.3.2 Getting the gradient of the top class: PyTorch version
10.3.3 Getting the gradient of the top class: JAX version
10.3.4 Displaying the class activation heatmap
10.4 Visualizing the latent space of a ConvNet
11 Image segmentation
11.1 Computer vision tasks
11.1.1 Types of image segmentation
11.2 Training a segmentation model from scratch
11.2.1 Downloading a segmentation dataset
11.2.2 Building and training the segmentation model
11.3 Using a pretrained segmentation model
11.3.1 Downloading the Segment Anything Model
11.3.2 How Segment Anything works
11.3.3 Preparing a test image
11.3.4 Prompting the model with a target point
11.3.5 Prompting the model with a target box
12 Object detection
12.1 Single-stage vs. two-stage object detectors
12.1.1 Two-stage R-CNN detectors
12.1.2 Single-stage detectors
12.2 Training a YOLO model from scratch
12.2.1 Downloading the COCO dataset
12.2.2 Creating a YOLO model
12.2.3 Readying the COCO data for the YOLO model
12.2.4 Training the YOLO model
12.3 Using a pretrained RetinaNet detector
13 Timeseries forecasting
13.1 Different kinds of timeseries tasks
13.2 A temperature forecasting example
13.2.1 Preparing the data
13.2.2 A commonsense, non-machine-learning baseline
13.2.3 Let’s try a basic machine learning model
13.2.4 Let’s try a 1D convolutional model
13.3 Recurrent neural networks
13.3.1 Understanding recurrent neural networks
13.3.2 A recurrent layer in Keras
13.3.3 Getting the most out of recurrent neural networks
13.3.4 Using recurrent dropout to fight overfitting
13.3.5 Stacking recurrent layers
13.3.6 Using bidirectional RNNs
13.4 Going even further
14 Text classification
14.1 A brief history of natural language processing
14.2 Preparing text data
14.2.1 Character and word tokenization
14.2.2 Subword tokenization
14.3 Sets vs. sequences
14.3.1 Loading the IMDb classification dataset
14.4 Set models
14.4.1 Training a bag-of-words model
14.4.2 Training a bigram model
14.5 Sequence models
14.5.1 Training a recurrent model
14.5.2 Understanding word embeddings
14.5.3 Using a word embedding
14.5.4 Pretraining a word embedding
14.5.5 Using the pretrained embedding for classification
15 Language models and the Transformer
15.1 The language model
15.1.1 Training a Shakespeare language model
15.1.2 Generating Shakespeare
15.2 Sequence-to-sequence learning
15.2.1 English-to-Spanish translation
15.2.2 Sequence-to-sequence learning with RNNs
15.3 The Transformer architecture
15.3.1 Dot-product attention
15.3.2 Transformer encoder block
15.3.3 Transformer decoder block
15.3.4 Sequence-to-sequence learning with a Transformer
15.3.5 Embedding positional information
15.4 Classification with a pretrained Transformer
15.4.1 Pretraining a Transformer encoder
15.4.2 Loading a pretrained Transformer
15.4.3 Preprocessing IMDb movie reviews
15.4.4 Fine-tuning a pretrained Transformer
15.5 What makes the Transformer effective?
16 Text generation
16.1 A brief history of sequence generation
16.2 Training a mini-GPT
16.2.1 Building the model
16.2.2 Pretraining the model
16.2.3 Generative decoding
16.2.4 Sampling strategies
16.3 Using a pretrained LLM
16.3.1 Text generation with the Gemma model
16.3.2 Instruction fine-tuning
16.3.3 Low-Rank Adaptation (LoRA)
16.4 Going further with LLMs
16.4.1 Reinforcement Learning with Human Feedback (RLHF)
16.4.2 Multimodal LLMs
16.4.3 Retrieval Augmented Generation (RAG)
16.4.4 “Reasoning” models
16.5 Where are LLMs heading next?
17 Image generation
17.1 Deep learning for image generation
17.1.1 Sampling from latent spaces of images
17.1.2 Variational autoencoders
17.1.3 Implementing a VAE with Keras
17.2 Diffusion models
17.2.1 The Oxford Flowers dataset
17.2.2 A U-Net denoising autoencoder
17.2.3 The concepts of diffusion time and diffusion schedule
17.2.4 The training process
17.2.5 The generation process
17.2.6 Visualizing results with a custom callback
17.2.7 It’s go time!
17.3 Text-to-image models
17.3.1 Exploring the latent space of a text-to-image model
18 Best practices for the real world
18.1 Getting the most out of your models
18.1.1 Hyperparameter optimization
18.1.2 Model ensembling
18.2 Scaling up model training with multiple devices
18.2.1 Multi-GPU training
18.2.2 Distributed training in practice
18.2.3 TPU training
18.3 Speeding up training and inference with lower-precision computation
18.3.1 Understanding floating-point precision
18.3.2 Float16 inference
18.3.3 Mixed-precision training
18.3.4 Using loss scaling with mixed precision
18.3.5 Beyond mixed precision: float8 training
18.3.6 Faster inference with quantization
19 The future of AI
19.1 The limitations of deep learning
19.1.1 Deep learning models struggle to adapt to novelty
19.1.2 Deep learning models are highly sensitive to phrasing and other distractors
19.1.3 Deep learning models struggle to learn generalizable programs
19.1.4 The risk of anthropomorphizing machine-learning models
19.2 Scale isn’t all you need
19.2.1 Automatons vs. intelligent agents
19.2.2 Local generalization vs. extreme generalization
19.2.3 The purpose of intelligence
19.2.4 Climbing the spectrum of generalization
19.3 How to build intelligence
19.3.1 The kaleidoscope hypothesis
19.3.2 The essence of intelligence: Abstraction acquisition and recombination
19.3.3 The importance of setting the right target
19.3.4 A new target: On-the-fly adaptation
19.3.5 ARC Prize
19.3.6 The test-time adaptation era
19.3.7 ARC-AGI 2
19.4 The missing ingredients: Search and symbols
19.4.1 The two poles of abstraction
19.4.2 Cognition as a combination of both kinds of abstraction
19.4.3 Why deep learning isn’t a complete answer to abstraction generation
19.4.4 An alternative approach to AI: Program synthesis
19.4.5 Blending deep learning and program synthesis
19.4.6 Modular component recombination and lifelong learning
19.4.7 The long-term vision
20 Conclusions
20.1 Key concepts in review
20.1.1 Various approaches to artificial intelligence
20.1.2 What makes deep learning special within the field of machine learning
20.1.3 How to think about deep learning
20.1.4 Key enabling technologies
20.1.5 The universal machine learning workflow
20.1.6 Key network architectures
20.2 Limitations of deep learning
20.3 What might lie ahead
20.4 Staying up to date in a fast-moving field
20.4.1 Practice on real-world problems using Kaggle
20.4.2 Read about the latest developments on arXiv
20.4.3 Explore the Keras ecosystem
20.5 Final words
index
Deep Learning with Python, Third Edition puts the power of deep learning in your hands. This new edition includes the latest Keras and TensorFlow features, generative AI models, and added coverage of PyTorch and JAX. Learn directly from the creator of Keras and step confidently into the world of deep learning with Python.
With over 100,000 copies sold, Deep Learning with Python makes it possible for developers, data scientists, and machine learning enthusiasts to put deep learning into action. In this expanded and updated third edition, Keras creator François Chollet offers insights for both novice and experienced machine learning practitioners. You'll master state-of-the-art deep learning tools and techniques, from the latest features of Keras 3 to building AI models that can generate text and images.
In less than a decade, deep learning has changed the world—twice. First, Python-based libraries like Keras, TensorFlow, and PyTorch elevated neural networks from lab experiments to high-performance production systems deployed at scale. And now, through Large Language Models and other generative AI tools, deep learning is again transforming business and society. In this new edition, Keras creator François Chollet invites you into this amazing subject in the fluid, mentoring style of a true insider.
Deep Learning with Python, Third Edition makes the concepts behind deep learning and generative AI understandable and approachable. This complete rewrite of the bestselling original includes fresh chapters on transformers, building your own GPT-like LLM, and generating images with diffusion models. Each chapter introduces practical projects and code examples that build your understanding of deep learning, layer by layer.
For readers with intermediate Python skills. No previous experience with machine learning or linear algebra required.