Preface....28
Who this book is for....30
What this book covers....31
To get the most out of this book....35
Get in touch....38
Share your thoughts....39
Giving Computers the Ability to Learn from Data....40
Building intelligent machines to transform data into knowledge....41
The three different types of machine learning....42
Making predictions about the future with supervised learning....43
Classification for predicting class labels....44
Regression for predicting continuous outcomes....46
Solving interactive problems with reinforcement learning....48
Discovering hidden structures with unsupervised learning....50
Finding subgroups with clustering....50
Dimensionality reduction for data compression....51
Introduction to the basic terminology and notations....52
Notation and conventions used in this book....53
Machine learning terminology....55
A roadmap for building machine learning systems....56
Preprocessing – getting data into shape....57
Training and selecting a predictive model....58
Evaluating models and predicting unseen data instances....59
Using Python for machine learning....60
Installing Python and packages from the Python Package Index....61
Using the Anaconda Python distribution and package manager....62
Packages for scientific computing, data science, and machine learning....64
Summary....66
Training Simple Machine Learning Algorithms for Classification....68
Artificial neurons – a brief glimpse into the early history of machine learning....68
The formal definition of an artificial neuron....70
The perceptron learning rule....72
Implementing a perceptron learning algorithm in Python....75
An object-oriented perceptron API....76
Training a perceptron model on the Iris dataset....80
Adaptive linear neurons and the convergence of learning....87
Minimizing loss functions with gradient descent....88
Implementing Adaline in Python....91
Improving gradient descent through feature scaling....96
Large-scale machine learning and stochastic gradient descent....98
Summary....104
A Tour of Machine Learning Classifiers Using Scikit-Learn....106
Choosing a classification algorithm....106
First steps with scikit-learn – training a perceptron....108
Modeling class probabilities via logistic regression....115
Logistic regression and conditional probabilities....116
Learning the model weights via the logistic loss function....121
Converting an Adaline implementation into an algorithm for logistic regression....125
Training a logistic regression model with scikit-learn....129
Tackling overfitting via regularization....133
Maximum margin classification with support vector machines....138
Maximum margin intuition....139
Dealing with a nonlinearly separable case using slack variables....140
Alternative implementations in scikit-learn....142
Solving nonlinear problems using a kernel SVM....143
Kernel methods for linearly inseparable data....143
Using the kernel trick to find separating hyperplanes in a high-dimensional space....145
Decision tree learning....149
Maximizing IG – getting the most bang for your buck....151
Building a decision tree....156
Combining multiple decision trees via random forests....159
K-nearest neighbors – a lazy learning algorithm....164
Summary....170
Building Good Training Datasets – Data Preprocessing....172
Dealing with missing data....172
Identifying missing values in tabular data....173
Eliminating training examples or features with missing values....175
Imputing missing values....177
Understanding the scikit-learn estimator API....179
Handling categorical data....181
Categorical data encoding with pandas....182
Mapping ordinal features....182
Encoding class labels....184
Performing one-hot encoding on nominal features....185
Optional: encoding ordinal features....190
Partitioning a dataset into separate training and test datasets....191
Bringing features onto the same scale....196
Selecting meaningful features....200
L1 and L2 regularization as penalties against model complexity....201
A geometric interpretation of L2 regularization....202
Sparse solutions with L1 regularization....205
Sequential feature selection algorithms....210
Assessing feature importance with random forests....219
Summary....222
Compressing Data via Dimensionality Reduction....224
Unsupervised dimensionality reduction via principal component analysis....225
The main steps in principal component analysis....225
Extracting the principal components step by step....229
Total and explained variance....233
Feature transformation....235
Principal component analysis in scikit-learn....240
Assessing feature contributions....244
Supervised data compression via linear discriminant analysis....247
Principal component analysis versus linear discriminant analysis....247
The inner workings of linear discriminant analysis....249
Computing the scatter matrices....250
Selecting linear discriminants for the new feature subspace....254
Projecting examples onto the new feature space....257
LDA via scikit-learn....258
Nonlinear dimensionality reduction and visualization....260
Why consider nonlinear dimensionality reduction?....261
Visualizing data via t-distributed stochastic neighbor embedding....263
Summary....268
Learning Best Practices for Model Evaluation and Hyperparameter Tuning....270
Streamlining workflows with pipelines....270
Loading the Breast Cancer Wisconsin dataset....271
Combining transformers and estimators in a pipeline....274
Using k-fold cross-validation to assess model performance....277
The holdout method....277
K-fold cross-validation....279
Debugging algorithms with learning and validation curves....286
Diagnosing bias and variance problems with learning curves....286
Addressing over- and underfitting with validation curves....290
Fine-tuning machine learning models via grid search....293
Tuning hyperparameters via grid search....293
Exploring hyperparameter configurations more widely with randomized search....296
More resource-efficient hyperparameter search with successive halving....299
Algorithm selection with nested cross-validation....303
Looking at different performance evaluation metrics....306
Reading a confusion matrix....306
Optimizing the precision and recall of a classification model....309
Plotting a receiver operating characteristic....313
Scoring metrics for multiclass classification....317
Dealing with class imbalance....318
Summary....323
Combining Different Models for Ensemble Learning....325
Learning with ensembles....325
Combining classifiers via majority vote....331
Implementing a simple majority vote classifier....331
Using the majority voting principle to make predictions....338
Evaluating and tuning the ensemble classifier....342
Bagging – building an ensemble of classifiers from bootstrap samples....350
Bagging in a nutshell....351
Applying bagging to classify examples in the Wine dataset....353
Leveraging weak learners via adaptive boosting....358
How adaptive boosting works....359
Applying AdaBoost using scikit-learn....366
Gradient boosting – training an ensemble based on loss gradients....370
Comparing AdaBoost with gradient boosting....371
Outlining the general gradient boosting algorithm....372
Explaining the gradient boosting algorithm for classification....375
Illustrating gradient boosting for classification....378
Using XGBoost....381
Summary....384
Applying Machine Learning to Sentiment Analysis....386
Preparing the IMDb movie review data for text processing....386
Obtaining the movie review dataset....387
Preprocessing the movie dataset into a more convenient format....388
Introducing the bag-of-words model....390
Transforming words into feature vectors....391
Assessing word relevancy via term frequency-inverse document frequency....394
Cleaning text data....397
Processing documents into tokens....400
Training a logistic regression model for document classification....403
Working with bigger data – online algorithms and out-of-core learning....407
Topic modeling with latent Dirichlet allocation....412
Decomposing text documents with LDA....413
LDA with scikit-learn....414
Summary....418
Predicting Continuous Target Variables with Regression Analysis....420
Introducing linear regression....421
Simple linear regression....421
Multiple linear regression....422
Exploring the Ames Housing dataset....424
Loading the Ames Housing dataset into a DataFrame....424
Visualizing the important characteristics of a dataset....428
Looking at relationships using a correlation matrix....430
Implementing an ordinary least squares linear regression model....433
Solving regression for regression parameters with gradient descent....434
Estimating the coefficient of a regression model via scikit-learn....439
Fitting a robust regression model using RANSAC....443
Evaluating the performance of linear regression models....447
Using regularized methods for regression....454
Turning a linear regression model into a curve – polynomial regression....457
Adding polynomial terms using scikit-learn....457
Modeling nonlinear relationships in the Ames Housing dataset....460
Dealing with nonlinear relationships using random forests....463
Decision tree regression....464
Random forest regression....467
Summary....470
Working with Unlabeled Data – Clustering Analysis....472
Grouping objects by similarity using k-means....472
k-means clustering using scikit-learn....473
A smarter way of placing the initial cluster centroids using k-means++....479
Hard versus soft clustering....481
Using the elbow method to find the optimal number of clusters....483
Quantifying the quality of clustering via silhouette plots....485
Organizing clusters as a hierarchical tree....490
Grouping clusters in a bottom-up fashion....491
Performing hierarchical clustering on a distance matrix....493
Attaching dendrograms to a heat map....498
Applying agglomerative clustering via scikit-learn....500
Locating regions of high density via DBSCAN....501
Summary....508
Implementing a Multilayer Artificial Neural Network from Scratch....511
Modeling complex functions with artificial neural networks....511
Single-layer neural network recap....513
Introducing the multilayer neural network architecture....516
Activating a neural network via forward propagation....519
Classifying handwritten digits....522
Obtaining and preparing the MNIST dataset....523
Implementing a multilayer perceptron....527
Coding the neural network training loop....533
Evaluating the neural network performance....539
Training an artificial neural network....544
Computing the loss function....545
Developing your understanding of backpropagation....547
Training neural networks via backpropagation....549
About convergence in neural networks....554
A few last words about the neural network implementation....556
Summary....557
Parallelizing Neural Network Training with PyTorch....559
PyTorch and training performance....560
Performance challenges....560
What is PyTorch?....562
How we will learn PyTorch....564
First steps with PyTorch....565
Installing PyTorch....565
Creating tensors in PyTorch....567
Manipulating the data type and shape of a tensor....568
Applying mathematical operations to tensors....570
Split, stack, and concatenate tensors....572
Building input pipelines in PyTorch....575
Creating a PyTorch DataLoader from existing tensors....576
Combining two tensors into a joint dataset....577
Shuffle, batch, and repeat....579
Creating a dataset from files on your local storage disk....581
Fetching available datasets from the torchvision.datasets library....586
Building an NN model in PyTorch....592
The PyTorch neural network module (torch.nn)....593
Building a linear regression model....594
Model training via the torch.nn and torch.optim modules....598
Building a multilayer perceptron for classifying flowers in the Iris dataset....600
Evaluating the trained model on the test dataset....604
Saving and reloading the trained model....605
Choosing activation functions for multilayer neural networks....606
Logistic function recap....608
Estimating class probabilities in multiclass classification via the softmax function....610
Broadening the output spectrum using a hyperbolic tangent....612
Rectified linear unit activation....615
Summary....617
Going Deeper – The Mechanics of PyTorch....619
The key features of PyTorch....620
PyTorch’s computation graphs....621
Understanding computation graphs....622
Creating a graph in PyTorch....623
PyTorch tensor objects for storing and updating model parameters....624
Computing gradients via automatic differentiation....628
Computing the gradients of the loss with respect to trainable variables....628
Understanding automatic differentiation....631
Adversarial examples....631
Simplifying implementations of common architectures via the torch.nn module....632
Implementing models based on nn.Sequential....632
Choosing a loss function....634
Solving an XOR classification problem....636
Making model building more flexible with nn.Module....642
Writing custom layers in PyTorch....645
Project one – predicting the fuel efficiency of a car....650
Working with feature columns....651
Training a DNN regression model....656
Project two – classifying MNIST handwritten digits....659
Higher-level PyTorch APIs: a short introduction to PyTorch-Lightning....663
Setting up the PyTorch Lightning model....665
Setting up the data loaders for Lightning....668
Training the model using the PyTorch Lightning Trainer class....670
Evaluating the model using TensorBoard....671
Summary....677
Classifying Images with Deep Convolutional Neural Networks....679
The building blocks of CNNs....679
Understanding CNNs and feature hierarchies....680
Performing discrete convolutions....683
Discrete convolutions in one dimension....683
Padding inputs to control the size of the output feature maps....686
Determining the size of the convolution output....688
Performing a discrete convolution in 2D....689
Subsampling layers....693
Putting everything together – implementing a CNN....695
Working with multiple input or color channels....696
Regularizing an NN with L2 regularization and dropout....700
Loss functions for classification....704
Implementing a deep CNN using PyTorch....707
The multilayer CNN architecture....707
Loading and preprocessing the data....708
Implementing a CNN using the torch.nn module....709
Configuring CNN layers in PyTorch....710
Constructing a CNN in PyTorch....711
Smile classification from face images using a CNN....717
Loading the CelebA dataset....717
Image transformation and data augmentation....719
Training a CNN smile classifier....726
Summary....732
Modeling Sequential Data Using Recurrent Neural Networks....734
Introducing sequential data....735
Modeling sequential data – order matters....735
Sequential data versus time series data....736
Representing sequences....737
The different categories of sequence modeling....738
RNNs for modeling sequences....740
Understanding the dataflow in RNNs....740
Computing activations in an RNN....744
Hidden recurrence versus output recurrence....747
The challenges of learning long-range interactions....751
Long short-term memory cells....753
Implementing RNNs for sequence modeling in PyTorch....756
Project one – predicting the sentiment of IMDb movie reviews....757
Preparing the movie review data....757
Embedding layers for sentence encoding....764
Building an RNN model....767
Building an RNN model for the sentiment analysis task....769
Project two – character-level language modeling in PyTorch....775
Preprocessing the dataset....776
Building a character-level RNN model....782
Evaluation phase – generating new text passages....785
Summary....791
Transformers – Improving Natural Language Processing with Attention Mechanisms....793
Adding an attention mechanism to RNNs....794
Attention helps RNNs with accessing information....795
The original attention mechanism for RNNs....796
Processing the inputs using a bidirectional RNN....798
Generating outputs from context vectors....799
Computing the attention weights....800
Introducing the self-attention mechanism....802
Starting with a basic form of self-attention....803
Parameterizing the self-attention mechanism: scaled dot-product attention....809
Attention is all we need: introducing the original transformer architecture....813
Encoding context embeddings via multi-head attention....815
Learning a language model: decoder and masked multi-head attention....823
Implementation details: positional encodings and layer normalization....825
Building large-scale language models by leveraging unlabeled data....828
Pre-training and fine-tuning transformer models....828
Leveraging unlabeled data with GPT....832
Using GPT-2 to generate new text....839
Bidirectional pre-training with BERT....843
The best of both worlds: BART....849
Fine-tuning a BERT model in PyTorch....853
Loading the IMDb movie review dataset....854
Tokenizing the dataset....857
Loading and fine-tuning a pre-trained BERT model....859
Fine-tuning a transformer more conveniently using the Trainer API....864
Summary....869
Generative Adversarial Networks for Synthesizing New Data....872
Introducing generative adversarial networks....872
Starting with autoencoders....874
Generative models for synthesizing new data....877
Generating new samples with GANs....879
Understanding the loss functions of the generator and discriminator networks in a GAN model....881
Implementing a GAN from scratch....884
Training GAN models on Google Colab....884
Implementing the generator and the discriminator networks....888
Defining the training dataset....892
Training the GAN model....895
Improving the quality of synthesized images using a convolutional and Wasserstein GAN....902
Transposed convolution....903
Batch normalization....905
Implementing the generator and discriminator....908
Dissimilarity measures between two distributions....916
Using EM distance in practice for GANs....921
Gradient penalty....922
Implementing WGAN-GP to train the DCGAN model....923
Mode collapse....928
Other GAN applications....930
Summary....931
Graph Neural Networks for Capturing Dependencies in Graph Structured Data....933
Introduction to graph data....934
Undirected graphs....935
Directed graphs....936
Labeled graphs....937
Representing molecules as graphs....938
Understanding graph convolutions....939
The motivation behind using graph convolutions....939
Implementing a basic graph convolution....943
Implementing a GNN in PyTorch from scratch....948
Defining the NodeNetwork model....949
Coding the NodeNetwork’s graph convolution layer....951
Adding a global pooling layer to deal with varying graph sizes....952
Preparing the DataLoader....956
Using the NodeNetwork to make predictions....959
Implementing a GNN using the PyTorch Geometric library....961
Other GNN layers and recent developments....969
Spectral graph convolutions....970
Pooling....973
Normalization....975
Pointers to advanced graph neural network literature....978
Summary....980
Reinforcement Learning for Decision Making in Complex Environments....983
Introduction – learning from experience....984
Understanding reinforcement learning....984
Defining the agent-environment interface of a reinforcement learning system....987
The theoretical foundations of RL....989
Markov decision processes....989
The mathematical formulation of Markov decision processes....991
Visualization of a Markov process....993
Episodic versus continuing tasks....994
RL terminology: return, policy, and value function....995
The return....995
Policy....998
Value function....998
Dynamic programming using the Bellman equation....1001
Reinforcement learning algorithms....1002
Dynamic programming....1003
Policy evaluation – predicting the value function with dynamic programming....1004
Improving the policy using the estimated value function....1005
Policy iteration....1006
Value iteration....1007
Reinforcement learning with Monte Carlo....1008
State-value function estimation using MC....1009
Action-value function estimation using MC....1009
Finding an optimal policy using MC control....1010
Policy improvement – computing the greedy policy from the action-value function....1010
Temporal difference learning....1011
TD prediction....1011
On-policy TD control (SARSA)....1013
Off-policy TD control (Q-learning)....1014
Implementing our first RL algorithm....1015
Introducing the OpenAI Gym toolkit....1015
Working with the existing environments in OpenAI Gym....1016
A grid world example....1018
Implementing the grid world environment in OpenAI Gym....1019
Solving the grid world problem with Q-learning....1027
A glance at deep Q-learning....1031
Training a DQN model according to the Q-learning algorithm....1033
Replay memory....1033
Determining the target values for computing the loss....1035
Implementing a deep Q-learning algorithm....1036
Chapter and book summary....1041
Other Books You May Enjoy....1047
Index....1052
Machine Learning with PyTorch and Scikit-Learn is a comprehensive guide to machine learning and deep learning with PyTorch. It acts as both a step-by-step tutorial and a reference you'll keep coming back to as you build your machine learning systems.
Packed with clear explanations, visualizations, and examples, the book covers all the essential machine learning techniques in depth. While some books teach you only to follow instructions, this book teaches the principles that allow you to build models and applications for yourself.
PyTorch brings a Pythonic approach to machine learning, making models easier to understand and simpler to code. This book explains the essential parts of PyTorch and shows how to create models using popular libraries, such as PyTorch Lightning and PyTorch Geometric.
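To give a sense of what "Pythonic" means here, the following is a minimal sketch of defining and running a model in PyTorch: models are ordinary Python classes that subclass `torch.nn.Module`. The `TinyClassifier` name and layer sizes are illustrative choices, not an example taken from the book.

```python
import torch
import torch.nn as nn

# A PyTorch model is a plain Python class; layers are composed
# like regular objects, and forward() defines the computation.
class TinyClassifier(nn.Module):
    def __init__(self, n_features=4, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 16),  # input layer -> 16 hidden units
            nn.ReLU(),                  # nonlinearity
            nn.Linear(16, n_classes),   # hidden -> class logits
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
logits = model(torch.randn(2, 4))  # a batch of 2 examples, 4 features each
print(tuple(logits.shape))         # one logit vector per example
```

Because the model is just Python, the usual debugging and introspection tools (print statements, debuggers, type checks) work directly on it.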
You will also learn about generative adversarial networks (GANs) for generating new data and training intelligent agents with reinforcement learning. Finally, this new edition is expanded to cover the latest trends in deep learning, including graph neural networks and large-scale transformers used for natural language processing (NLP).
This PyTorch book is your companion to machine learning with Python, whether you're a Python developer new to machine learning or you want to deepen your knowledge of the latest developments.
If you have a good grasp of Python basics and want to start learning about machine learning and deep learning, then this is the book for you. It is an essential resource written for developers and data scientists who want to create practical machine learning and deep learning applications using scikit-learn and PyTorch.
Before you get started with this book, you'll also need a good understanding of calculus and linear algebra.