Preface....5
Objective and Approach....6
Outline of This Book....6
Software Requirements....8
Hardware Used for Timings....8
How to Contact Us....9
Contents....11
1 Introduction....16
1.1 Getting Started with Two Essential Built-in Functions....16
1.2 Data Types in Python....19
1.2.1 Strings....19
1.2.2 Sequence Types....20
1.2.3 Dictionaries....23
1.2.4 Booleans....25
1.3 Python Library Structure Guide....26
1.4 Introduction to NumPy....30
1.5 Introduction to SymPy....36
1.6 Data Manipulation with Pandas....39
1.7 Machine Learning with scikit-learn....42
1.8 Visualisation with Matplotlib....44
1.9 Control Flow in Python....47
1.9.1 If Statement....48
1.9.2 Inline If....49
1.9.3 For Loop....49
1.9.4 While Loop....50
1.10 Define Your Own Functions....51
1.10.1 How to Define a Function?....51
1.10.2 Define Your Own Functions Using Lambda....53
2 Sets and Functions....55
2.1 Sets....55
2.1.1 Set Construction....55
2.1.2 Subsets and Power Sets....59
2.1.3 Set Operations....61
2.1.4 Intervals....64
2.1.5 Sets Written in Comprehension....65
2.1.6 Cartesian Product Sets....67
2.2 Mathematical Functions....68
2.2.1 More About NumPy Functions....70
2.3 Composition of Two Functions....75
3 Linear Algebra....77
3.1 Vectors....77
3.1.1 Creating Vectors....77
3.1.2 Basic Vector Operations....79
3.1.3 Norm and Unit Vector....80
3.1.4 Distance and Angle Between Two Vectors....81
3.2 Linear Combinations and Span....82
3.3 Matrices....85
3.3.1 Build Up Matrices and Access Elements of a Matrix....85
3.3.2 Diagonal and Trace....90
3.4 More Using Matrices in Python....92
3.4.1 Operations with Scalars....92
3.4.2 Operations with Matrices....93
3.5 Matrices as Linear Transformations....100
4 Matrix Decomposition....104
4.1 Eigendecomposition....104
4.2 Understand PCA Through a Simple Example....108
4.3 Principal Component Analysis Using scikit-learn....113
4.4 Understanding SVD Through a Simple Example....114
4.4.1 The Relationship Between PCA and SVD....122
4.5 Compressing an Image Using Singular Value Decomposition....123
5 Calculus....129
5.1 Finding Limits of Functions....129
5.2 Finding Derivatives of Functions....133
5.2.1 Finding Derivatives of Functions Using SymPy....133
5.2.2 The Chain Rule for Differentiation....138
5.3 Finding Critical Points of a Function with One Variable Using Derivatives....141
5.4 Integrals....145
5.4.1 Definite Integrals and Indefinite Integrals with SymPy....145
5.4.2 An Illustration of the Definite Integral....146
5.4.3 Integration by Substitution....149
5.4.4 Integration by Parts....150
6 Advanced Calculus....153
6.1 Partial Derivatives....153
6.1.1 Jacobian Matrix....154
6.1.2 Hessian Matrix....157
6.2 Find Maxima and Minima for Functions with Two Variables....158
6.3 Method of Lagrange Multipliers for Maxima and Minima....162
6.3.1 An Illustration of the Use of np.meshgrid()....167
6.4 Gradient Descent Algorithm....170
6.5 Double Integrals....174
7 Algorithms 1: Principal Component Analysis....179
7.1 Relevant Mathematical Knowledge....179
7.1.1 The Variance of Projections....179
7.1.2 Two Derivatives with Respect to a Vector....182
7.2 The Mathematical Solution Behind the PCA....185
7.3 Data Normalisation....188
8 Algorithms 2: Linear Regression....195
8.1 Simple Linear Regression with One Variable....195
8.2 Linear Regression with Multiple Variables....199
8.3 Linear Regression with scikit-learn–LinearRegression....202
8.4 Linear Regression with scikit-learn–SGDRegressor....205
8.4.1 Case Study 1: An Overview of the Data and the Key Results....206
8.4.2 The Demonstration of the Use of SGDRegressor....206
9 Algorithms 3: Neural Networks....215
9.1 Simple One-Layer Neural Networks....215
9.1.1 Linear Activation Function....216
9.1.2 Logistic Sigmoid Activation Function....221
9.2 An Example of a Simple Two-Layer Neural Network....226
9.3 Neural Network Models with scikit-learn....230
9.3.1 Regression....230
9.3.2 Classification....234
10 Probability....239
10.1 Combinatorial Analysis....239
10.2 Probability....243
10.2.1 Simple Probability....243
10.2.2 Discrete Random Variables....245
10.2.3 Continuous Random Variables....247
10.3 Mean and Variance....250
10.3.1 Mean....250
10.3.2 Variance....254
10.4 Special Univariate Distributions....257
10.4.1 Discrete Uniform Distribution....257
10.4.2 Bernoulli Distribution....260
10.4.3 Binomial Distribution....263
10.4.4 Poisson Distribution....266
10.4.5 Continuous Uniform Distribution....269
10.4.6 Gaussian or Normal Distribution....271
11 Further Probability....278
11.1 The Law of Large Numbers and the Central Limit Theorem....278
11.1.1 The Law of Large Numbers (LLN)....278
11.1.2 Central Limit Theorem....280
11.2 Multiple Random Variables....282
11.2.1 Joint Discrete Random Variables....282
11.2.2 Joint Continuous Random Variables....284
11.2.3 Multinomial Distribution....287
11.2.4 Multivariate Normal Distribution....289
11.3 Conditional Probability and Corresponding Rules....294
11.3.1 Conditional Probability for Two Discrete Random Variables....294
11.3.2 Conditional Probability for Two Continuous Random Variables....299
11.3.3 Conditional Mean for Two Continuous Random Variables....301
11.3.4 The Law of Total Probability....302
11.4 Bayes' Theorem....304
12 Elements of Statistics....307
12.1 Descriptive Statistics....307
12.1.1 Measures of Centre....307
12.1.2 Measures of Variation....313
12.1.3 The Boxplot....315
12.2 Elementary Sampling Theory....318
12.2.1 Sampling Distribution of Means....319
12.2.2 Sampling Distribution of Proportions....320
12.2.3 Student's t-Distribution....322
12.2.4 The Chi-Square Distribution....324
12.3 Interval Estimation....326
12.3.1 Confidence Interval for Means....326
12.3.2 Confidence Interval for Proportions....328
12.3.3 Confidence Interval for chi-squared....330
12.4 Testing Hypothesis....332
12.4.1 The t-Test for Mean....332
12.4.2 The t-Test on Difference Between Means of Two Samples....334
12.4.3 The Chi-Square Test for One-Way Classification Tables....336
12.4.4 The Chi-Square Test for Two-Way Classification Tables....339
13 Algorithms 4: Maximum Likelihood Estimation and Its Application to Regression....342
13.1 Maximum Likelihood Estimation....342
13.2 Revisiting Linear Regression....348
13.2.1 Linear Regression with Maximum Likelihood Estimation....349
13.2.2 Confidence Interval Estimation for MLEs in Linear Regression....351
13.3 The Logistic Regression Algorithm....356
14 Data Modelling in Practice....366
14.1 Data Description and Preprocessing....366
14.1.1 Loading the Dataset....366
14.1.2 Data Preprocessing....368
14.1.3 Dataset Splitting....373
14.2 Feature Selection and Dimensionality Reduction....375
14.2.1 A Simple Feature Selection Method....375
14.2.2 Feature Reduction via Principal Component Analysis....379
14.3 Model Selection....381
14.3.1 Overview of the Model Selection Procedure....381
14.3.2 Model Evaluation via Validation Set....382
14.3.3 Ridge Regression....383
14.3.4 Understanding the Trade-Off Between Bias and Variance....384
14.3.5 Final Model Evaluation....386
14.4 More About Model Evaluation for Classification....389
14.5 Early Stopping....394
Appendix A Setting up Python and Jupyter Notebook....397
A.1 Install Jupyter Notebook Locally....397
A.2 Verifying the Setup....397
Appendix References....399
Index....400
This textbook is a companion to "A Mathematical Introduction to Data Science". It uses Python programming to provide a comprehensive foundation in the mathematics needed for data science. It is intended for anyone with a basic mathematical background, including students and self-learners who want to understand the principles behind the computational algorithms used in data science. Its focus is to demonstrate how programming can aid that understanding and help solve mathematical problems. All examples are written in Python, but readers need no prior knowledge of the language to benefit from them.
Some examples from "A Mathematical Introduction to Data Science" are used to illustrate, through Python programming, key concepts such as sets, functions, linear algebra, calculus, and probability and statistics; readers need not have seen these examples before. This textbook also shows how those mathematical concepts are applied in widely used computational algorithms, such as Principal Component Analysis, Singular Value Decomposition, Linear Regression in two or more dimensions, Simple Neural Networks, Maximum Likelihood Estimation, Logistic Regression and Ridge Regression.
This textbook assumes that readers have no prior knowledge of Python but do have a basic understanding of programming concepts, such as control flow. Ideally, readers should have both this book and its companion, "A Mathematical Introduction to Data Science". However, those with a strong mathematical background and an interest in programming implementations can benefit from reading this textbook on its own.