A Mathematical Introduction to Data Science with Python

Authors: Adams Rod, Sun Yi
Publication date: 2026
Publisher: Springer Nature
Pages: 406
File size: 3.6 MB
File type: PDF
Added by: codelibs

Preface....5

Objective and Approach....6

Outline of This Book....6

Software Requirements....8

Hardware Used for Timings....8

How to Contact Us....9

Contents....11

1 Introduction....16

1.1 Getting Started with Two Essential Built-in Functions....16

1.2 Data Types in Python....19

1.2.1 Strings....19

1.2.2 Sequence Types....20

1.2.3 Dictionaries....23

1.2.4 Booleans....25

1.3 Python Library Structure Guide....26

1.4 Introduction to NumPy....30

1.5 Introduction to SymPy....36

1.6 Data Manipulation with Pandas....39

1.7 Machine Learning with scikit-learn....42

1.8 Visualisation with Matplotlib....44

1.9 Control Flow in Python....47

1.9.1 If Statement....48

1.9.2 Inline If....49

1.9.3 For Loop....49

1.9.4 While Loop....50

1.10 Define Your Own Functions....51

1.10.1 How to Define A Function?....51

1.10.2 Define Your Own Functions Using Lambda....53

2 Sets and Functions....55

2.1 Sets....55

2.1.1 Set Construction....55

2.1.2 Subsets and Power Sets....59

2.1.3 Set Operations....61

2.1.4 Intervals....64

2.1.5 Sets Written in Comprehension....65

2.1.6 Cartesian Product Sets....67

2.2 Mathematical Functions....68

2.2.1 More About NumPy Functions....70

2.3 Composition of Two Functions....75

3 Linear Algebra....77

3.1 Vectors....77

3.1.1 Creating Vectors....77

3.1.2 Basic Vector Operations....79

3.1.3 Norm and Unit Vector....80

3.1.4 Distance and Angle Between Two Vectors....81

3.2 Linear Combinations and Span....82

3.3 Matrices....85

3.3.1 Build Up Matrices and Access Elements of a Matrix....85

3.3.2 Diagonal and Trace....90

3.4 More Using Matrices in Python....92

3.4.1 Operations with Scalars....92

3.4.2 Operations with Matrices....93

3.5 Matrices as Linear Transformations....100

4 Matrix Decomposition....104

4.1 Eigendecomposition....104

4.2 Understand PCA Through a Simple Example....108

4.3 Principal Component Analysis Using scikit-learn....113

4.4 Understanding SVD Through a Simple Example....114

4.4.1 The Relationship Between PCA and SVD....122

4.5 Compressing an Image Using Singular Value Decomposition....123

5 Calculus....129

5.1 Finding Limits of Functions....129

5.2 Finding Derivatives of Functions....133

5.2.1 Finding Derivatives of Functions Using SymPy....133

5.2.2 The Chain Rule for Differentiation....138

5.3 Finding Critical Points of A Function with One Variable Using Derivatives....141

5.4 Integrals....145

5.4.1 Definite Integrals and Indefinite Integrals with SymPy....145

5.4.2 An Illustration of the Definite Integral....146

5.4.3 Integration by Substitution....149

5.4.4 Integration by Parts....150

6 Advanced Calculus....153

6.1 Partial Derivatives....153

6.1.1 Jacobian Matrix....154

6.1.2 Hessian Matrix....157

6.2 Find Maxima and Minima for Functions with Two Variables....158

6.3 Method of Lagrange Multipliers for Maxima and Minima....162

6.3.1 An Illustration of the Use of np.meshgrid()....167

6.4 Gradient Descent Algorithm....170

6.5 Double Integrals....174

7 Algorithms 1: Principal Component Analysis....179

7.1 Relevant Mathematical Knowledge....179

7.1.1 The Variance of Projections....179

7.1.2 Two Derivatives with Respect to A Vector....182

7.2 The Mathematical Solution Behind the PCA....185

7.3 Data Normalisation....188

8 Algorithms 2: Linear Regression....195

8.1 Simple Linear Regression with One Variable....195

8.2 Linear Regression with Multiple Variables....199

8.3 Linear Regression with scikit-learn: LinearRegression....202

8.4 Linear Regression with scikit-learn: SGDRegressor....205

8.4.1 Case Study 1: An Overview of the Data and the Key Results....206

8.4.2 The Demonstration of the Use of SGDRegressor....206

9 Algorithms 3: Neural Networks....215

9.1 Simple One-Layer Neural Networks....215

9.1.1 Linear Activation Function....216

9.1.2 Logistic Sigmoid Activation Function....221

9.2 An Example of a Simple Two-Layer Neural Network....226

9.3 Neural Network Models with scikit-learn....230

9.3.1 Regression....230

9.3.2 Classification....234

10 Probability....239

10.1 Combinatorial Analysis....239

10.2 Probability....243

10.2.1 Simple Probability....243

10.2.2 Discrete Random Variables....245

10.2.3 Continuous Random Variables....247

10.3 Mean and Variance....250

10.3.1 Mean....250

10.3.2 Variance....254

10.4 Special Univariate Distributions....257

10.4.1 Discrete Uniform Distribution....257

10.4.2 Bernoulli Distribution....260

10.4.3 Binomial Distribution....263

10.4.4 Poisson Distribution....266

10.4.5 Continuous Uniform Distribution....269

10.4.6 Gaussian or Normal Distribution....271

11 Further Probability....278

11.1 The Law of Large Numbers and the Central Limit Theorem....278

11.1.1 The Law of Large Numbers (LLN)....278

11.1.2 Central Limit Theorem....280

11.2 Multiple Random Variables....282

11.2.1 Joint Discrete Random Variables....282

11.2.2 Joint Continuous Random Variables....284

11.2.3 Multinomial Distribution....287

11.2.4 Multivariate Normal Distribution....289

11.3 Conditional Probability and Corresponding Rules....294

11.3.1 Conditional Probability for Two Discrete Random Variables....294

11.3.2 Conditional Probability for Two Continuous Random Variables....299

11.3.3 Conditional Mean for Two Continuous Random Variables....301

11.3.4 The Law of Total Probability....302

11.4 Bayes' Theorem....304

12 Elements of Statistics....307

12.1 Descriptive Statistics....307

12.1.1 Measures of Centre....307

12.1.2 Measures of Variation....313

12.1.3 The Boxplot....315

12.2 Elementary Sampling Theory....318

12.2.1 Sampling Distribution of Means....319

12.2.2 Sampling Distribution of Proportions....320

12.2.3 Student's t-Distribution....322

12.2.4 The Chi-Square Distribution....324

12.3 Interval Estimation....326

12.3.1 Confidence Interval for Means....326

12.3.2 Confidence Interval for Proportions....328

12.3.3 Confidence Interval for chi-squared....330

12.4 Testing Hypothesis....332

12.4.1 The t-test for mean....332

12.4.2 The t-test on Difference between Means of Two Samples....334

12.4.3 The Chi-Square Test for One-way Classification Tables....336

12.4.4 The Chi-Square Test for Two-Way Classification Tables....339

13 Algorithms 4: Maximum Likelihood Estimation and Its Application to Regression....342

13.1 Maximum Likelihood Estimation....342

13.2 Revisiting Linear Regression....348

13.2.1 Linear Regression with Maximum Likelihood Estimation....349

13.2.2 Confidence Interval Estimation for MLEs in Linear Regression....351

13.3 The Logistic Regression Algorithm....356

14 Data Modelling in Practice....366

14.1 Data Description and Preprocessing....366

14.1.1 Loading the Dataset....366

14.1.2 Data Preprocessing....368

14.1.3 Dataset Splitting....373

14.2 Feature Selection and Dimensionality Reduction....375

14.2.1 A Simple Feature Selection Method....375

14.2.2 Feature Reduction via Principal Component Analysis....379

14.3 Model Selection....381

14.3.1 Overview of the Model Selection Procedure....381

14.3.2 Model Evaluation via Validation Set....382

14.3.3 Ridge Regression....383

14.3.4 Understanding the Trade-Off Between Bias and Variance....384

14.3.5 Final Model Evaluation....386

14.4 More About Model Evaluation for Classification....389

14.5 Early Stopping....394

Appendix A Setting up Python and Jupyter Notebook....397

A.1 Install Jupyter Notebook Locally....397

A.2 Verifying the Setup....397

References....399

Index....400

This textbook is a companion to "A Mathematical Introduction to Data Science". It uses Python programming to provide a comprehensive foundation in the mathematics needed for data science. It is intended for anyone with a basic mathematical background, including students and self-learners who want to understand the principles behind the computational algorithms used in data science. The book focuses on showing how programming can support that understanding and help solve mathematical problems. Python is the programming language used throughout, but readers need no prior knowledge of it.

Some examples from "A Mathematical Introduction to Data Science" are used to illustrate key concepts such as sets, functions, linear algebra, calculus, probability and statistics through Python programming; readers need not have seen these examples before. The textbook also shows how those mathematical concepts are applied in widely used computational algorithms, such as Principal Component Analysis, Singular Value Decomposition, Linear Regression in two or more dimensions, Simple Neural Networks, Maximum Likelihood Estimation, Logistic Regression and Ridge Regression.

This textbook assumes that readers have no prior knowledge of Python but do have a basic understanding of programming concepts, such as control flow. Ideally, readers should have both this book and its companion, "A Mathematical Introduction to Data Science". However, those with a strong mathematical background and an interest in programming implementations can benefit from reading this textbook alone.

