Essential Math for Data Science: Take Control of Your Data with Fundamental Linear Algebra, Probability, and Statistics

Author: Thomas Nield
Published: 2022
Publisher: O’Reilly Media, Inc.
Pages: 511
File size: 4.7 MB
File type: PDF
Added by: codelibs

Preface....6

Conventions Used in This Book....9

Using Code Examples....10

O’Reilly Online Learning....11

How to Contact Us....11

Acknowledgments....12

1. Basic Math and Calculus Review....14

Number Theory....14

Order of Operations....17

Variables....19

Functions....20

Summations....30

Exponents....33

Logarithms....37

Euler’s Number and Natural Logarithms....41

Euler’s Number....41

Natural Logarithms....46

Limits....46

Derivatives....49

Partial Derivatives....55

The Chain Rule....59

Integrals....62

Conclusion....71

Exercises....71

2. Probability....73

Understanding Probability....73

Probability Versus Statistics....75

Probability Math....76

Joint Probabilities....76

Union Probabilities....77

Conditional Probability and Bayes’ Theorem....79

Joint and Union Conditional Probabilities....82

Binomial Distribution....84

Beta Distribution....87

Conclusion....100

Exercises....100

3. Descriptive and Inferential Statistics....102

What Is Data?....102

Descriptive Versus Inferential Statistics....105

Populations, Samples, and Bias....106

Descriptive Statistics....111

Mean and Weighted Mean....111

Median....113

Mode....114

Variance and Standard Deviation....115

The Normal Distribution....122

The Inverse CDF....135

Z-Scores....137

Inferential Statistics....140

The Central Limit Theorem....140

Confidence Intervals....144

Understanding P-Values....149

Hypothesis Testing....150

The T-Distribution: Dealing with Small Samples....163

Big Data Considerations and the Texas Sharpshooter Fallacy....166

Conclusion....167

Exercises....168

4. Linear Algebra....169

What Is a Vector?....169

Adding and Combining Vectors....177

Scaling Vectors....181

Span and Linear Dependence....186

Linear Transformations....191

Basis Vectors....192

Matrix Vector Multiplication....200

Matrix Multiplication....208

Determinants....211

Special Types of Matrices....217

Square Matrix....217

Identity Matrix....217

Inverse Matrix....218

Diagonal Matrix....219

Triangular Matrix....219

Sparse Matrix....219

Systems of Equations and Inverse Matrices....220

Eigenvectors and Eigenvalues....225

Conclusion....229

Exercises....230

5. Linear Regression....232

A Basic Linear Regression....234

Residuals and Squared Errors....243

Finding the Best Fit Line....248

Closed Form Equation....249

Inverse Matrix Techniques....251

Gradient Descent....254

Overfitting and Variance....262

Stochastic Gradient Descent....266

The Correlation Coefficient....268

Statistical Significance....273

Coefficient of Determination....282

Standard Error of the Estimate....283

Prediction Intervals....285

Train/Test Splits....290

Multiple Linear Regression....300

Conclusion....301

Exercises....302

6. Logistic Regression and Classification....303

Understanding Logistic Regression....303

Performing a Logistic Regression....307

Logistic Function....307

Fitting the Logistic Curve....309

Multivariable Logistic Regression....315

Understanding the Log-Odds....320

R-Squared....324

P-Values....329

Train/Test Splits....330

Confusion Matrices....332

Bayes’ Theorem and Classification....336

Receiver Operator Characteristics/Area Under Curve....337

Class Imbalance....340

Conclusion....340

Exercises....341

7. Neural Networks....342

When to Use Neural Networks and Deep Learning....342

A Simple Neural Network....343

Activation Functions....347

Forward Propagation....356

Backpropagation....363

Calculating the Weight and Bias Derivatives....363

Stochastic Gradient Descent....368

Using scikit-learn....371

Limitations of Neural Networks and Deep Learning....372

Conclusion....375

Exercise....376

8. Career Advice and the Path Forward....377

Redefining Data Science....378

A Brief History of Data Science....381

Finding Your Edge....385

SQL Proficiency....385

Programming Proficiency....388

Data Visualization....393

Knowing Your Industry....395

Productive Learning....396

Practitioner Versus Advisor....397

What to Watch Out For in Data Science Jobs....400

Role Definition....401

Organizational Focus and Buy-In....402

Adequate Resources....404

Reasonable Objectives....405

Competing with Existing Systems....407

A Role Is Not What You Expected....409

Does Your Dream Job Not Exist?....412

Where Do I Go Now?....412

Conclusion....414

A. Supplemental Topics....416

Using LaTeX Rendering with SymPy....416

Binomial Distribution from Scratch....418

Beta Distribution from Scratch....419

Deriving Bayes’ Theorem....421

CDF and Inverse CDF from Scratch....423

Use e to Predict Event Probability Over Time....425

Hill Climbing and Linear Regression....427

Hill Climbing and Logistic Regression....430

A Brief Intro to Linear Programming....431

MNIST Classifier Using scikit-learn....439

B. Exercise Answers....441

Chapter 1....441

Chapter 2....444

Chapter 3....446

Chapter 4....449

Chapter 5....453

Chapter 6....458

Chapter 7....462

Index....465

About the Author....510

Master the math needed to excel in data science, machine learning, and statistics. In this book, author Thomas Nield guides you through calculus, probability, linear algebra, and statistics, and shows how they apply to techniques like linear regression, logistic regression, and neural networks. Along the way, you'll also gain practical insights into the state of data science and how to use those insights to maximize your career.

Learn how to:

  • Use Python code and libraries like SymPy, NumPy, and scikit-learn to explore essential mathematical concepts like calculus, linear algebra, statistics, and machine learning
  • Understand techniques like linear regression, logistic regression, and neural networks in plain English, with minimal mathematical notation and jargon
  • Perform descriptive statistics and hypothesis testing on a dataset to interpret p-values and statistical significance
  • Manipulate vectors and matrices and perform matrix decomposition
  • Integrate and build upon incremental knowledge of calculus, probability, statistics, and linear algebra, and apply it to regression models including neural networks
  • Navigate practically through a data science career and avoid common pitfalls, assumptions, and biases while tuning your skill set to stand out in the job market
