Foundations of Data Science with Python

Author: John M. Shea
Publication date: 2024
Publisher: CRC Press (an imprint of Taylor & Francis Group, LLC)
Pages: 503
File size: 11.7 MB
File type: PDF
Added by: codelibs

Cover

Half Title

Series Page

Title Page

Copyright Page

Dedication

Contents

Acknowledgments

Preface

1. Introduction

1.1. Who is this book for?

1.2. Why learn data science from this book?

1.3. What is data science?

1.4. What data science topics does this book cover?

1.5. What data science topics does this book not cover?

1.6. Extremely Brief Introduction to Jupyter and Python

1.7. Chapter Summary

2. First Simulations, Visualizations, and Statistical Tests

2.1. Motivating Problem: Is This Coin Fair?

2.2. First Computer Simulations

2.3. First Visualizations: Scatter Plots and Histograms

2.4. First Statistical Tests

2.5. Chapter Summary

3. First Visualizations and Statistical Tests with Real Data

3.1. Introduction to Pandas

3.2. Visualizing Multiple Data Sets – Part 1: Scatter Plots

3.3. Partitions

3.4. Summary Statistics

3.5. Visualizing Multiple Data Sets – Part 2: Histograms for Partitioned Data

3.6. Null Hypothesis Testing with Real Data

3.7. A Quick Preview of Two-Dimensional Statistical Methods

3.8. Chapter Summary

4. Introduction to Probability

4.1. Outcomes, Sample Spaces, and Events

4.2. Relative Frequencies and Probabilities

4.3. Fair Experiments

4.4. Axiomatic Probability

4.5. Corollaries to the Axioms of Probability

4.6. Combinatorics

4.7. Chapter Summary

5. Null Hypothesis Tests

5.1. Statistical Studies

5.2. General Resampling Approaches for Null Hypothesis Significance Testing

5.3. Calculating p-Values

5.4. How to Sample from the Pooled Data

5.5. Example Null Hypothesis Significance Tests

5.6. Bootstrap Distribution and Confidence Intervals

5.7. Types of Errors and Statistical Power

5.8. Chapter Summary

6. Conditional Probability, Dependence, and Independence

6.1. Simulating and Counting Conditional Probabilities

6.2. Conditional Probability: Notation and Intuition

6.3. Formally Defining Conditional Probability

6.4. Relating Conditional and Unconditional Probabilities

6.5. More on Simulating Conditional Probabilities

6.6. Statistical Independence

6.7. Conditional Probabilities and Independence in Fair Experiments

6.8. Conditioning and (In)dependence

6.9. Chain Rules and Total Probability

6.10. Chapter Summary

7. Introduction to Bayesian Methods

7.1. Bayes’ Rule

7.2. Bayes’ Rule in Systems with Hidden State

7.3. Optimal Decisions for Discrete Stochastic Systems

7.4. Bayesian Hypothesis Testing

7.5. Chapter Summary

8. Random Variables

8.1. Definition of a Real Random Variable

8.2. Discrete Random Variables

8.3. Cumulative Distribution Functions

8.4. Important Discrete RVs

8.5. Continuous Random Variables

8.6. Important Continuous Random Variables

8.7. Histograms of Continuous Random Variables and Kernel Density Estimation

8.8. Conditioning with Random Variables

8.9. Chapter Summary

9. Expected Value, Parameter Estimation, and Hypothesis Tests on Sample Means

9.1. Expected Value

9.2. Expected Value of a Continuous Random Variable with SymPy

9.3. Moments

9.4. Parameter Estimation

9.5. Confidence Intervals for Estimates

9.6. Testing a Difference of Means

9.7. Sampling and Bootstrap Distributions of Parameters

9.8. Effect Size, Power, and Sample Size Selection

9.9. Chapter Summary

10. Decision-Making with Observations from Continuous Distributions

10.1. Binary Decisions from Continuous Data: Non-Bayesian Approaches

10.2. Point Conditioning

10.3. Optimal Bayesian Decision-Making with Continuous Random Variables

10.4. Chapter Summary

11. Categorical Data, Tests for Dependence, and Goodness of Fit for Discrete Distributions

11.1. Tabulating Categorical Data and Creating a Test Statistic

11.2. Null Hypothesis Significance Testing for Dependence in Contingency Tables

11.3. Chi-Square Goodness-of-Fit Test

11.4. Chapter Summary

12. Multidimensional Data: Vector Moments and Linear Regression

12.1. Summary Statistics for Vector Data

12.2. Linear Regression

12.3. Null Hypothesis Tests for Correlation

12.4. Nonlinear Regression Tests

12.5. Chapter Summary

13. Working with Dependent Data in Multiple Dimensions

13.1. Jointly Distributed Pairs of Random Variables

13.2. Standardization and Linear Transformations

13.3. Decorrelating Random Vectors and Multi-Dimensional Data

13.4. Principal Components Analysis

13.5. Chapter Summary

Index

Foundations of Data Science with Python introduces readers to the fundamentals of data science, including data manipulation and visualization, probability, statistics, and dimensionality reduction. This book is targeted toward engineers and scientists, but it should be readily understandable to anyone who knows basic calculus and the essentials of computer programming. It uses a computational-first approach to data science: the reader will learn how to use Python and the associated data-science libraries to visualize, transform, and model data, as well as how to conduct statistical tests using real data sets. Rather than relying on obscure formulas that only apply to very specific statistical tests, this book teaches readers how to perform statistical tests via resampling; this is a simple and general approach to conducting statistical tests using simulations that draw samples from the data being analyzed. The statistical techniques and tools are explained and demonstrated using a diverse collection of data sets to conduct statistical tests related to contemporary topics, from the effects of socioeconomic factors on the spread of the COVID-19 virus to the impact of state laws on firearms mortality.
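To make the resampling idea in the description concrete, here is a minimal sketch of a two-sided test for a difference in means. It is not taken from the book: the function name `resampling_p_value`, its parameters, and the choice to draw from the pooled data with replacement are assumptions made for illustration, and it uses only the Python standard library.

```python
import random
from statistics import mean


def resampling_p_value(group_a, group_b, num_sims=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by resampling.

    Hypothetical helper for illustration; not the book's code.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = abs(mean(group_a) - mean(group_b))

    # Under the null hypothesis both groups come from the same
    # distribution, so the pooled data stands in for that distribution.
    pooled = list(group_a) + list(group_b)

    at_least_as_extreme = 0
    for _ in range(num_sims):
        # Draw two simulated groups (with replacement) of the original sizes.
        sim_a = rng.choices(pooled, k=len(group_a))
        sim_b = rng.choices(pooled, k=len(group_b))
        if abs(mean(sim_a) - mean(sim_b)) >= observed:
            at_least_as_extreme += 1

    # Fraction of simulations at least as extreme as the observed difference.
    return at_least_as_extreme / num_sims
```

Calling `resampling_p_value(a, b)` on two lists of measurements returns the fraction of simulated differences that are at least as large as the observed one; a small value suggests the observed difference would be unusual if the groups shared one distribution. Sampling without replacement (a permutation test) is a common alternative to the with-replacement draw used here.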

This book can be used as an undergraduate textbook for an Introduction to Data Science course or to provide a more contemporary approach in courses like Engineering Statistics. However, it is also intended to be accessible to practicing engineers and scientists who need to gain foundational knowledge of data science.

Key Features:

  • Applies a modern, computational approach to working with data
  • Uses real data sets to conduct statistical tests that address a diverse set of contemporary issues
  • Teaches the fundamentals of some of the most important tools in the Python data-science stack
  • Provides a basic, but rigorous, introduction to Probability and its application to Statistics
  • Offers an accompanying website that provides a unique set of online, interactive tools to help the reader learn the material
