Think Stats: Exploratory Data Analysis. 3 Ed

Author: Allen Downey
Release date: 2025
Publisher: O’Reilly Media, Inc.
Number of pages: 324
File size: 3.3 MB
File type: PDF
Uploaded by: Aleks-5

Copyright
Table of Contents

Preface
  What’s New?
  Using the Code
  Conventions Used in This Book
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments

Chapter 1. Exploratory Data Analysis
  Evidence
  The National Survey of Family Growth
  Reading the Data
  Validation
  Transformation
  Summary Statistics
  Interpretation
  Glossary
  Exercises
    Exercise 1.1
    Exercise 1.2
    Exercise 1.3

Chapter 2. Distributions
  Frequency Tables
  NSFG Distributions
  Outliers
  First Babies
  Effect Size
  Reporting Results
  Glossary
  Exercises
    Exercise 2.1
    Exercise 2.2
    Exercise 2.3

Chapter 3. Probability Mass Functions
  PMFs
  Summarizing a PMF
  The Class Size Paradox
  NSFG Data
  Other Visualizations
  Glossary
  Exercises
    Exercise 3.1
    Exercise 3.2
    Exercise 3.3

Chapter 4. Cumulative Distribution Functions
  Percentiles and Percentile Ranks
  CDFs
  Comparing CDFs
  Percentile-Based Statistics
  Random Numbers
  Glossary
  Exercises
    Exercise 4.1
    Exercise 4.2
    Exercise 4.3
    Exercise 4.4
    Exercise 4.5

Chapter 5. Modeling Distributions
  The Binomial Distribution
  The Poisson Distribution
  The Exponential Distribution
  The Normal Distribution
  The Lognormal Distribution
  Why Model?
  Glossary
  Exercises
    Exercise 5.1
    Exercise 5.2
    Exercise 5.3

Chapter 6. Probability Density Functions
  Comparing Distributions
  Probability Density
  The Exponential PDF
  Comparing PMFs and PDFs
  Kernel Density Estimation
  The Distribution Framework
  Glossary
  Exercises
    Exercise 6.1
    Exercise 6.2

Chapter 7. Relationships Between Variables
  Scatter Plots
  Decile Plots
  Correlation
  Strength of Correlation
  Rank Correlation
  Correlation and Causation
  Glossary
  Exercises
    Exercise 7.1
    Exercise 7.2
    Exercise 7.3
    Exercise 7.4
    Exercise 7.5

Chapter 8. Estimation
  Weighing Penguins
  Robustness
  Estimating Variance
  Sampling Distributions
  Standard Error
  Confidence Intervals
  Sources of Error
  Glossary
  Exercises
    Exercise 8.1
    Exercise 8.2
    Exercise 8.3
    Exercise 8.4
    Exercise 8.5
    Exercise 8.6

Chapter 9. Hypothesis Testing
  Flipping Coins
  Testing a Difference in Means
  Other Test Statistics
  Testing a Correlation
  Testing Proportions
  Glossary
  Exercises
    Exercise 9.1
    Exercise 9.2

Chapter 10. Least Squares
  Least Squares Fit
  Coefficient of Determination
  Minimizing MSE
  Estimation
  Visualizing Uncertainty
  Transformation
  Glossary
  Exercises
    Exercise 10.1
    Exercise 10.2
    Exercise 10.3

Chapter 11. Multiple Regression
  StatsModels
  On to Multiple Regression
  Control Variables
  Nonlinear Relationships
  Logistic Regression
  Glossary
  Exercises
    Exercise 11.1
    Exercise 11.2
    Exercise 11.3
    Exercise 11.4

Chapter 12. Time Series Analysis
  Electricity
  Decomposition
  Prediction
  Multiplicative Model
  Autoregression
  Moving Average
  Retrodiction with Autoregression
  ARIMA
  Prediction with ARIMA
  Glossary
  Exercises
    Exercise 12.1
    Exercise 12.2
    Exercise 12.3

Chapter 13. Survival Analysis
  Survival Functions
  Hazard Function
  Marriage Data
  Weighted Bootstrap
  Estimating Hazard Functions
  Estimating Survival Functions
  Lifelines
  Confidence Intervals
  Expected Remaining Lifetime
  Glossary
  Exercises
    Exercise 13.1
    Exercise 13.2

Chapter 14. Analytic Methods
  Normal Probability Plots
  Normal Distributions
  Distribution of Sample Means
  Distribution of Differences
  Central Limit Theorem
  The Limits of the Central Limit Theorem
  Applying the CLT
  Correlation Test
  Chi-squared Test
  Computation and Analysis
  Glossary
  Exercises
    Exercise 14.1
    Exercise 14.2
    Exercise 14.3
    Exercise 14.4

Index
About the Author
Colophon

If you know how to program, you have the skills to turn data into knowledge. This thoroughly revised edition presents statistical concepts computationally, rather than mathematically, using programs written in Python. Through practical examples and exercises based on real-world datasets, you'll learn the entire process of exploratory data analysis—from wrangling data and generating statistics to identifying patterns and testing hypotheses.

Whether you're a data scientist, software engineer, or data enthusiast, you'll get up to speed on commonly used tools including NumPy, SciPy, and Pandas. You'll explore distributions, relationships between variables, visualization, and many other concepts. And all chapters are available as Jupyter notebooks, so you can read the text, run the code, and work on exercises all in one place.

  • Analyze data distributions and visualize patterns using Python libraries
  • Improve predictions and insights with regression models
  • Dive into specialized topics like time series analysis and survival analysis
  • Integrate statistical techniques and tools for validation, inference, and more
  • Communicate findings with effective data visualization
  • Troubleshoot common data analysis challenges
  • Boost reproducibility and collaboration in data analysis projects with interactive notebooks
