Copyright
Table of Contents
Preface
What’s New?
Using the Code
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Exploratory Data Analysis
Evidence
The National Survey of Family Growth
Reading the Data
Validation
Transformation
Summary Statistics
Interpretation
Glossary
Exercises
Exercise 1.1
Exercise 1.2
Exercise 1.3
Chapter 2. Distributions
Frequency Tables
NSFG Distributions
Outliers
First Babies
Effect Size
Reporting Results
Glossary
Exercises
Exercise 2.1
Exercise 2.2
Exercise 2.3
Chapter 3. Probability Mass Functions
PMFs
Summarizing a PMF
The Class Size Paradox
NSFG Data
Other Visualizations
Glossary
Exercises
Exercise 3.1
Exercise 3.2
Exercise 3.3
Chapter 4. Cumulative Distribution Functions
Percentiles and Percentile Ranks
CDFs
Comparing CDFs
Percentile-Based Statistics
Random Numbers
Glossary
Exercises
Exercise 4.1
Exercise 4.2
Exercise 4.3
Exercise 4.4
Exercise 4.5
Chapter 5. Modeling Distributions
The Binomial Distribution
The Poisson Distribution
The Exponential Distribution
The Normal Distribution
The Lognormal Distribution
Why Model?
Glossary
Exercises
Exercise 5.1
Exercise 5.2
Exercise 5.3
Chapter 6. Probability Density Functions
Comparing Distributions
Probability Density
The Exponential PDF
Comparing PMFs and PDFs
Kernel Density Estimation
The Distribution Framework
Glossary
Exercises
Exercise 6.1
Exercise 6.2
Chapter 7. Relationships Between Variables
Scatter Plots
Decile Plots
Correlation
Strength of Correlation
Rank Correlation
Correlation and Causation
Glossary
Exercises
Exercise 7.1
Exercise 7.2
Exercise 7.3
Exercise 7.4
Exercise 7.5
Chapter 8. Estimation
Weighing Penguins
Robustness
Estimating Variance
Sampling Distributions
Standard Error
Confidence Intervals
Sources of Error
Glossary
Exercises
Exercise 8.1
Exercise 8.2
Exercise 8.3
Exercise 8.4
Exercise 8.5
Exercise 8.6
Chapter 9. Hypothesis Testing
Flipping Coins
Testing a Difference in Means
Other Test Statistics
Testing a Correlation
Testing Proportions
Glossary
Exercises
Exercise 9.1
Exercise 9.2
Chapter 10. Least Squares
Least Squares Fit
Coefficient of Determination
Minimizing MSE
Estimation
Visualizing Uncertainty
Transformation
Glossary
Exercises
Exercise 10.1
Exercise 10.2
Exercise 10.3
Chapter 11. Multiple Regression
StatsModels
On to Multiple Regression
Control Variables
Nonlinear Relationships
Logistic Regression
Glossary
Exercises
Exercise 11.1
Exercise 11.2
Exercise 11.3
Exercise 11.4
Chapter 12. Time Series Analysis
Electricity
Decomposition
Prediction
Multiplicative Model
Autoregression
Moving Average
Retrodiction with Autoregression
ARIMA
Prediction with ARIMA
Glossary
Exercises
Exercise 12.1
Exercise 12.2
Exercise 12.3
Chapter 13. Survival Analysis
Survival Functions
Hazard Function
Marriage Data
Weighted Bootstrap
Estimating Hazard Functions
Estimating Survival Functions
Lifelines
Confidence Intervals
Expected Remaining Lifetime
Glossary
Exercises
Exercise 13.1
Exercise 13.2
Chapter 14. Analytic Methods
Normal Probability Plots
Normal Distributions
Distribution of Sample Means
Distribution of Differences
Central Limit Theorem
The Limits of the Central Limit Theorem
Applying the CLT
Correlation Test
Chi-squared Test
Computation and Analysis
Glossary
Exercises
Exercise 14.1
Exercise 14.2
Exercise 14.3
Exercise 14.4
Index
About the Author
Colophon
If you know how to program, you have the skills to turn data into knowledge. This thoroughly revised edition presents statistical concepts computationally, rather than mathematically, using programs written in Python. Through practical examples and exercises based on real-world datasets, you'll learn the entire process of exploratory data analysis—from wrangling data and generating statistics to identifying patterns and testing hypotheses.
Whether you're a data scientist, software engineer, or data enthusiast, you'll get up to speed on commonly used tools including NumPy, SciPy, and pandas. You'll explore distributions, relationships between variables, visualization, and many other concepts. All chapters are also available as Jupyter notebooks, so you can read the text, run the code, and work on exercises all in one place.