Preface
1. Football Analytics
Baseball Has the Three True Outcomes: Does Football?
Do Running Backs Matter?
How Data Can Help Us Contextualize Passing Statistics
Can You Beat the Odds?
Do Teams Beat the Draft?
Tools for Football Analytics
First Steps in Python and R
Example Data: Who Throws Deep?
nflfastR in R
nfl_data_py in Python
Data Science Tools Used in This Chapter
Suggested Readings
2. Exploratory Data Analysis: Stable Versus Unstable Quarterback Statistics
Defining Questions
Obtaining and Filtering Data
Summarizing Data
Plotting Data
Histograms
Boxplots
Player-Level Stability of Passing Yards per Attempt
Deep Passes Versus Short Passes
So, What Should We Do with This Insight?
Data Science Tools Used in This Chapter
Exercises
Suggested Readings
3. Simple Linear Regression: Rushing Yards Over Expected
Exploratory Data Analysis
Simple Linear Regression
Who Was the Best in RYOE?
Is RYOE a Better Metric?
Data Science Tools Used in This Chapter
Exercises
Suggested Readings
4. Multiple Regression: Rushing Yards Over Expected
Definition of Multiple Linear Regression
Exploratory Data Analysis
Applying Multiple Linear Regression
Analyzing RYOE
So, Do Running Backs Matter?
Assumption of Linearity
Data Science Tools Used in This Chapter
Exercises
Suggested Readings
5. Generalized Linear Models: Completion Percentage over Expected
Generalized Linear Models
Building a GLM
GLM Application to Completion Percentage
Is CPOE More Stable Than Completion Percentage?
A Question About Residual Metrics
A Brief Primer on Odds Ratios
Data Science Tools Used in This Chapter
Exercises
Suggested Readings
6. Using Data Science for Sports Betting: Poisson Regression and Passing Touchdowns
The Main Markets in Football
Application of Poisson Regression: Prop Markets
The Poisson Distribution
Individual Player Markets and Modeling
Poisson Regression Coefficients
Closing Thoughts on GLMs
Data Science Tools Used in This Chapter
Exercises
Suggested Readings
7. Web Scraping: Obtaining and Analyzing Draft Picks
Web Scraping with Python
Web Scraping in R
Analyzing the NFL Draft
The Jets/Colts 2018 Trade Evaluated
Are Some Teams Better at Drafting Players Than Others?
Data Science Tools Used in This Chapter
Exercises
Suggested Readings
8. Principal Component Analysis and Clustering: Player Attributes
Web Scraping and Visualizing NFL Scouting Combine Data
Introduction to PCA
PCA on All Data
Clustering Combine Data
Clustering Combine Data in Python
Clustering Combine Data in R
Closing Thoughts on Clustering
Data Science Tools Used in This Chapter
Exercises
Suggested Readings
9. Advanced Tools and Next Steps
Advanced Modeling Tools
Time Series Analysis
Multivariate Statistics Beyond PCA
Quantile Regression
Bayesian Statistics and Hierarchical Models
Survival Analysis/Time-to-Event
Bayesian Networks/Structural Equation Modeling
Machine Learning
Command Line Tools
Bash Example
Suggested Readings for bash
Version Control
Git
GitHub and GitLab
GitHub Web Pages and Résumés
Suggested Reading for Git
Style Guides and Linting
Packages
Suggested Readings for Packages
Computer Environments
Interactives and Report Tools to Share Data
Artificial Intelligence Tools
Conclusion
A. Python and R Basics
B. Summary Statistics and Data Wrangling: Passing the Ball
C. Data-Wrangling Fundamentals
Glossary
Index
Baseball is not the only sport to use "moneyball." American football teams, fantasy football players, fans, and gamblers are increasingly using data to gain an edge on the competition. Professional and college teams use data to help identify team needs and select players to fill those needs. Fantasy football players and fans use data to try to defeat their friends, while sports bettors use data in an attempt to defeat the sportsbooks.
In this concise book, Eric Eager and Richard Erickson provide a clear introduction to analyzing football data with statistical models in both Python and R. Whether your goal is to qualify for an entry-level football analyst position, dominate your fantasy football league, or simply learn R and Python with fun example cases, this book is your starting place.
Obtain NFL data through Python and R packages and web scraping
Visualize and explore data
Apply regression models to play-by-play data
Extend regression models to classification problems in football
Apply data science to sports betting with individual player props
Understand player athletic attributes using multivariate statistics