Title Page
Copyright
Dedication
About the Author
Acknowledgments
Introduction
Who Is This Book For?
About This Book
Setting Up the Environment
Windows
macOS
Linux
Installing Packages with Python
Other Tools
Summary
Chapter 1: Exploratory Data Analysis
Your First Day as CEO
Finding Patterns in Datasets
Using .csv Files to Review and Store Data
Displaying Data with Python
Calculating Summary Statistics
Analyzing Subsets of Data
Nighttime Data
Seasonal Data
Visualizing Data with Matplotlib
Drawing and Displaying a Simple Plot
Clarifying Plots with Titles and Labels
Plotting Subsets of Data
Testing Different Plot Types
Exploring Correlations
Calculating Correlations
Understanding Strong vs. Weak Correlations
Finding Correlations Between Variables
Creating Heat Maps
Exploring Further
Summary
Chapter 2: Forecasting
Predicting Customer Demand
Cleaning Erroneous Data
Plotting Data to Find Trends
Performing Linear Regression
Applying Algebra to the Regression Line
Calculating Error Measurements
Using Regression to Forecast Future Trends
Trying More Regression Models
Multivariate Linear Regression to Predict Sales
Trigonometry to Capture Variations
Choosing the Best Regression to Use for Forecasting
Exploring Further
Summary
Chapter 3: Group Comparisons
Reading Population Data
Summary Statistics
Random Samples
Differences Between Sample Data
Performing Hypothesis Testing
The t-Test
Nuances of Hypothesis Testing
Comparing Groups in a Practical Context
Summary
Chapter 4: A/B Testing
The Need for Experimentation
Running Experiments to Test New Hypotheses
Understanding the Math of A/B Testing
Translating the Math into Practice
Optimizing with the Champion/Challenger Framework
Preventing Mistakes with Twyman’s Law and A/A Testing
Understanding Effect Sizes
Calculating the Significance of Data
Applications and Advanced Considerations
The Ethics of A/B Testing
Summary
Chapter 5: Binary Classification
Minimizing Customer Attrition
Using Linear Probability Models to Find High-Risk Customers
Plotting Attrition Risk
Confirming Relationships with Linear Regression
Predicting the Future
Making Business Recommendations
Measuring Prediction Accuracy
Using Multivariate LPMs
Creating New Metrics
Considering the Weaknesses of LPMs
Predicting Binary Outcomes with Logistic Regression
Drawing Logistic Curves
Fitting the Logistic Function to Our Data
Applications of Binary Classification
Summary
Chapter 6: Supervised Learning
Predicting Website Traffic
Reading and Plotting News Article Data
Using Linear Regression as a Prediction Method
Understanding Supervised Learning
k-Nearest Neighbors
Implementing k-NN
Performing k-NN with Python’s sklearn
Using Other Supervised Learning Algorithms
Decision Trees
Random Forests
Neural Networks
Measuring Prediction Accuracy
Working with Multivariate Models
Using Classification Instead of Regression
Summary
Chapter 7: Unsupervised Learning
Unsupervised Learning vs. Supervised Learning
Generating and Exploring Data
Rolling the Dice
Using Another Kind of Die
The Origin of Observations with Clustering
Clustering in Business Applications
Analyzing Multiple Dimensions
E-M Clustering
The Guessing Step
The Expectation Step
The Maximization Step
The Convergence Step
Other Clustering Methods
Other Unsupervised Learning Methods
Summary
Chapter 8: Web Scraping
Understanding How Websites Work
Creating Your First Web Scraper
Parsing HTML Code
Scraping an Email Address
Searching for Addresses Directly
Performing Searches with Regular Expressions
Using Metacharacters for Flexible Searches
Fine-Tuning Searches with Escape Sequences
Combining Metacharacters for Advanced Searches
Using Regular Expressions to Search for Email Addresses
Converting Results to Usable Data
Using Beautiful Soup
Parsing HTML Label Elements
Scraping and Parsing HTML Tables
Advanced Scraping
Summary
Chapter 9: Recommendation Systems
Popularity-Based Recommendations
Item-Based Collaborative Filtering
Measuring Vector Similarity
Calculating Cosine Similarity
Implementing Item-Based Collaborative Filtering
User-Based Collaborative Filtering
Case Study: Music Recommendations
Generating Recommendations with Advanced Systems
Summary
Chapter 10: Natural Language Processing
Using NLP to Detect Plagiarism
Understanding the word2vec NLP Model
Quantifying Similarities Between Words
Creating a System of Equations
Analyzing Numeric Vectors in word2vec
Manipulating Vectors with Mathematical Calculations
Detecting Plagiarism with word2vec
Using Skip-Thoughts
Topic Modeling
Other Applications of NLP
Summary
Chapter 11: Data Science in Other Languages
Winning Soccer Games with SQL
Reading and Analyzing Data
Getting Familiar with SQL
Setting Up a SQL Database
Running SQL Queries
Combining Data by Joining Tables
Winning Soccer Games with R
Getting Familiar with R
Applying Linear Regression in R
Using R to Plot Data
Gaining Other Valuable Skills
Summary
Index
Dive into the exciting world of data science with this practical introduction. Packed with essential skills and useful examples, Dive Into Data Science will show you how to obtain, analyze, and visualize data so you can leverage its power to solve common business challenges.
With only a basic understanding of Python and high school math, you’ll be able to work through the book with ease and start implementing data science in your day-to-day work. From improving a bike-sharing company to extracting data from websites and creating recommendation systems, you’ll discover how to find and use data-driven solutions to make business decisions.
Topics covered include conducting exploratory data analysis, running A/B tests, performing binary classification using logistic regression models, and using machine learning algorithms.
With this practical guide at your fingertips, harness the power of programming, mathematical theory, and good old common sense to find data-driven solutions that make a difference. Don’t wait; dive right in!