Title Page
Copyright Page
Table of Contents
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Where to Go from Here
Chapter 1 Wrapping Your Head Around Data Science
Seeing Who Can Make Use of Data Science
Inspecting the Pieces of the Data Science Puzzle
Collecting, querying, and consuming data
Applying mathematical modeling to data science tasks
Deriving insights from statistical methods
Coding, coding, coding — it’s just part of the game
Applying data science to a subject area
Communicating data insights
Chapter 2 Tapping into Critical Aspects of Data Engineering
Defining the Three Vs
Grappling with data volume
Handling data velocity
Dealing with data variety
Identifying Important Data Sources
Grasping the Differences among Data Approaches
Defining data science
Defining machine learning engineering
Defining data engineering
Comparing machine learning engineers, data scientists, and data engineers
Storing and Processing Data for Data Science
Storing data and doing data science directly in the cloud
Using serverless computing to execute data science
Containerizing predictive applications within Kubernetes
Sizing up popular cloud-warehouse solutions
Introducing NoSQL databases
Processing data in real-time
Recognizing the Impact of Generative AI
The reshaping of data engineering
Tools and frameworks for supporting AI workloads
Chapter 3 Using a Machine to Learn from Data
Defining Machine Learning and Its Processes
Walking through the steps of the machine learning process
Becoming familiar with machine learning terms
Considering Learning Styles
Learning with supervised algorithms
Learning with unsupervised algorithms
Learning with reinforcement
Seeing What You Can Do
Selecting algorithms based on function
Generating real-time analytics with Spark
Chapter 4 Math, Probability, and Statistical Modeling
Exploring Probability and Inferential Statistics
Probability distributions
Conditional probability with Naïve Bayes
Quantifying Correlation
Calculating correlation with Pearson’s r
Ranking variable pairs using Spearman’s rank correlation
Reducing Data Dimensionality with Linear Algebra
Decomposing data to reduce dimensionality
Reducing dimensionality with factor analysis
Decreasing dimensionality and removing outliers with PCA
Modeling Decisions with Multiple Criteria Decision-Making
Turning to traditional MCDM
Focusing on fuzzy MCDM
Introducing Regression Methods
Linear regression
Logistic regression
Ordinary least squares regression methods
Detecting Outliers
Analyzing extreme values
Detecting outliers with univariate analysis
Detecting outliers with multivariate analysis
Introducing Time Series Analysis
Identifying patterns in time series
Modeling univariate time series data
Chapter 5 Grouping Your Way into Accurate Predictions
Starting with Clustering Basics
Getting to know clustering algorithms
Examining clustering similarity metrics
Identifying Clusters in Your Data
Clustering with the k-means algorithm
Estimating clusters with kernel density estimation
Clustering with hierarchical algorithms
Dabbling in the DBScan neighborhood
Categorizing Data with Decision Tree and Random Forest Algorithms
Drawing a Line between Clustering and Classification
Introducing instance-based learning classifiers
Getting to know classification algorithms
Making Sense of Data with Nearest Neighbor Analysis
Classifying Data with Average Nearest Neighbor Algorithms
Classifying with K-Nearest Neighbor Algorithms
Understanding how the k-nearest neighbor algorithm works
Knowing when to use the k-nearest neighbor algorithm
Exploring common applications of k-nearest neighbor algorithms
Solving Real-World Problems with Nearest Neighbor Algorithms
Seeing k-nearest neighbor algorithms in action
Seeing average nearest neighbor algorithms in action
Chapter 6 Coding Up Data Insights and Decision Engines
Seeing Where Python Fits into Your Data Science Strategy
Using Python for Data Science
Sorting out the various Python data types
Numbers in Python
Strings in Python
Lists in Python
Tuples in Python
Sets in Python
Dictionaries in Python
Putting loops to good use in Python
Having fun with functions
Keeping cool with classes
Checking out some useful Python libraries
Saying hello to the NumPy library
Getting up close and personal with the SciPy library
Peeking into the pandas offering
Bonding with Matplotlib for data visualization
Learning from data with scikit-learn
Chapter 7 Generating Insights with Software Applications
Choosing the Best Tools for Your Data Science Strategy
Getting a Handle on SQL and Relational Databases
Investing Some Effort into Database Design
Defining data types
Designing constraints properly
Normalizing your database
Narrowing the Focus with SQL Functions
Making Life Easier with Excel
Using Excel to quickly get to know your data
Filtering in Excel
Using conditional formatting
Excel charting to visually identify outliers and trends
Reformatting and summarizing with PivotTables
Automating Excel tasks with macros
Chapter 8 Telling Powerful Stories with Data
Data Visualizations: The Big Three
Data storytelling for decision-makers
Data showcasing for analysts
Designing data art for activists
Designing to Meet the Needs of Your Target Audience
Step 1: Brainstorm (All about Eve)
Step 2: Define the purpose
Step 3: Choose the most functional visualization type for your purpose
Picking the Most Appropriate Design Style
Inducing a calculating, exacting response
Eliciting a strong emotional response
Selecting the Appropriate Data Graphic Type
Standard chart graphics
Comparative graphics
Statistical plots
Topology structures
Spatial plots and maps
Testing Data Graphics
Adding Context
Creating context with data
Creating context with annotations
Creating context with graphical elements
Chapter 9 Ten Free or Low-Cost Data Science Libraries and Platforms
Scraping the Web with Beautiful Soup
Wrangling Data with pandas
Visualizing Data with Looker Studio
Machine Learning with scikit-learn
Creating Interactive Dashboards with Streamlit
Doing Geospatial Data Visualization with Kepler.gl
Making Charts with Tableau Public
Doing Web-Based Data Visualization with RAWGraphs
Making Cool Infographics with Infogram
Making Cool Infographics with Canva
Index
EULA
Feel confident navigating the fundamentals of data science
Data Science Essentials For Dummies is a quick reference on the core concepts of the fast-growing, in-demand field of data science, which spans data collection, cleaning, processing, and visualization. This direct and accessible resource helps you brush up on key topics and gets right to the point, cutting review material, wordy explanations, and fluff so you get what you need, fast.
Perfect for supplementing classroom learning, reviewing for a certification, or staying knowledgeable on the job, Data Science Essentials For Dummies is a reliable reference to keep on hand at your desk every day.