Welcome to Packt Early Access
Python Data Analysis, Fourth Edition: An end-to-end guide covering data processing, data manipulation and data visualization
Chapter 1: Getting Started with Python Libraries
Join our book community on Discord
Navigating the landscape of data analysis
Exploring libraries for data analysis
Data analysis process methodologies
Knowledge discovery from data (KDD)
SEMMA
CRISP-DM
Standard process of data analysis
Comparing Data Analysis, Data Science, and Data Engineering
Data Science Domain Job Roles and Skillsets
Roles of Data Analyst, Data Scientist, and Data Engineer
Skillsets for Data Analyst and Data Scientist
Roles of ML Engineer and NLP Engineer
Skillsets for Data Engineer and ML Engineer
A quick look at MLOps
Installing Python 3
Python installation and setup on Windows
Python installation and setup on Linux
Python installation and setup on Mac OS X with a GUI installer
Python installation and setup on Mac OS X with brew
Software tools used in this book
Using IPython as a shell
Hands-on with IPython
Reading manual pages
Where to find help?
Using JupyterLab
Using Jupyter Notebooks
Advanced features of Jupyter Notebooks
Using PyCharm and VS Code
PyCharm
Visual Studio Code
Using Databricks for PySpark
Summary
Chapter 2: NumPy and pandas
Join our book community on Discord
Technical requirements
Grasping the essence of NumPy arrays
Array properties and attributes
Selecting array elements
NumPy array numerical data types
Data type objects
Data type character codes
Data type constructors
Data type attributes
Converting arrays
Manipulating array shapes
Stacking arrays
Splitting arrays
Creating views and copies
Slicing NumPy Arrays
Broadcasting arrays
More on NumPy Methods
Creating pandas DataFrames and Series
Describing pandas DataFrames
Understanding pandas Series
pandas Series Features
Reading and querying the Quandl and Nasdaq Data Link data
Grouping and joining pandas DataFrames
Concatenating DataFrames
Working with missing values
Creating pivot tables
Dealing with dates
Date Features
Date Methods
Summary
References
Chapter 3: Statistics for Data Insights
Join our book community on Discord
Technical requirements
Understanding attributes of data and their types
Nominal attributes
Ordinal attributes
Numeric attributes
Discrete and continuous attributes
Measuring central tendency
Mean
Mode
Median
Measuring dispersion
Range
Interquartile Range (IQR)
Variance
Standard deviation
Skewness and kurtosis
Understanding relationships using covariance and correlation coefficients
Covariance
Correlation
Pearson's correlation coefficient
Spearman's rank correlation coefficient
Kendall's rank correlation coefficient
Collecting samples
Probability Sampling
Non-probability sampling
Performing parametric tests
Understanding t-tests
One Sample t-test
Two Sample t-test
Paired Sample t-test
ANOVA
One-way ANOVA
Two-way ANOVA
Performing non-parametric tests
Chi-Square Test
Mann-Whitney U Test
Wilcoxon Signed-Rank Test
Kruskal-Wallis Test
A/B testing
Performing Sampling and Splitting the Data into Groups
Formulating a Hypothesis and Performing Sampling
Bayes' theorem
Summary
Chapter 4: Linear Algebra
Join our book community on Discord
What is linear algebra?
Introduction to scalars, vectors, matrices, and tensors
Scalars and vectors
Matrices and tensors
Working with linear algebra in Python
Fitting polynomials with NumPy
Exploring matrix operations
The determinant operation
Finding the rank of a matrix
Matrix inverse using NumPy
Solving linear equations using NumPy
Eigenvalues, eigenvectors, and matrix decomposition
Eigenvectors and Eigenvalues
Decomposing a matrix using SVD
LU Decomposition
QR Decomposition
Probability distributions and random number generation
Probability Functions for Random Variables
Probability Mass Functions
Density Functions
Types of data distributions
Discrete Probability Distributions
Continuous Probability Distributions
Generating random numbers
Testing normality of data using SciPy
Histogram
Anderson-Darling Test
D'Agostino-Pearson test
Creating a masked array using numpy.ma subpackage
Summary
Understand data analysis pipelines using machine learning algorithms and techniques with this practical guide
Data analysis enables you to generate value from small and big data by discovering new patterns, and Python is one of the most popular tools for analyzing a wide variety of data. With this book, you'll get up and running using Python for data analysis by exploring the different phases used in data analysis and learning how to use modern libraries from the Python ecosystem to create efficient data pipelines.
Starting with the essential statistical and data analysis fundamentals using Python, you'll perform complex data analysis and modeling, data manipulation, data cleaning, and data visualization using easy-to-follow examples. You'll then understand how to conduct time series analysis and signal processing using ARMA models. As you advance, you'll get to grips with smart processing and data analytics using machine learning algorithms such as regression, classification, Principal Component Analysis (PCA), and clustering. You'll also work on real-world examples to analyze textual and image data using natural language processing (NLP) and image analytics techniques, respectively. Finally, the book will demonstrate parallel computing using Dask.
By the end of this data analysis book, you'll be equipped with the skills you need to prepare data for analysis, create meaningful data visualizations, and forecast values from data.
This book is for data analysts, business analysts, statisticians, and data scientists looking to learn how to use Python for data analysis. Students and academic faculty will also find this book useful for learning and teaching Python data analysis using a hands-on approach. A basic understanding of math and a working knowledge of the Python programming language will help you get started with this book.