Exploratory Data Analysis with Python Cookbook
Contributors
  About the author
  About the reviewers
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
  Download the example code files
  Download the color images
  Conventions used
  Get in touch
  Share Your Thoughts
  Download a free PDF copy of this book
Chapter 1: Generating Summary Statistics
  Technical requirements
  Analyzing the mean of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Checking the median of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Identifying the mode of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Checking the variance of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Identifying the standard deviation of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Generating the range of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Identifying the percentiles of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Checking the quartiles of a dataset
    Getting ready
    How to do it
    How it works
    There's more
  Analyzing the interquartile range (IQR) of a dataset
    Getting ready
    How to do it
    How it works
Chapter 2: Preparing Data for EDA
  Technical requirements
  Grouping data
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Appending data
    Getting ready
    How to do it
    How it works
    There's more
  Concatenating data
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Merging data
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Sorting data
    Getting ready
    How to do it
    How it works
    There's more
  Categorizing data
    Getting ready
    How to do it
    How it works
    There's more
  Removing duplicate data
    Getting ready
    How to do it
    How it works
    There's more
  Dropping data rows and columns
    Getting ready
    How to do it
    How it works
    There's more
  Replacing data
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Changing a data format
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Dealing with missing values
    Getting ready
    How to do it
    How it works
    There's more
    See also
Chapter 3: Visualizing Data in Python
  Technical requirements
  Preparing for visualization
    Getting ready
    How to do it
    How it works
    There's more
  Visualizing data in Matplotlib
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Visualizing data in Seaborn
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Visualizing data in GGPLOT
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Visualizing data in Bokeh
    Getting ready
    How to do it
    How it works
    There's more
    See also
Chapter 4: Performing Univariate Analysis in Python
  Technical requirements
  Performing univariate analysis using a histogram
    Getting ready
    How to do it
    How it works
  Performing univariate analysis using a boxplot
    Getting ready
    How to do it
    How it works
    There's more
  Performing univariate analysis using a violin plot
    Getting ready
    How to do it
    How it works
  Performing univariate analysis using a summary table
    Getting ready
    How to do it
    How it works
    There's more
  Performing univariate analysis using a bar chart
    Getting ready
    How to do it
    How it works
  Performing univariate analysis using a pie chart
    Getting ready
    How to do it
    How it works
Chapter 5: Performing Bivariate Analysis in Python
  Technical requirements
  Analyzing two variables using a scatter plot
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Creating a crosstab/two-way table on bivariate data
    Getting ready
    How to do it
    How it works
  Analyzing two variables using a pivot table
    Getting ready
    How to do it
    How it works
    There's more
  Generating pairplots on two variables
    Getting ready
    How to do it
    How it works
  Analyzing two variables using a bar chart
    Getting ready
    How to do it
    How it works
    There's more
  Generating box plots for two variables
    Getting ready
    How to do it
    How it works
  Creating histograms on two variables
    Getting ready
    How to do it
    How it works
  Analyzing two variables using a correlation analysis
    Getting ready
    How to do it
    How it works
Chapter 6: Performing Multivariate Analysis in Python
  Technical requirements
  Implementing cluster analysis on multiple variables using K-means
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Choosing the optimal number of clusters in K-means
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Profiling K-means clusters
    Getting ready
    How to do it
    How it works
    There's more
  Implementing principal component analysis on multiple variables
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Choosing the number of principal components
    Getting ready
    How to do it
    How it works
  Analyzing principal components
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Implementing factor analysis on multiple variables
    Getting ready
    How to do it
    How it works
    There's more
  Determining the number of factors
    Getting ready
    How to do it
    How it works
  Analyzing the factors
    Getting ready
    How to do it
    How it works
Chapter 7: Analyzing Time Series Data in Python
  Technical requirements
  Using line and boxplots to visualize time series data
    Getting ready
    How to do it
    How it works
  Spotting patterns in time series
    Getting ready
    How to do it
    How it works
  Performing time series data decomposition
    Getting ready
    How to do it
    How it works
  Performing smoothing – moving average
    Getting ready
    How to do it
    How it works
    See also
  Performing smoothing – exponential smoothing
    Getting ready
    How to do it
    How it works
    See also
  Performing stationarity checks on time series data
    Getting ready
    How to do it
    How it works
    See also
  Differencing time series data
    Getting ready
    How to do it
    How it works
    Getting ready
    How to do it
    How it works
    See also
Chapter 8: Analyzing Text Data in Python
  Technical requirements
  Preparing text data
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Dealing with stop words
    Getting ready
    How to do it
    How it works
    There's more
  Analyzing part of speech
    Getting ready
    How to do it
    How it works
  Performing stemming and lemmatization
    Getting ready
    How to do it
    How it works
  Analyzing n-grams
    Getting ready
    How to do it
    How it works
  Creating word clouds
    Getting ready
    How to do it
    How it works
  Checking term frequency
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Checking sentiments
    Getting ready
    How to do it
    How it works
    There's more
    See also
  Performing topic modeling
    Getting ready
    How to do it
    How it works
  Choosing an optimal number of topics
    Getting ready
    How to do it
    How it works
Chapter 9: Dealing with Outliers and Missing Values
  Technical requirements
  Identifying outliers
    Getting ready
    How to do it
    How it works
  Spotting univariate outliers
    Getting ready
    How to do it
    How it works
  Finding bivariate outliers
    Getting ready
    How to do it
    How it works
  Identifying multivariate outliers
    Getting ready
    How to do it
    How it works
    See also
  Flooring and capping outliers
    Getting ready
    How to do it
    How it works
  Removing outliers
    Getting ready
    How to do it
    How it works
  Replacing outliers
    Getting ready
    How to do it
    How it works
  Identifying missing values
    Getting ready
    How to do it
    How it works
  Dropping missing values
    Getting ready
    How to do it
    How it works
  Replacing missing values
    Getting ready
    How to do it
    How it works
  Imputing missing values using machine learning models
    Getting ready
    How to do it
    How it works
Chapter 10: Performing Automated Exploratory Data Analysis in Python
  Technical requirements
  Doing Automated EDA using pandas profiling
    Getting ready
    How to do it
    How it works
    See also
  Performing Automated EDA using dtale
    Getting ready
    How to do it
    How it works
    See also
  Doing Automated EDA using AutoViz
    Getting ready
    How to do it
    How it works
    See also
  Performing Automated EDA using Sweetviz
    Getting ready
    How to do it
    How it works
    See also
  Implementing Automated EDA using custom functions
    Getting ready
    How to do it
    How it works
    There's more
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
Download a free PDF copy of this book
In today's data-centric world, the ability to extract meaningful insights from vast amounts of data has become a valuable skill across industries. Exploratory Data Analysis (EDA) lies at the heart of this process, enabling us to understand, visualize, and draw insights from many forms of data.
This book is a comprehensive guide to Exploratory Data Analysis using the Python programming language. It provides the practical steps needed to explore, analyze, and visualize structured and unstructured data effectively. It offers hands-on guidance and code for tasks such as generating summary statistics, analyzing single and multiple variables, visualizing data, analyzing text data, handling outliers and missing values, and automating the EDA process. It is suited to data scientists, data analysts, researchers, and curious learners who want the essential knowledge and practical steps for analyzing large amounts of data to uncover insights.
Python is an open-source, general-purpose programming language that is widely used for data science and data analysis because of its simplicity and versatility. It offers several libraries for cleaning, analyzing, and visualizing data. In this book, we will explore popular Python libraries such as pandas, Matplotlib, and Seaborn, and provide working code for analyzing data with them.
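As a minimal sketch of what this kind of pandas code looks like (it is not taken from the book), the snippet below computes the Chapter 1 summary statistics (mean, median, mode, variance, standard deviation, range, a percentile, quartiles, and IQR) on a small series. The column name `score` and its values are made up for illustration. Note that pandas' `var()` and `std()` use the sample formulas (`ddof=1`) by default, and `quantile()` uses linear interpolation by default, so results may differ slightly from tools that use other conventions.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
df = pd.DataFrame({"score": [55, 61, 61, 70, 72, 78, 85, 90, 120]})
s = df["score"]

stats = {
    "mean": s.mean(),
    "median": s.median(),
    "mode": s.mode().tolist(),  # mode() can return several values, so keep a list
    "variance": s.var(),        # sample variance (ddof=1 by default)
    "std": s.std(),             # sample standard deviation (ddof=1 by default)
    "range": s.max() - s.min(),
    "p90": s.quantile(0.90),    # 90th percentile, linear interpolation by default
    "q1": s.quantile(0.25),     # first quartile
    "q3": s.quantile(0.75),     # third quartile
}
stats["iqr"] = stats["q3"] - stats["q1"]  # interquartile range

for name, value in stats.items():
    print(f"{name}: {value}")

# describe() bundles count, mean, std, min, quartiles, and max in one call.
print(s.describe())
```

For this made-up data, the median is 72 and the IQR works out to 24 (85 minus 61).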
By the end of this book, you will have gained comprehensive knowledge of EDA and mastered a powerful set of EDA techniques and tools for analyzing both structured and unstructured data to derive valuable insights.
Whether you are a data analyst, data scientist, researcher, or curious learner looking to analyze structured and unstructured data, this book will appeal to you. It aims to equip you with the essential knowledge and practical skills for analyzing and visualizing data to uncover insights.
It covers several EDA concepts and provides hands-on instructions on how to apply them using various Python libraries. Familiarity with basic statistical concepts and foundational knowledge of Python programming will help you understand the content better and get the most out of the book.
Overall impression
"Exploratory Data Analysis with Python Cookbook" is a practical guide focused on one of the key stages of working with data: exploratory data analysis (EDA). Unlike many theory-heavy textbooks, it is built around "recipes" (hence cookbook), which makes it an ideal tool for anyone who wants to move quickly from theory to practice. The author, Ayodele Oluleye, a certified data professional with experience in consulting and the financial sector, offers clear, step-by-step instructions along with explanations of the code and how it works.
The book targets a broad audience, from beginning data scientists and analysts to researchers who want to systematize their Python skills for EDA. Notably, it covers not only structured (tabular) data but also devotes considerable attention to unstructured data, namely text, which is a definite plus.
Content analysis
The book consists of 10 chapters, arranged logically from simple to complex:
Strengths
Weaknesses and limitations
Target audience
The book will be most useful to:
Conclusion
"Exploratory Data Analysis with Python Cookbook" is a valuable practical resource that deserves a place on the shelf of any data professional. It will not replace foundational textbooks on statistics or machine learning, but it can become an indispensable companion in day-to-day work, offering ready-made, tested solutions for a wide range of analytical tasks. The book is well structured, clearly written, and covers both classic EDA methods and modern automation tools. The chapters on multivariate analysis and text processing are especially valuable.
Rating: 9/10