Data Wrangling Using Pandas, SQL, and Java

Data Wrangling Using Pandas, SQL, and Java

Data Wrangling Using Pandas, SQL, and Java
Автор: Campesato Oswald
Дата выхода: 2023
Издательство: Mercury Learning and Information LLC.
Количество страниц: 357
Размер файла: 1,5 МБ
Тип файла: PDF
Добавил: codelibs
 Проверить на вирусы

Cover....2

Title Page....6

Copyright Page....7

Dedication....8

Contents....9

Preface....19

Chapter 1: Introduction to Python....25

Tools for Python....25

easy_install and pip....26

virtualenv....26

IPython....26

Python Installation....27

Setting the PATH Environment Variable (Windows Only)....28

Launching Python on Your Machine....28

The Python Interactive Interpreter....29

Python Identifiers....30

Lines, Indentation, and Multi-Lines....30

Quotation and Comments....31

Saving Your Code in a Module....33

Some Standard Modules....34

The help() and dir() Functions....35

Compile Time and Runtime Code Checking....36

Simple Data Types....37

Working with Numbers....37

Working with Other Bases....39

The chr() Function....39

The round() Function in Python....40

Formatting Numbers in Python....41

Working with Fractions....41

Unicode and UTF-8....42

Working with Unicode....42

Working with Strings....43

Comparing Strings....45

Formatting Strings in Python....46

Uninitialized Variables and the Value None....46

Slicing and Splicing Strings....46

Testing for Digits and Alphabetic Characters....47

Search and Replace a String in Other Strings....48

Remove Leading and Trailing Characters....49

Printing Text Without NewLine Characters....50

Text Alignment....51

Working with Dates....52

Converting Strings to Dates....54

Exception Handling....54

Handling User Input....56

Command-Line Arguments....58

Summary....60

Chapter 2: Working with Data....61

Dealing with Data: What Can Go Wrong?....62

What is Data Drift?....62

What are Datasets?....63

Data Preprocessing....64

Data Types....65

Preparing Datasets....66

Discrete Data vs. Continuous Data....67

“Binning” Continuous Data....68

Scaling Numeric Data via Normalization....68

Scaling Numeric Data via Standardization....70

Scaling Numeric Data via Robust Standardization....71

What to Look for in Categorical Data....71

Mapping Categorical Data to Numeric Values....72

Working with Dates....74

Working with Currency....75

Working with Outliers and Anomalies....75

Outlier Detection/Removal....76

Finding Outliers with NumPy....78

Finding Outliers with Pandas....81

Calculating Z-Scores to Find Outliers....84

Finding Outliers with SkLearn (Optional)....85

Working with Missing Data....87

Imputing Values: When is Zero a Valid Value?....89

Dealing with Imbalanced Datasets....90

What is SMOTE?....91

SMOTE Extensions....92

The Bias-Variance Tradeoff....92

Types of Bias in Data....94

Analyzing Classifiers (Optional)....95

What is LIME?....95

What is ANOVA?....96

Summary....97

Chapter 3: Introduction to Pandas....98

What is Pandas?....98

Pandas Data Frames....99

Data Frames and Data Cleaning Tasks....99

A Pandas Data Frame Example....100

Describing a Pandas Data Frame....102

Pandas Boolean Data Frames....104

Transposing a Pandas Data Frame....105

Pandas Data Frames and Random Numbers....106

Converting Categorical Data to Numeric Data....108

Merging and Splitting Columns in Pandas....112

Combining Pandas Data Frames....114

Data Manipulation with Pandas Data Frames....115

Pandas Data Frames and CSV Files....116

Useful Options for the Pandas read_csv() Function....119

Reading Selected Rows from CSV Files....120

Pandas Data Frames and Excel Spreadsheets....123

Useful Options for Reading Excel Spreadsheets....124

Select, Add, and Delete Columns in Data Frames....125

Handling Outliers in Pandas....127

Pandas Data Frames and Simple Statistics....129

Finding Duplicate Rows in Pandas....130

Finding Missing Values in Pandas....133

Missing Values in an Iris-Based Dataset....135

Sorting Data Frames in Pandas....139

Working with groupby() in Pandas....141

Aggregate Operations with the titanic.csv Dataset....142

Working with apply() and mapapply() in Pandas....145

Useful One-line Commands in Pandas....148

Working with JSON-based Data....150

Python Dictionary and JSON....151

Python, Pandas, and JSON....152

Summary....153

Chapter 4: RDBMS and SQL....155

What is an RDBMS?....155

What Relationships Do Tables Have in an RDBMS?....155

Features of an RDBMS....156

What is ACID?....157

When Do We Need an RDBMS?....157

The Importance of Normalization....159

A Four-Table RDBMS....161

Detailed Table Descriptions....162

The customers Table....163

The purchase_orders Table....164

The line_items Table....165

The item_desc Table....167

What is SQL?....168

DCL, DDL, DQL, DML, and TCL....169

SQL Privileges....170

Properties of SQL Statements....170

The CREATE Keyword....171

What is MySQL?....171

What about MariaDB?....172

Installing MySQL....172

Data Types in MySQL....173

The CHAR and VARCHAR Data Types....173

String-based Data Types....174

FLOAT and DOUBLE Data Types....174

BLOB and TEXT Data Types....175

MySQL Database Operations....175

Creating a Database....175

Display a List of Databases....176

Display a List of Database Users....176

Dropping a Database....177

Exporting a Database....177

Renaming a Database....179

The INFORMATION_SCHEMA Table....180

The PROCESSLIST Table....181

SQL Formatting Tools....181

Summary....182

Chapter 5: Java, JSON, and XML....184

Working with Java and MySQL....185

Performing the Set-up Steps....185

Creating a MySQL Database in Java....186

Creating a MySQL Table in Java....188

Inserting Data into a MySQL Table in Java....190

Deleting Data and Dropping MySQL Tables in Java....192

Selecting Data from a MySQL Table in Java....194

Updating Data in a MySQL Table in Java....196

Working with JSON, MySQL, and Java....198

Select JSON-based Data from a MySQL Table in Java....199

Working with XML, MySQL, and Java....201

What is XML?....201

What is an XML Schema?....202

When are XML Schemas Useful?....203

Create a MySQL Table for XML Data in Java....204

Read an XML Document in Java....207

Read an XML Document as a String in Java....208

Insert XML-based Data into a MySQL Table in Java....210

Select XML-based Data from a MySQL Table in Java....213

Parse XML-based String Data from a MySQL Table in Java....215

Working with XML Schemas....218

Summary....219

Chapter 6: Data Cleaning Tasks....221

What is Data Cleaning?....222

Data Cleaning for Personal Titles....224

Data Cleaning in SQL....225

Replace NULL with 0....226

Replace NULL Values with Average Value....226

Replace Multiple Values with a Single Value....228

Handle Mismatched Attribute Values....229

Convert Strings to Date Values....231

Data Cleaning from the Command Line (Optional)....233

Working with the sed Utility....233

Working with Variable Column Counts....235

Truncating Rows in CSV Files....237

Generating Rows with Fixed Columns with the awk Utility....239

Converting Phone Numbers....241

Converting Numeric Date Formats....244

Converting Alphabetic Date Formats....248

Working with Date and Time Date Formats....251

Working with Codes, Countries, and Cities....257

Data Cleaning on a Kaggle Dataset....264

Summary....268

Chapter 7: Data Wrangling....269

What is Data Wrangling?....270

Data Transformation: What Does This Mean?....270

CSV Files with Multi-Row Records....273

Pandas Solution (1)....273

Pandas Solution (2)....274

CSV Solution....275

CSV Files, Multi-row Records, and the awk Command....276

Quoted Fields Split on Two Lines (Optional)....277

Overview of the Events Project....281

Why This Project?....282

Project Tasks....283

Generate Country Codes....284

Prepare a List of Cities in Countries....284

Generating City Codes from Country Codes: awk....285

Generating City Codes from Country Codes: Python....289

Generating SQL Statements for the city_codes Table....292

Generating a CSV File for Band Members (Java)....293

Generating a CSV File for Band Members (Python)....297

Generating a Calendar of Events (COE)....300

Project Automation Script....304

Project Follow-up Comments....306

Summary....308

Appendix A: Working with awk....309

The awk Command....310

Built-in Variables That Control awk....310

How Does the awk Command Work?....311

Aligning Text with the printf() Statement....312

Conditional Logic and Control Statements....314

The while Statement....314

A for Loop in awk....315

A for Loop with a break Statement....316

The next and continue Statements....316

Deleting Alternate Lines in Datasets....317

Merging Lines in Datasets....317

Printing File Contents as a Single Line....318

Joining Groups of Lines in a Text File....319

Joining Alternate Lines in a Text File....320

Matching with Meta Characters and Character Sets....321

Printing Lines Using Conditional Logic....322

Splitting Filenames with awk....323

Working with Postfix Arithmetic Operators....324

Numeric Functions in awk....325

One-line awk Commands....328

Useful Short awk Scripts....329

Printing the Words in a Text String in awk....331

Count Occurrences of a String in Specific Rows....331

Printing a String in a Fixed Number of Columns....333

Printing a Dataset in a Fixed Number of Columns....333

Aligning Columns in Datasets....334

Aligning Columns and Multiple Rows in Datasets....335

Removing a Column from a Text File....337

Subsets of Column-aligned Rows in Datasets....338

Counting Word Frequency in Datasets....339

Displaying Only “Pure” Words in a Dataset....341

Working with Multi-line Records in awk....343

A Simple Use Case....344

Another Use Case....346

Summary....347

Index....349

This book is intended primarily for those who plan to become data scientists as wellas anyone who needs to perform data cleaning tasks. It contains a variety of features of NumPy and Pandas and how to create databases and tables in MySQL. Chapter 7 covers many data wrangling tasks using Python scripts and awk-based shell scripts. Companion files with code are available for downloading from the publisher.

Features:

  • Provides the reader with basic Python 3, Java, and Pandas programming concepts, and an introduction to awk
  • Includes a chapter on RDBMs and SQL
  • Companion files with code

Похожее:

Список отзывов:

Нет отзывов к книге.