Python Tools for Data Scientists: Pocket Primer

Author: Oswald Campesato
Publication date: 2023
Publisher: Mercury Learning and Information LLC.
Pages: 323
File size: 1.6 MB
File type: PDF
Added by: codelibs

Cover

Half-Title

Title

Copyright

Dedication

Contents

Preface

Chapter 1: Introduction to Python

Tools for Python

easy_install and pip

virtualenv

Python Installation

Setting the PATH Environment Variable (Windows Only)

Launching Python on Your Machine

The Python Interactive Interpreter

Python Identifiers

Lines, Indentations, and Multi-Lines

Quotation and Comments in Python

Saving Your Code in a Module

Some Standard Modules in Python

The help() and dir() Functions

Compile Time and Runtime Code Checking

Simple Data Types in Python

Working with Numbers

Working with Other Bases

The chr() Function

The round() Function in Python

Formatting Numbers in Python

Unicode and UTF-8

Working with Unicode

Listing 1.1: Unicode1.py

Working with Strings

Comparing Strings

Listing 1.2: Compare.py

Formatting Strings in Python

Uninitialized Variables and the Value None in Python

Slicing and Splicing Strings

Testing for Digits and Alphabetic Characters

Listing 1.3: CharTypes.py

Search and Replace a String in Other Strings

Listing 1.4: FindPos1.py

Listing 1.5: Replace1.py

Remove Leading and Trailing Characters

Listing 1.6: Remove1.py

Printing Text without NewLine Characters

Text Alignment

Working with Dates

Listing 1.7: Datetime2.py

Listing 1.8: datetime2.out

Converting Strings to Dates

Listing 1.9: String2Date.py

Exception Handling in Python

Listing 1.10: Exception1.py

Handling User Input

Listing 1.11: UserInput1.py

Listing 1.12: UserInput2.py

Listing 1.13: UserInput3.py

Command-Line Arguments

Listing 1.14: Hello.py

Summary

Chapter 2: Introduction to NumPy

What is NumPy?

Useful NumPy Features

What are NumPy Arrays?

Listing 2.1: nparray1.py

Working with Loops

Listing 2.2: loop1.py

Appending Elements to Arrays (1)

Listing 2.3: append1.py

Appending Elements to Arrays (2)

Listing 2.4: append2.py

Multiplying Lists and Arrays

Listing 2.5: multiply1.py

Doubling the Elements in a List

Listing 2.6: double_list1.py

Lists and Exponents

Listing 2.7: exponent_list1.py

Arrays and Exponents

Listing 2.8: exponent_array1.py

Math Operations and Arrays

Listing 2.9: mathops_array1.py

Working with “−1” Sub-ranges With Vectors

Listing 2.10: npsubarray2.py

Working with “−1” Sub-ranges with Arrays

Listing 2.11: np2darray2.py

Other Useful NumPy Methods

Arrays and Vector Operations

Listing 2.12: array_vector.py

NumPy and Dot Products (1)

Listing 2.13: dotproduct1.py

NumPy and Dot Products (2)

Listing 2.14: dotproduct2.py

NumPy and the Length of Vectors

Listing 2.15: array_norm.py

NumPy and Other Operations

Listing 2.16: otherops.py

NumPy and the reshape() Method

Listing 2.17: numpy_reshape.py

Calculating the Mean and Standard Deviation

Listing 2.18: sample_mean_std.py

Code Sample with Mean and Standard Deviation

Listing 2.19: stat_values.py

Trimmed Mean and Weighted Mean

Working with Lines in the Plane (Optional)

Plotting Randomized Points with NumPy and Matplotlib

Listing 2.20: np_plot.py

Plotting a Quadratic with NumPy and Matplotlib

Listing 2.21: np_plot_quadratic.py

What is Linear Regression?

What is Multivariate Analysis?

What about Non-Linear Datasets?

The MSE (Mean Squared Error) Formula

Other Error Types

Non-Linear Least Squares

Calculating the MSE Manually

Find the Best-Fitting Line in NumPy

Listing 2.22: find_best_fit.py

Calculating MSE by Successive Approximation (1)

Listing 2.23: plain_linreg1.py

Calculating MSE by Successive Approximation (2)

Listing 2.24: plain_linreg2.py

Google Colaboratory

Uploading CSV Files in Google Colaboratory

Listing 2.25: upload_csv_file.ipynb

Summary

Chapter 3: Introduction to Pandas

What is Pandas?

Pandas Options and Settings

Pandas Data Frames

Data Frames and Data Cleaning Tasks

Alternatives to Pandas

A Pandas Data Frame with a NumPy Example

Listing 3.1: pandas_df.py

Describing a Pandas Data Frame

Listing 3.2: pandas_df_describe.py

Pandas Boolean Data Frames

Listing 3.3: pandas_boolean_df.py

Transposing a Pandas Data Frame

Pandas Data Frames and Random Numbers

Listing 3.4: pandas_random_df.py

Listing 3.5: pandas_combine_df.py

Reading CSV Files in Pandas

Listing 3.6: sometext.txt

Listing 3.7: read_csv_file.py

The loc() and iloc() Methods in Pandas

Converting Categorical Data to Numeric Data

Listing 3.8: cat2numeric.py

Listing 3.9: shirts.csv

Listing 3.10: shirts.py

Matching and Splitting Strings in Pandas

Listing 3.11: shirts_str.py

Converting Strings to Dates in Pandas

Listing 3.12: string2date.py

Merging and Splitting Columns in Pandas

Listing 3.13: employees.csv

Listing 3.14: emp_merge_split.py

Combining Pandas Data Frames

Listing 3.15: concat_frames.py

Data Manipulation with Pandas Data Frames (1)

Listing 3.16: pandas_quarterly_df1.py

Data Manipulation with Pandas Data Frames (2)

Listing 3.17: pandas_quarterly_df2.py

Data Manipulation with Pandas Data Frames (3)

Listing 3.18: pandas_quarterly_df3.py

Pandas Data Frames and CSV Files

Listing 3.19: weather_data.py

Listing 3.20: people.csv

Listing 3.21: people_pandas.py

Managing Columns in Data Frames

Switching Columns

Appending Columns

Deleting Columns

Inserting Columns

Scaling Numeric Columns

Listing 3.22: numbers.csv

Listing 3.23: scale_columns.py

Managing Rows in Pandas

Selecting a Range of Rows in Pandas

Listing 3.24: duplicates.csv

Listing 3.25: row_range.py

Finding Duplicate Rows in Pandas

Listing 3.26: duplicates.py

Listing 3.27: drop_duplicates.py

Inserting New Rows in Pandas

Listing 3.28: emp_ages.csv

Listing 3.29: insert_row.py

Handling Missing Data in Pandas

Listing 3.30: employees2.csv

Listing 3.31: missing_values.py

Multiple Types of Missing Values

Listing 3.32: employees3.csv

Listing 3.33: missing_multiple_types.py

Test for Numeric Values in a Column

Listing 3.34: test_for_numeric.py

Replacing NaN Values in Pandas

Listing 3.35: missing_fill_drop.py

Sorting Data Frames in Pandas

Listing 3.36: sort_df.py

Working with groupby() in Pandas

Listing 3.37: groupby1.py

Working with apply() and mapapply() in Pandas

Listing 3.38: apply1.py

Listing 3.39: apply2.py

Listing 3.40: mapapply1.py

Listing 3.41: mapapply2.py

Handling Outliers in Pandas

Listing 3.42: outliers_zscores.py

Pandas Data Frames and Scatterplots

Listing 3.43: pandas_scatter_df.py

Pandas Data Frames and Simple Statistics

Listing 3.44: housing.csv

Listing 3.45: housing_stats.py

Aggregate Operations in Pandas Data Frames

Listing 3.46: aggregate1.py

Aggregate Operations with the titanic.csv Dataset

Listing 3.47: aggregate2.py

Save Data Frames as CSV Files and Zip Files

Listing 3.48: save2csv.py

Pandas Data Frames and Excel Spreadsheets

Listing 3.49: write_people_xlsx.py

Listing 3.50: read_people_xslx.py

Working with JSON-based Data

Python Dictionary and JSON

Listing 3.51: dict2json.py

Python, Pandas, and JSON

Listing 3.52: pd_python_json.py

Useful One-line Commands in Pandas

What is Method Chaining?

Pandas and Method Chaining

Pandas Profiling

Listing 3.53: titanic.csv

Listing 3.54: profile_titanic.py

Summary

Chapter 4: Working with Sklearn and Scipy

What is Sklearn?

Sklearn Features

The Digits Dataset in Sklearn

Listing 4.1: load_digits1.py

Listing 4.2: load_digits2.py

Listing 4.3: sklearn_digits.py

The train_test_split() Class in Sklearn

Selecting Columns for X and y

What is Feature Engineering?

The Iris Dataset in Sklearn (1)

Listing 4.4: sklearn_iris1.py

Sklearn, Pandas, and the Iris Dataset

Listing 4.5: pandas_iris.py

The Iris Dataset in Sklearn (2)

Listing 4.6: sklearn_iris2.py

The Faces Dataset in Sklearn (Optional)

Listing 4.7: sklearn_faces.py

What is SciPy?

Installing SciPy

Permutations and Combinations in SciPy

Listing 4.8: scipy_perms.py

Listing 4.9: scipy_combinatorics.py

Calculating Log Sums

Listing 4.10: scipy_matrix_inv.py

Calculating Polynomial Values

Listing 4.11: scipy_poly.py

Calculating the Determinant of a Square Matrix

Listing 4.12: scipy_determinant.py

Calculating the Inverse of a Matrix

Listing 4.13: scipy_matrix_inv.py

Calculating Eigenvalues and Eigenvectors

Listing 4.14: scipy_eigen.py

Calculating Integrals (Calculus)

Listing 4.15: scipy_integrate.py

Calculating Fourier Transforms

Listing 4.16: scipy_fourier.py

Flipping Images in SciPy

Listing 4.17: scipy_flip_image.py

Rotating Images in SciPy

Listing 4.18: scipy_rotate_image.py

Google Colaboratory

Uploading CSV Files in Google Colaboratory

Listing 4.19: upload_csv_file.ipynb

Summary

Chapter 5: Data Cleaning Tasks

What is Data Cleaning?

Data Cleaning for Personal Titles

Data Cleaning in SQL

Replace NULL with 0

Replace NULL Values with the Average Value

Listing 5.1: replace_null_values.sql

Replace Multiple Values with a Single Value

Listing 5.2: reduce_values.sql

Handle Mismatched Attribute Values

Listing 5.3: type_mismatch.sql

Convert Strings to Date Values

Listing 5.4: str_to_date.sql

Data Cleaning from the Command Line (optional)

Working with the sed Utility

Listing 5.5: delimiter1.txt

Listing 5.6: delimiter1.sh

Working with Variable Column Counts

Listing 5.7: variable_columns.csv

Listing 5.8: variable_columns.sh

Listing 5.9: variable_columns2.sh

Truncating Rows in CSV Files

Listing 5.10: variable_columns3.sh

Generating Rows with Fixed Columns with the awk Utility

Listing 5.11: FixedFieldCount1.sh

Listing 5.12: employees.txt

Listing 5.13: FixedFieldCount2.sh

Converting Phone Numbers

Listing 5.14: phone_numbers.txt

Listing 5.15: phone_numbers.sh

Converting Numeric Date Formats

Listing 5.16: dates.txt

Listing 5.17: dates.sh

Listing 5.18: dates2.sh

Converting Alphabetic Date Formats

Listing 5.19: dates2.txt

Listing 5.20: dates3.sh

Working with Date and Time Date Formats

Listing 5.21: date-times.txt

Listing 5.22: date-times-padded.sh

Working with Codes, Countries, and Cities

Listing 5.23: country_codes.csv

Listing 5.24: add_country_codes.sh

Listing 5.25: countries_cities.csv

Listing 5.26: split_countries_codes.sh

Listing 5.27: countries_cities2.csv

Listing 5.28: split_countries_codes2.sh

Data Cleaning on a Kaggle Dataset

Listing 5.29: convert_marketing.sh

Summary

Chapter 6: Data Visualization

What is Data Visualization?

Types of Data Visualization

What is Matplotlib?

Diagonal Lines in Matplotlib

Listing 6.1: diagonallines.py

A Colored Grid in Matplotlib

Listing 6.2: plotgrid2.py

Randomized Data Points in Matplotlib

Listing 6.3: lin_plot_reg.py

A Histogram in Matplotlib

Listing 6.4: histogram1.py

A Set of Line Segments in Matplotlib

Listing 6.5: line_segments.py

Plotting Multiple Lines in Matplotlib

Listing 6.6: plt_array2.py

Trigonometric Functions in Matplotlib

Listing 6.7: sincos.py

Display IQ Scores in Matplotlib

Listing 6.8: iq_scores.py

Plot a Best-Fitting Line in Matplotlib

Listing 6.9: plot_best_fit.py

The Iris Dataset in SkLearn

Listing 6.10: sklearn_iris1.py

SkLearn, Pandas, and the Iris Dataset

Listing 6.11: pandas_iris.py

Working with Seaborn

Features of Seaborn

Seaborn Built-in Datasets

Listing 6.12: seaborn_tips.py

The Iris Dataset in Seaborn

Listing 6.13: seaborn_iris.py

The Titanic Dataset in Seaborn

Listing 6.14: seaborn_titanic_plot.py

Extracting Data from the Titanic Dataset in Seaborn (1)

Listing 6.15: seaborn_titanic.py

Extracting Data from the Titanic Dataset in Seaborn (2)

Listing 6.16: seaborn_titanic2.py

Visualizing a Pandas Dataset in Seaborn

Listing 6.17: pandas_seaborn.py

Data Visualization in Pandas

Listing 6.18: pandas_viz1.py

What is Bokeh?

Listing 6.19: bokeh_trig.py

Summary

Appendix A: Working with Data

What are Datasets?

Data Preprocessing

Data Types

Preparing Datasets

Discrete Data vs. Continuous Data

“Binning” Continuous Data

Scaling Numeric Data via Normalization

Scaling Numeric Data via Standardization

What to Look for in Categorical Data

Mapping Categorical Data to Numeric Values

Working with Dates

Working with Currency

Missing Data, Anomalies, and Outliers

Missing Data

Anomalies and Outliers

Outlier Detection

What is Data Drift?

What is Imbalanced Classification?

What is SMOTE?

SMOTE Extensions

Analyzing Classifiers (Optional)

What is LIME?

What is ANOVA?

The Bias-Variance Trade-Off

Types of Bias in Data

Summary

Appendix B: Working with awk

The awk Command

Built-in Variables that Control awk

How Does the awk Command Work?

Aligning Text with the printf Statement

Listing B.1: columns2.txt

Listing B.2: AlignColumns1.sh

Conditional Logic and Control Statements

The while Statement

A for loop in awk

Listing B.3: Loop.sh

A for loop with a break Statement

The next and continue Statements

Deleting Alternate Lines in Datasets

Listing B.4: linepairs.csv

Listing B.5: deletelines.sh

Merging Lines in Datasets

Listing B.6: columns.txt

Listing B.7: ColumnCount1.sh

Printing File Contents as a Single Line

Joining Groups of Lines in a Text File

Listing B.8: digits.txt

Listing B.9: digits.sh

Joining Alternate Lines in a Text File

Listing B.10: columns2.txt

Listing B.11: JoinLines.sh

Listing B.12: JoinLines2.sh

Listing B.13: JoinLines2.sh

Matching with Meta Characters and Character Sets

Listing B.14: Patterns1.sh

Listing B.15: columns3.txt

Listing B.16: MatchAlpha1.sh

Printing Lines Using Conditional Logic

Listing B.17: products.txt

Splitting Filenames with awk

Listing B.18: SplitFilename2.sh

Working with Postfix Arithmetic Operators

Listing B.19: mixednumbers.txt

Listing B.20: AddSubtract1.sh

Numeric Functions in awk

One Line awk Commands

Useful Short awk Scripts

Listing B.21: data.txt

Printing the Words in a Text String in awk

Listing B.22: Fields2.sh

Count Occurrences of a String in Specific Rows

Listing B.23: data1.csv

Listing B.24: data2.csv

Listing B.25: checkrows.sh

Printing a String in a Fixed Number of Columns

Listing B.26: FixedFieldCount1.sh

Printing a Dataset in a Fixed Number of Columns

Listing B.27: VariableColumns.txt

Listing B.28: Fields3.sh

Aligning Columns in Datasets

Listing B.29: mixed-data.csv

Listing B.30: mixed-data.sh

Aligning Columns and Multiple Rows in Datasets

Listing B.31: mixed-data2.csv

Listing B.32: aligned-data2.csv

Listing B.33: mixed-data2.sh

Removing a Column from a Text File

Listing B.34: VariableColumns.txt

Listing B.35: RemoveColumn.sh

Subsets of Column-aligned Rows in Datasets

Listing B.36: sub-rows-cols.txt

Listing B.37: sub-rows-cols.sh

Counting Word Frequency in Datasets

Listing B.38: WordCounts1.sh

Listing B.39: WordCounts2.sh

Listing B.40: columns4.txt

Displaying Only “Pure” Words in a Dataset

Listing B.41: onlywords.sh

Working with Multi-line Records in awk

Listing B.42: employees.txt

Listing B.43: employees.sh

A Simple Use Case

Listing B.44: quotes3.csv

Listing B.45: delim1.sh

Another Use Case

Listing B.46: dates2.csv

Listing B.47: string2date2.sh

Summary

Index

As part of the best-selling Pocket Primer series, this book is designed to provide a thorough introduction to numerous Python tools for data scientists. The book covers features of NumPy and Pandas, how to write regular expressions, and how to perform data cleaning tasks. It includes separate chapters on data visualization and working with Sklearn and SciPy. Companion files with source code are available.

FEATURES:

  • Introduces Python, NumPy, Sklearn, SciPy, and awk
  • Covers data cleaning tasks and data visualization
  • Features numerous code samples throughout
  • Includes companion files with source code
