Python Data Science Handbook: Essential Tools for Working with Data, 2nd Edition

Author: Jake VanderPlas
Published: 2023
Publisher: O’Reilly Media, Inc.
Pages: 747
File size: 8.4 MB
File type: PDF
Added by: codelibs

Preface....7

What Is Data Science?....7

Who Is This Book For?....8

Why Python?....9

Outline of the Book....10

Installation Considerations....11

Conventions Used in This Book....12

Using Code Examples....13

O’Reilly Online Learning....14

How to Contact Us....14

I. Jupyter: Beyond Normal Python....16

1. Getting Started in IPython and Jupyter....18

Launching the IPython Shell....18

Launching the Jupyter Notebook....19

Help and Documentation in IPython....20

Accessing Documentation with ?....21

Accessing Source Code with ??....23

Exploring Modules with Tab Completion....23

Keyboard Shortcuts in the IPython Shell....26

Navigation Shortcuts....27

Text Entry Shortcuts....27

Command History Shortcuts....28

Miscellaneous Shortcuts....30

2. Enhanced Interactive Features....31

IPython Magic Commands....31

Running External Code: %run....31

Timing Code Execution: %timeit....32

Help on Magic Functions: ?, %magic, and %lsmagic....33

Input and Output History....34

IPython’s In and Out Objects....34

Underscore Shortcuts and Previous Outputs....36

Suppressing Output....36

Related Magic Commands....37

IPython and Shell Commands....37

Quick Introduction to the Shell....38

Shell Commands in IPython....40

Passing Values to and from the Shell....40

Shell-Related Magic Commands....41

3. Debugging and Profiling....43

Errors and Debugging....43

Controlling Exceptions: %xmode....43

Debugging: When Reading Tracebacks Is Not Enough....45

Profiling and Timing Code....48

Timing Code Snippets: %timeit and %time....49

Profiling Full Scripts: %prun....51

Line-by-Line Profiling with %lprun....53

Profiling Memory Use: %memit and %mprun....54

More IPython Resources....56

Web Resources....56

Books....57

II. Introduction to NumPy....58

4. Understanding Data Types in Python....61

A Python Integer Is More Than Just an Integer....62

A Python List Is More Than Just a List....64

Fixed-Type Arrays in Python....66

Creating Arrays from Python Lists....66

Creating Arrays from Scratch....67

NumPy Standard Data Types....69

5. The Basics of NumPy Arrays....72

NumPy Array Attributes....73

Array Indexing: Accessing Single Elements....73

Array Slicing: Accessing Subarrays....75

One-Dimensional Subarrays....75

Multidimensional Subarrays....76

Subarrays as No-Copy Views....77

Creating Copies of Arrays....78

Reshaping of Arrays....78

Array Concatenation and Splitting....79

Concatenation of Arrays....80

Splitting of Arrays....81

6. Computation on NumPy Arrays: Universal Functions....83

The Slowness of Loops....83

Introducing Ufuncs....85

Exploring NumPy’s Ufuncs....86

Array Arithmetic....86

Absolute Value....88

Trigonometric Functions....89

Exponents and Logarithms....90

Specialized Ufuncs....91

Advanced Ufunc Features....92

Specifying Output....92

Aggregations....93

Outer Products....93

Ufuncs: Learning More....94

7. Aggregations: min, max, and Everything in Between....95

Summing the Values in an Array....95

Minimum and Maximum....96

Multidimensional Aggregates....97

Other Aggregation Functions....98

Example: What Is the Average Height of US Presidents?....99

8. Computation on Arrays: Broadcasting....102

Introducing Broadcasting....102

Rules of Broadcasting....104

Broadcasting Example 1....105

Broadcasting Example 2....106

Broadcasting Example 3....106

Broadcasting in Practice....108

Centering an Array....108

Plotting a Two-Dimensional Function....109

9. Comparisons, Masks, and Boolean Logic....111

Example: Counting Rainy Days....111

Comparison Operators as Ufuncs....113

Working with Boolean Arrays....114

Counting Entries....115

Boolean Operators....116

Boolean Arrays as Masks....118

Using the Keywords and/or Versus the Operators &/|....119

10. Fancy Indexing....122

Exploring Fancy Indexing....122

Combined Indexing....124

Example: Selecting Random Points....125

Modifying Values with Fancy Indexing....127

Example: Binning Data....129

11. Sorting Arrays....132

Fast Sorting in NumPy: np.sort and np.argsort....133

Sorting Along Rows or Columns....134

Partial Sorts: Partitioning....134

Example: k-Nearest Neighbors....135

12. Structured Data: NumPy’s Structured Arrays....140

Exploring Structured Array Creation....142

More Advanced Compound Types....143

Record Arrays: Structured Arrays with a Twist....144

On to Pandas....145

III. Data Manipulation with Pandas....146

13. Introducing Pandas Objects....149

The Pandas Series Object....149

Series as Generalized NumPy Array....150

Series as Specialized Dictionary....151

Constructing Series Objects....152

The Pandas DataFrame Object....153

DataFrame as Generalized NumPy Array....154

DataFrame as Specialized Dictionary....155

Constructing DataFrame Objects....156

The Pandas Index Object....158

Index as Immutable Array....158

Index as Ordered Set....159

14. Data Indexing and Selection....160

Data Selection in Series....160

Series as Dictionary....160

Series as One-Dimensional Array....161

Indexers: loc and iloc....162

Data Selection in DataFrames....164

DataFrame as Dictionary....164

DataFrame as Two-Dimensional Array....166

Additional Indexing Conventions....168

15. Operating on Data in Pandas....170

Ufuncs: Index Preservation....170

Ufuncs: Index Alignment....171

Index Alignment in Series....172

Index Alignment in DataFrames....173

Ufuncs: Operations Between DataFrames and Series....175

16. Handling Missing Data....177

Trade-offs in Missing Data Conventions....177

Missing Data in Pandas....178

None as a Sentinel Value....179

NaN: Missing Numerical Data....180

NaN and None in Pandas....181

Pandas Nullable Dtypes....183

Operating on Null Values....183

Detecting Null Values....184

Dropping Null Values....185

Filling Null Values....187

17. Hierarchical Indexing....189

A Multiply Indexed Series....189

The Bad Way....190

The Better Way: The Pandas MultiIndex....191

MultiIndex as Extra Dimension....192

Methods of MultiIndex Creation....194

Explicit MultiIndex Constructors....194

MultiIndex Level Names....196

MultiIndex for Columns....196

Indexing and Slicing a MultiIndex....197

Multiply Indexed Series....198

Multiply Indexed DataFrames....199

Rearranging Multi-Indexes....201

Sorted and Unsorted Indices....201

Stacking and Unstacking Indices....203

Index Setting and Resetting....204

18. Combining Datasets: concat and append....205

Recall: Concatenation of NumPy Arrays....206

Simple Concatenation with pd.concat....207

Duplicate Indices....208

Concatenation with Joins....210

The append Method....211

19. Combining Datasets: merge and join....213

Relational Algebra....213

Categories of Joins....214

One-to-One Joins....214

Many-to-One Joins....215

Many-to-Many Joins....216

Specification of the Merge Key....217

The on Keyword....218

The left_on and right_on Keywords....218

The left_index and right_index Keywords....219

Specifying Set Arithmetic for Joins....221

Overlapping Column Names: The suffixes Keyword....222

Example: US States Data....224

20. Aggregation and Grouping....230

Planets Data....230

Simple Aggregation in Pandas....231

groupby: Split, Apply, Combine....234

Split, Apply, Combine....234

The GroupBy Object....237

Aggregate, Filter, Transform, Apply....239

Specifying the Split Key....242

Grouping Example....244

21. Pivot Tables....246

Motivating Pivot Tables....246

Pivot Tables by Hand....247

Pivot Table Syntax....248

Multilevel Pivot Tables....248

Additional Pivot Table Options....250

Example: Birthrate Data....251

22. Vectorized String Operations....258

Introducing Pandas String Operations....258

Tables of Pandas String Methods....259

Methods Similar to Python String Methods....260

Methods Using Regular Expressions....261

Miscellaneous Methods....263

Example: Recipe Database....265

A Simple Recipe Recommender....268

Going Further with Recipes....270

23. Working with Time Series....271

Dates and Times in Python....272

Native Python Dates and Times: datetime and dateutil....272

Typed Arrays of Times: NumPy’s datetime64....273

Dates and Times in Pandas: The Best of Both Worlds....276

Pandas Time Series: Indexing by Time....277

Pandas Time Series Data Structures....278

Regular Sequences: pd.date_range....279

Frequencies and Offsets....281

Resampling, Shifting, and Windowing....284

Resampling and Converting Frequencies....286

Time Shifts....287

Rolling Windows....288

Example: Visualizing Seattle Bicycle Counts....290

Visualizing the Data....292

Digging into the Data....294

24. High-Performance Pandas: eval and query....298

Motivating query and eval: Compound Expressions....298

pandas.eval for Efficient Operations....300

DataFrame.eval for Column-Wise Operations....302

Assignment in DataFrame.eval....303

Local Variables in DataFrame.eval....304

The DataFrame.query Method....305

Performance: When to Use These Functions....305

Further Resources....307

IV. Visualization with Matplotlib....309

25. General Matplotlib Tips....311

Importing Matplotlib....311

Setting Styles....311

show or No show? How to Display Your Plots....312

Plotting from a Script....312

Plotting from an IPython Shell....313

Plotting from a Jupyter Notebook....313

Saving Figures to File....314

Two Interfaces for the Price of One....316

26. Simple Line Plots....319

Adjusting the Plot: Line Colors and Styles....322

Adjusting the Plot: Axes Limits....325

Labeling Plots....328

Matplotlib Gotchas....329

27. Simple Scatter Plots....331

Scatter Plots with plt.plot....331

Scatter Plots with plt.scatter....334

plot Versus scatter: A Note on Efficiency....336

Visualizing Uncertainties....337

Basic Errorbars....337

Continuous Errors....339

28. Density and Contour Plots....342

Visualizing a Three-Dimensional Function....342

Histograms, Binnings, and Density....347

Two-Dimensional Histograms and Binnings....350

plt.hist2d: Two-Dimensional Histogram....350

plt.hexbin: Hexagonal Binnings....351

Kernel Density Estimation....352

29. Customizing Plot Legends....355

Choosing Elements for the Legend....357

Legend for Size of Points....359

Multiple Legends....361

30. Customizing Colorbars....363

Customizing Colorbars....364

Choosing the Colormap....365

Color Limits and Extensions....369

Discrete Colorbars....370

Example: Handwritten Digits....371

31. Multiple Subplots....374

plt.axes: Subplots by Hand....374

plt.subplot: Simple Grids of Subplots....376

plt.subplots: The Whole Grid in One Go....377

plt.GridSpec: More Complicated Arrangements....379

32. Text and Annotation....382

Example: Effect of Holidays on US Births....382

Transforms and Text Position....385

Arrows and Annotation....388

33. Customizing Ticks....392

Major and Minor Ticks....392

Hiding Ticks or Labels....394

Reducing or Increasing the Number of Ticks....396

Fancy Tick Formats....397

Summary of Formatters and Locators....400

34. Customizing Matplotlib: Configurations and Stylesheets....402

Plot Customization by Hand....402

Changing the Defaults: rcParams....404

Stylesheets....406

Default Style....407

FiveThirtyEight Style....407

ggplot Style....408

Bayesian Methods for Hackers Style....409

Dark Background Style....410

Grayscale Style....411

Seaborn Style....412

35. Three-Dimensional Plotting in Matplotlib....413

Three-Dimensional Points and Lines....414

Three-Dimensional Contour Plots....415

Wireframes and Surface Plots....417

Surface Triangulations....418

Example: Visualizing a Möbius Strip....420

36. Visualization with Seaborn....423

Exploring Seaborn Plots....424

Histograms, KDE, and Densities....424

Pair Plots....426

Faceted Histograms....427

Categorical Plots....428

Joint Distributions....429

Bar Plots....430

Example: Exploring Marathon Finishing Times....432

Further Resources....441

Other Python Visualization Libraries....441

V. Machine Learning....443

37. What Is Machine Learning?....444

Categories of Machine Learning....444

Qualitative Examples of Machine Learning Applications....445

Classification: Predicting Discrete Labels....446

Regression: Predicting Continuous Labels....449

Clustering: Inferring Labels on Unlabeled Data....451

Dimensionality Reduction: Inferring Structure of Unlabeled Data....453

Summary....455

38. Introducing Scikit-Learn....457

Data Representation in Scikit-Learn....457

The Features Matrix....458

The Target Array....459

The Estimator API....461

Basics of the API....462

Supervised Learning Example: Simple Linear Regression....463

Supervised Learning Example: Iris Classification....468

Unsupervised Learning Example: Iris Dimensionality....469

Unsupervised Learning Example: Iris Clustering....471

Application: Exploring Handwritten Digits....472

Loading and Visualizing the Digits Data....473

Unsupervised Learning Example: Dimensionality Reduction....475

Classification on Digits....476

Summary....479

39. Hyperparameters and Model Validation....481

Thinking About Model Validation....481

Model Validation the Wrong Way....482

Model Validation the Right Way: Holdout Sets....483

Model Validation via Cross-Validation....483

Selecting the Best Model....486

The Bias-Variance Trade-off....487

Validation Curves in Scikit-Learn....490

Learning Curves....494

Validation in Practice: Grid Search....499

Summary....501

40. Feature Engineering....503

Categorical Features....503

Text Features....505

Image Features....507

Derived Features....507

Imputation of Missing Data....510

Feature Pipelines....511

41. In Depth: Naive Bayes Classification....513

Bayesian Classification....513

Gaussian Naive Bayes....514

Multinomial Naive Bayes....518

Example: Classifying Text....518

When to Use Naive Bayes....522

42. In Depth: Linear Regression....524

Simple Linear Regression....524

Basis Function Regression....527

Polynomial Basis Functions....527

Gaussian Basis Functions....529

Regularization....531

Ridge Regression (L2 Regularization)....533

Lasso Regression (L1 Regularization)....534

Example: Predicting Bicycle Traffic....536

43. In Depth: Support Vector Machines....543

Motivating Support Vector Machines....543

Support Vector Machines: Maximizing the Margin....545

Fitting a Support Vector Machine....546

Beyond Linear Boundaries: Kernel SVM....550

Tuning the SVM: Softening Margins....554

Example: Face Recognition....555

Summary....560

44. In Depth: Decision Trees and Random Forests....562

Motivating Random Forests: Decision Trees....562

Creating a Decision Tree....563

Decision Trees and Overfitting....566

Ensembles of Estimators: Random Forests....567

Random Forest Regression....570

Example: Random Forest for Classifying Digits....572

Summary....574

45. In Depth: Principal Component Analysis....576

Introducing Principal Component Analysis....576

PCA as Dimensionality Reduction....578

PCA for Visualization: Handwritten Digits....580

What Do the Components Mean?....581

Choosing the Number of Components....582

PCA as Noise Filtering....584

Example: Eigenfaces....586

Summary....589

46. In Depth: Manifold Learning....591

Manifold Learning: “HELLO”....592

Multidimensional Scaling....593

MDS as Manifold Learning....596

Nonlinear Embeddings: Where MDS Fails....598

Nonlinear Manifolds: Locally Linear Embedding....600

Some Thoughts on Manifold Methods....602

Example: Isomap on Faces....604

Example: Visualizing Structure in Digits....608

47. In Depth: k-Means Clustering....613

Introducing k-Means....613

Expectation–Maximization....615

Examples....623

Example 1: k-Means on Digits....623

Example 2: k-Means for Color Compression....626

48. In Depth: Gaussian Mixture Models....631

Motivating Gaussian Mixtures: Weaknesses of k-Means....631

Generalizing E–M: Gaussian Mixture Models....635

Choosing the Covariance Type....640

Gaussian Mixture Models as Density Estimation....640

Example: GMMs for Generating New Data....646

49. In Depth: Kernel Density Estimation....650

Motivating Kernel Density Estimation: Histograms....650

Kernel Density Estimation in Practice....656

Selecting the Bandwidth via Cross-Validation....657

Example: Not-so-Naive Bayes....658

Anatomy of a Custom Estimator....660

Using Our Custom Estimator....662

50. Application: A Face Detection Pipeline....665

HOG Features....666

HOG in Action: A Simple Face Detector....667

1. Obtain a Set of Positive Training Samples....668

2. Obtain a Set of Negative Training Samples....668

3. Combine Sets and Extract HOG Features....670

4. Train a Support Vector Machine....670

5. Find Faces in a New Image....671

Caveats and Improvements....674

Further Machine Learning Resources....676

Index....679

About the Author....745

Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all: IPython, NumPy, pandas, Matplotlib, scikit-learn, and other related tools.

Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.

With this handbook, you'll learn how:

  • IPython and Jupyter provide computational environments for scientists using Python
  • NumPy includes the ndarray for efficient storage and manipulation of dense data arrays
  • Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data
  • Matplotlib includes capabilities for a flexible range of data visualizations
  • Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms.
