R in Action
Copyright
Praise for the previous edition of R in Action
brief contents
contents
Front matter
preface
acknowledgments
about this book
What's new in the third edition
Who should read this book
How this book is organized: A road map
Advice for data miners
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1. Getting started
1 Introduction to R
1.1 Why use R?
1.2 Obtaining and installing R
1.3 Working with R
1.3.1 Getting started
1.3.2 Using RStudio
1.3.3 Getting help
1.3.4 The workspace
1.3.5 Projects
1.4 Packages
1.4.1 What are packages?
1.4.2 Installing a package
1.4.3 Loading a package
1.4.4 Learning about a package
1.5 Using output as input: Reusing results
1.6 Working with large datasets
1.7 Working through an example
Summary
2 Creating a dataset
2.1 Understanding datasets
2.2 Data structures
2.2.1 Vectors
2.2.2 Matrices
2.2.3 Arrays
2.2.4 Data frames
2.2.5 Factors
2.2.6 Lists
2.2.7 Tibbles
2.3 Data input
2.3.1 Entering data from the keyboard
2.3.2 Importing data from a delimited text file
2.3.3 Importing data from Excel
2.3.4 Importing data from JSON
2.3.5 Importing data from the web
2.3.6 Importing data from SPSS
2.3.7 Importing data from SAS
2.3.8 Importing data from Stata
2.3.9 Accessing database management systems
2.3.10 Importing data via StatTransfer
2.4 Annotating datasets
2.4.1 Variable labels
2.4.2 Value labels
2.5 Useful functions for working with data objects
Summary
3 Basic data management
3.1 A working example
3.2 Creating new variables
3.3 Recoding variables
3.4 Renaming variables
3.5 Missing values
3.5.1 Recoding values to missing
3.5.2 Excluding missing values from analyses
3.6 Date values
3.6.1 Converting dates to character variables
3.6.2 Going further
3.7 Type conversions
3.8 Sorting data
3.9 Merging datasets
3.9.1 Adding columns to a data frame
3.9.2 Adding rows to a data frame
3.10 Subsetting datasets
3.10.1 Selecting variables
3.10.2 Dropping variables
3.10.3 Selecting observations
3.10.4 The subset() function
3.10.5 Random samples
3.11 Using dplyr to manipulate data frames
3.11.1 Basic dplyr functions
3.11.2 Using pipe operators to chain statements
3.12 Using SQL statements to manipulate data frames
Summary
4 Getting started with graphs
4.1 Creating a graph with ggplot2
4.1.1 ggplot
4.1.2 Geoms
4.1.3 Grouping
4.1.4 Scales
4.1.5 Facets
4.1.6 Labels
4.1.7 Themes
4.2 ggplot2 details
4.2.1 Placing the data and mapping options
4.2.2 Graphs as objects
4.2.3 Saving graphs
4.2.4 Common mistakes
Summary
5 Advanced data management
5.1 A data management challenge
5.2 Numerical and character functions
5.2.1 Mathematical functions
5.2.2 Statistical functions
5.2.3 Probability functions
5.2.4 Character functions
5.2.5 Other useful functions
5.2.6 Applying functions to matrices and data frames
5.2.7 A solution for the data management challenge
5.3 Control flow
5.3.1 Repetition and looping
5.3.2 Conditional execution
5.4 User-written functions
5.5 Reshaping data
5.5.1 Transposing
5.5.2 Converting from wide to long dataset formats
5.6 Aggregating data
Summary
Part 2. Basic methods
6 Basic graphs
6.1 Bar charts
6.1.1 Simple bar charts
6.1.2 Stacked, grouped, and filled bar charts
6.1.3 Mean bar charts
6.1.4 Tweaking bar charts
6.2 Pie charts
6.3 Tree maps
6.4 Histograms
6.5 Kernel density plots
6.6 Box plots
6.6.1 Using parallel box plots to compare groups
6.6.2 Violin plots
6.7 Dot plots
Summary
7 Basic statistics
7.1 Descriptive statistics
7.1.1 A menagerie of methods
7.1.2 Even more methods
7.1.3 Descriptive statistics by group
7.1.4 Summarizing data interactively with dplyr
7.1.5 Visualizing results
7.2 Frequency and contingency tables
7.2.1 Generating frequency tables
7.2.2 Tests of independence
7.2.3 Measures of association
7.2.4 Visualizing results
7.3 Correlations
7.3.1 Types of correlations
7.3.2 Testing correlations for significance
7.3.3 Visualizing correlations
7.4 T-tests
7.4.1 Independent t-test
7.4.2 Dependent t-test
7.4.3 When there are more than two groups
7.5 Nonparametric tests of group differences
7.5.1 Comparing two groups
7.5.2 Comparing more than two groups
7.6 Visualizing group differences
Summary
Part 3. Intermediate methods
8 Regression
8.1 The many faces of regression
8.1.1 Scenarios for using OLS regression
8.1.2 What you need to know
8.2 OLS regression
8.2.1 Fitting regression models with lm()
8.2.2 Simple linear regression
8.2.3 Polynomial regression
8.2.4 Multiple linear regression
8.2.5 Multiple linear regression with interactions
8.3 Regression diagnostics
8.3.1 A typical approach
8.3.2 An enhanced approach
8.3.3 Multicollinearity
8.4 Unusual observations
8.4.1 Outliers
8.4.2 High-leverage points
8.4.3 Influential observations
8.5 Corrective measures
8.5.1 Deleting observations
8.5.2 Transforming variables
8.5.3 Adding or deleting variables
8.5.4 Trying a different approach
8.6 Selecting the best regression model
8.6.1 Comparing models
8.6.2 Variable selection
8.7 Taking the analysis further
8.7.1 Cross-validation
8.7.2 Relative importance
Summary
9 Analysis of variance
9.1 A crash course on terminology
9.2 Fitting ANOVA models
9.2.1 The aov() function
9.2.2 The order of formula terms
9.3 One-way ANOVA
9.3.1 Multiple comparisons
9.3.2 Assessing test assumptions
9.4 One-way ANCOVA
9.4.1 Assessing test assumptions
9.4.2 Visualizing the results
9.5 Two-way factorial ANOVA
9.6 Repeated measures ANOVA
9.7 Multivariate analysis of variance (MANOVA)
9.7.1 Assessing test assumptions
9.7.2 Robust MANOVA
9.8 ANOVA as regression
Summary
10 Power analysis
10.1 A quick review of hypothesis testing
10.2 Implementing power analysis with the pwr package
10.2.1 T-tests
10.2.2 ANOVA
10.2.3 Correlations
10.2.4 Linear models
10.2.5 Tests of proportions
10.2.6 Chi-square tests
10.2.7 Choosing an appropriate effect size in novel situations
10.3 Creating power analysis plots
10.4 Other packages
Summary
11 Intermediate graphs
11.1 Scatter plots
11.1.1 Scatter plot matrices
11.1.2 High-density scatter plots
11.1.3 3D scatter plots
11.1.4 Spinning 3D scatter plots
11.1.5 Bubble plots
11.2 Line charts
11.3 Corrgrams
11.4 Mosaic plots
Summary
12 Resampling statistics and bootstrapping
12.1 Permutation tests
12.2 Permutation tests with the coin package
12.2.1 Independent two-sample and k-sample tests
12.2.2 Independence in contingency tables
12.2.3 Independence between numeric variables
12.2.4 Dependent two-sample and k-sample tests
12.2.5 Going further
12.3 Permutation tests with the lmPerm package
12.3.1 Simple and polynomial regression
12.3.2 Multiple regression
12.3.3 One-way ANOVA and ANCOVA
12.3.4 Two-way ANOVA
12.4 Additional comments on permutation tests
12.5 Bootstrapping
12.6 Bootstrapping with the boot package
12.6.1 Bootstrapping a single statistic
12.6.2 Bootstrapping several statistics
Summary
Part 4. Advanced methods
13 Generalized linear models
13.1 Generalized linear models and the glm() function
13.1.1 The glm() function
13.1.2 Supporting functions
13.1.3 Model fit and regression diagnostics
13.2 Logistic regression
13.2.1 Interpreting the model parameters
13.2.2 Assessing the impact of predictors on the probability of an outcome
13.2.3 Overdispersion
13.2.4 Extensions
13.3 Poisson regression
13.3.1 Interpreting the model parameters
13.3.2 Overdispersion
13.3.3 Extensions
Summary
14 Principal components and factor analysis
14.1 Principal components and factor analysis in R
14.2 Principal components
14.2.1 Selecting the number of components to extract
14.2.2 Extracting principal components
14.2.3 Rotating principal components
14.2.4 Obtaining principal component scores
14.3 Exploratory factor analysis
14.3.1 Deciding how many common factors to extract
14.3.2 Extracting common factors
14.3.3 Rotating factors
14.3.4 Factor scores
14.3.5 Other EFA-related packages
14.4 Other latent variable models
Summary
15 Time series
15.1 Creating a time-series object in R
15.2 Smoothing and seasonal decomposition
15.2.1 Smoothing with simple moving averages
15.2.2 Seasonal decomposition
15.3 Exponential forecasting models
15.3.1 Simple exponential smoothing
15.3.2 Holt and Holt–Winters exponential smoothing
15.3.3 The ets() function and automated forecasting
15.4 ARIMA forecasting models
15.4.1 Prerequisite concepts
15.4.2 ARMA and ARIMA models
15.4.3 Automated ARIMA forecasting
15.5 Going further
Summary
16 Cluster analysis
16.1 Common steps in cluster analysis
16.2 Calculating distances
16.3 Hierarchical cluster analysis
16.4 Partitioning-cluster analysis
16.4.1 K-means clustering
16.4.2 Partitioning around medoids
16.5 Avoiding nonexistent clusters
16.6 Going further
Summary
17 Classification
17.1 Preparing the data
17.2 Logistic regression
17.3 Decision trees
17.3.1 Classical decision trees
17.3.2 Conditional inference trees
17.4 Random forests
17.5 Support vector machines
17.5.1 Tuning an SVM
17.6 Choosing a best predictive solution
17.7 Understanding black box predictions
17.7.1 Break-down plots
17.7.2 Plotting Shapley values
17.8 Going further
Summary
18 Advanced methods for missing data
18.1 Steps in dealing with missing data
18.2 Identifying missing values
18.3 Exploring missing-values patterns
18.3.1 Visualizing missing values
18.3.2 Using correlations to explore missing values
18.4 Understanding the sources and impact of missing data
18.5 Rational approaches for dealing with incomplete data
18.6 Deleting missing data
18.6.1 Complete-case analysis (listwise deletion)
18.6.2 Available case analysis (pairwise deletion)
18.7 Single imputation
18.7.1 Simple imputation
18.7.2 K-nearest neighbor imputation
18.7.3 missForest
18.8 Multiple imputation
18.9 Other approaches to missing data
Summary
Part 5. Expanding your skills
19 Advanced graphs
19.1 Modifying scales
19.1.1 Customizing axes
19.1.2 Customizing colors
19.2 Modifying themes
19.2.1 Prepackaged themes
19.2.2 Customizing fonts
19.2.3 Customizing legends
19.2.4 Customizing the plot area
19.3 Adding annotations
19.4 Combining graphs
19.5 Making graphs interactive
Summary
20 Advanced programming
20.1 A review of the language
20.1.1 Data types
20.1.2 Control structures
20.1.3 Creating functions
20.2 Working with environments
20.3 Non-standard evaluation
20.4 Object-oriented programming
20.4.1 Generic functions
20.4.2 Limitations of the S3 model
20.5 Writing efficient code
20.5.1 Efficient data input
20.5.2 Vectorization
20.5.3 Correctly sizing objects
20.5.4 Parallelization
20.6 Debugging
20.6.1 Common sources of errors
20.6.2 Debugging tools
20.6.3 Session options that support debugging
20.6.4 Using RStudio's visual debugger
20.7 Going further
Summary
21 Creating dynamic reports
21.1 A template approach to reports
21.2 Creating a report with R and R Markdown
21.3 Creating a report with R and LaTeX
21.3.1 Creating a parameterized report
21.4 Avoiding common R Markdown problems
21.5 Going further
Summary
22 Creating a package
22.1 The edatools package
22.2 Creating a package
22.2.1 Installing development tools
22.2.2 Creating a package project
22.2.3 Writing the package functions
22.2.4 Adding function documentation
22.2.5 Adding a general help file (optional)
22.2.6 Adding sample data to the package (optional)
22.2.7 Adding a vignette (optional)
22.2.8 Editing the DESCRIPTION file
22.2.9 Building and installing the package
22.3 Sharing your package
22.3.1 Distributing a source package file
22.3.2 Submitting to CRAN
22.3.3 Hosting on GitHub
22.3.4 Creating a package website
22.4 Going further
Summary
Afterword. Into the rabbit hole
Appendix A. Graphical user interfaces
Appendix B. Customizing the startup environment
Appendix C. Exporting data from R
C.1 Delimited text file
C.2 Excel spreadsheet
C.3 Statistical applications
Appendix D. Matrix algebra in R
Appendix E. Packages used in this book
Appendix F. Working with large datasets
F.1 Efficient programming
F.2 Storing data outside of RAM
F.3 Analytic packages for out-of-memory data
F.4 Comprehensive solutions for working with enormous datasets
Appendix G. Updating an R installation
G.1 Automated installation (Windows only)
G.2 Manual installation (Windows and macOS)
G.3 Updating an R installation (Linux)
References
index
R in Action, Third Edition makes learning R quick and easy. That’s why thousands of data scientists have chosen this guide to help them master this powerful language. This is far from a dry academic tome: every example you’ll encounter is relevant to scientific and business developers and helps you solve common data challenges. R expert Rob Kabacoff takes you on a crash course in statistics, from dealing with messy and incomplete data to creating stunning visualizations. This revised and expanded third edition contains fresh coverage of the new tidyverse approach to data analysis and R’s state-of-the-art graphing capabilities with the ggplot2 package.
Used daily by data scientists, researchers, and quants of all types, R is the gold standard for statistical data analysis. This free, open-source language includes packages for everything from advanced data visualization to deep learning. Instantly comfortable for mathematically minded users, R easily handles practical problems without forcing you to think like a software engineer.
R in Action, Third Edition teaches you how to do statistical analysis and data visualization using R and its popular tidyverse packages. In it, you’ll investigate real-world data challenges, including forecasting, data mining, and dynamic report writing. This revised third edition adds new coverage for graphing with ggplot2, along with examples for machine learning topics like clustering, classification, and time series analysis.
Requires basic math and statistics. No prior experience with R needed.
A comprehensive, updated guide to R that covers the full data analysis cycle: from importing and cleaning data to visualization, statistical modeling, machine learning, and creating dynamic reports. The key difference in the third edition is its extensive use of the tidyverse (dplyr, ggplot2, tidyr) and other modern approaches.
Strengths:
Weaknesses:
Bottom line: A desk reference for anyone who works seriously with R, from students to practicing analysts. It lets you move quickly from theory to solving applied problems.