Introduction
CHAPTER 1: Introducing Julia
How Julia Improves Data Science
Data science workflow
Julia’s adoption by the data science community
Julia Extensions
Package quality
Finding new packages
About the Book
CHAPTER 2: Setting Up the Data Science Lab
Julia IDEs
Juno
IJulia
Additional IDEs
Julia Packages
Finding and selecting packages
Installing packages
Using packages
Hacking packages
IJulia Basics
Handling files
Creating a notebook
Saving a notebook
Renaming a notebook
Loading a notebook
Exporting a notebook
Organizing code in .jl files
Referencing code
Working directory
Datasets We Will Use
Dataset descriptions
Magic dataset
OnlineNewsPopularity dataset
Spam Assassin dataset
Downloading datasets
Loading datasets
CSV files
Text files
Coding and Testing a Simple Machine Learning Algorithm in Julia
Algorithm description
Algorithm implementation
Algorithm testing
Saving Your Workspace into a Data File
Saving data into delimited files
Saving data into native Julia format
Saving data into text files
Help!
Summary
Chapter Challenge
CHAPTER 3: Learning the Ropes of Julia
Data Types
Arrays
Array basics
Accessing multiple elements in an array
Multidimensional arrays
Dictionaries
Basic Commands and Functions
print(), println()
typemax(), typemin()
collect()
show()
linspace()
Mathematical Functions
round()
rand(), randn()
sum()
mean()
Array and Dictionary Functions
in
append!()
pop!()
push!()
splice!()
insert!()
sort(), sort!()
get()
Keys(), values()
length(), size()
Miscellaneous Functions
time()
Conditionals
if-else statements
string()
map()
VERSION()
Operators, Loops and Conditionals
Operators
Alphanumeric operators (<, >, ==, <=, >=, !=)
Logical operators (&&, ||)
Loops
for-loops
while-loops
break command
Summary
Chapter Challenge
CHAPTER 4: Going Beyond the Basics in Julia
String Manipulation
split()
join()
Regex functions
ismatch()
match()
matchall()
eachmatch()
Custom Functions
Function structure
Anonymous functions
Multiple dispatch
Function example
Implementing a Simple Algorithm
Creating a Complete Solution
Summary
Chapter Challenge
CHAPTER 5: Julia Goes All Data Science-y
Data Science Pipeline
Data Engineering
Data preparation
Data exploration
Data representation
Data Modeling
Data discovery
Data learning
Information Distillation
Data product creation
Insight, deliverance, and visualization
Keep an Open Mind
Applying the Data Science Pipeline to a Real-World Problem
Data preparation
Data exploration
Data representation
Data discovery
Data learning
Data product creation
Insight, deliverance, and visualization
Summary
Chapter Challenge
CHAPTER 6: Julia the Data Engineer
Data Frames
Creating and populating a data frame
Data frames basics
Variable names in a data frame
Accessing particular variables in a data frame
Exploring a data frame
Filtering sections of a data frame
Applying functions to a data frame’s variables
Working with data frames
Altering data frames
Sorting the contents of a data frame
Data frame tips
Importing and Exporting Data
Accessing .json data files
Storing data in .json files
Loading data files into data frames
Saving data frames into data files
Cleaning Up Data
Cleaning up numeric data
Cleaning up text data
Formatting and Transforming Data
Formatting numeric data
Formatting text data
Importance of data types
Applying Data Transformations to Numeric Data
Normalization
Discretization (binning) and binarization
Binary to continuous (binary classification only)
Applying data transformations to text data
Case normalization
Vectorization
Preliminary Evaluation of Features
Regression
Classification
Feature evaluation tips
Summary
Chapter Challenge
CHAPTER 7: Exploring Datasets
Listening to the Data
Packages used in this chapter
Computing Basic Statistics and Correlations
Variable summary
Correlations among variables
Comparability between two variables
Plots
Grammar of graphics
Preparing data for visualization
Box plots
Bar plots
Line plots
Scatter plots
Basic scatter plots
Scatter plots using the output of t-SNE algorithm
Histograms
Exporting a plot to a file
Hypothesis Testing
Testing basics
Types of errors
Sensitivity and specificity
Significance and power of a test
Kruskal-Wallis tests
T-tests
Chi-square tests
Other Tests
Statistical Testing Tips
Case Study: Exploring the OnlineNewsPopularity Dataset
Variable stats
Visualization
Hypotheses
T-SNE magic
Conclusions
Summary
Chapter Challenge
CHAPTER 8: Manipulating the Fabric of the Data Space
Principal Components Analysis (PCA)
Applying PCA in Julia
Independent Components Analysis (ICA): most popular alternative of PCA
Feature Evaluation and Selection
Overview of the methodology
Using Julia for feature evaluation and selection using cosine similarity
Using Julia for feature evaluation and selection using DID
Pros and cons of the feature evaluation and selection approach
Other Dimensionality Reduction Techniques
Overview of the alternative dimensionality reduction methods
Genetic algorithms
Discernibility-based approach
When to use a sophisticated dimensionality reduction method
Summary
Chapter Challenge
CHAPTER 9: Sampling Data and Evaluating Results
Sampling Techniques
Basic sampling
Stratified sampling
Performance Metrics for Classification
Confusion matrix
Accuracy metrics
Basic accuracy
Weighted accuracy
Precision and recall metrics
F1 metric
Misclassification cost
Defining the cost matrix
Calculating the total misclassification cost
Receiver Operating Characteristic (ROC) Curve and related metrics
ROC Curve
AUC Metric
Gini Coefficient
Performance Metrics for Regression
MSE Metric and its variant, RMSE
SSE Metric
Other metrics
K-fold Cross Validation (KFCV)
Applying KFCV in Julia
KFCV tips
Summary
Chapter Challenge
CHAPTER 10: Unsupervised Machine Learning
Unsupervised Learning Basics
Clustering types
Distance metrics
Grouping Data with K-means
K-means using Julia
K-means tips
Density and the DBSCAN Approach
DBSCAN algorithm
Applying DBSCAN in Julia
Hierarchical Clustering
Applying hierarchical clustering in Julia
When to use hierarchical clustering
Validation Metrics for Clustering
Silhouettes
Clustering validation metrics tips
Effective Clustering Tips
Dealing with high dimensionality
Normalization
Visualization tips
Summary
Chapter Challenge
CHAPTER 11: Supervised Machine Learning
Decision Trees
Implementing decision trees in Julia
Decision tree tips
Regression Trees
Implementing regression trees in Julia
Regression tree tips
Random Forests
Implementing random forests in Julia for classification
Implementing random forests in Julia for regression
Random forest tips
Basic Neural Networks
Implementing neural networks in Julia
Neural network tips
Extreme Learning Machines
Implementing ELMs in Julia
ELM tips
Statistical Models for Regression Analysis
Implementing statistical regression in Julia
Statistical regression tips
Other Supervised Learning Systems
Boosted trees
Support vector machines
Transductive systems
Deep learning systems
Bayesian networks
Summary
Chapter Challenge
CHAPTER 12: Graph Analysis
Importance of Graphs
Custom Dataset
Statistics of a Graph
Cycle Detection
Julia the cycle detective
Connected Components
Cliques
Shortest Path in a Graph
Minimum Spanning Trees
Julia the MST botanist
Saving and loading graphs from a file
Graph Analysis and Julia’s Role in it
Summary
Chapter Challenge
CHAPTER 13: Reaching the Next Level
Julia Community
Sites to interact with other Julians
Code repositories
Videos
News
Practice What You’ve Learned
Some features to get you started
Some thoughts on this project
Final Thoughts about Your Experience with Julia in Data Science
Refining your Julia programming skills
Contributing to the Julia project
Future of Julia in data science
APPENDIX A: Downloading and Installing Julia and IJulia
APPENDIX B: Useful Websites Related to Julia
APPENDIX C: Packages Used in This Book
APPENDIX D: Bridging Julia with Other Platforms
Bridging Julia with R
Running a Julia script in R
Running an R script in Julia
Bridging Julia with Python
Running a Julia script in Python
Running a Python script in Julia
APPENDIX E: Parallelization in Julia
APPENDIX F: Answers to Chapter Challenges
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Index
Master how to use the Julia language to solve business-critical data science challenges. After covering the importance of Julia to the data science community and several essential data science principles, we start with the basics, including how to install Julia and its powerful libraries. Many examples are provided as we illustrate how to leverage each Julia command, dataset, and function.
Specialized script packages are introduced and described. Hands-on problems representative of those commonly encountered throughout the data science pipeline are provided, and we guide you in using Julia to solve them with published datasets. Many of these scenarios make use of existing packages and built-in functions, as we cover:
An overview of the data science pipeline, along with an example illustrating the key points, implemented in Julia
Machine learning methods, ranging from unsupervised (different types of clustering) to supervised ones (decision trees, random forests, basic neural networks, regression trees, and Extreme Learning Machines)
Graph analysis, including pinpointing the connections among the various entities and how they can be mined for useful insights
Each chapter concludes with a series of questions and exercises to reinforce what you have learned. The last chapter of the book guides you in creating a data science application from scratch using Julia.