Julia for Data Science

Author: Voulgaris Zacharias
Release date: 2016
Publisher: Technics Publications
Pages: 384
File size: 1.9 MB
File type: PDF
Added by: codelibs

Introduction

CHAPTER 1: Introducing Julia

How Julia Improves Data Science

Data science workflow

Julia’s adoption by the data science community

Julia Extensions

Package quality

Finding new packages

About the Book

CHAPTER 2: Setting Up the Data Science Lab

Julia IDEs

Juno

IJulia

Additional IDEs

Julia Packages

Finding and selecting packages

Installing packages

Using packages

Hacking packages

IJulia Basics

Handling files

Creating a notebook

Saving a notebook

Renaming a notebook

Loading a notebook

Exporting a notebook

Organizing code in .jl files

Referencing code

Working directory

Datasets We Will Use

Dataset descriptions

Magic dataset

OnlineNewsPopularity dataset

Spam Assassin dataset

Downloading datasets

Loading datasets

CSV files

Text files

Coding and Testing a Simple Machine Learning Algorithm in Julia

Algorithm description

Algorithm implementation

Algorithm testing

Saving Your Workspace into a Data File

Saving data into delimited files

Saving data into native Julia format

Saving data into text files

Help!

Summary

Chapter Challenge

CHAPTER 3: Learning the Ropes of Julia

Data Types

Arrays

Array basics

Accessing multiple elements in an array

Multidimensional arrays

Dictionaries

Basic Commands and Functions

print(), println()

typemax(), typemin()

collect()

show()

linspace()

Mathematical Functions

round()

rand(), randn()

sum()

mean()

Array and Dictionary Functions

in

append!()

pop!()

push!()

splice!()

insert!()

sort(), sort!()

get()

Keys(), values()

length(), size()

Miscellaneous Functions

time()

Conditionals

if-else statements

string()

map()

VERSION()

Operators, Loops and Conditionals

Operators

Alphanumeric operators (<, >, ==, <=, >=, !=)

Logical operators (&&, ||)

Loops

for-loops

while-loops

break command

Summary

Chapter Challenge

CHAPTER 4: Going Beyond the Basics in Julia

String Manipulation

split()

join()

Regex functions

ismatch()

match()

matchall()

eachmatch()

Custom Functions

Function structure

Anonymous functions

Multiple dispatch

Function example

Implementing a Simple Algorithm

Creating a Complete Solution

Summary

Chapter Challenge

CHAPTER 5: Julia Goes All Data Science-y

Data Science Pipeline

Data Engineering

Data preparation

Data exploration

Data representation

Data Modeling

Data discovery

Data learning

Information Distillation

Data product creation

Insight, deliverance, and visualization

Keep an Open Mind

Applying the Data Science Pipeline to a Real-World Problem

Data preparation

Data exploration

Data representation

Data discovery

Data learning

Data product creation

Insight, deliverance, and visualization

Summary

Chapter Challenge

CHAPTER 6: Julia the Data Engineer

Data Frames

Creating and populating a data frame

Data frames basics

Variable names in a data frame

Accessing particular variables in a data frame

Exploring a data frame

Filtering sections of a data frame

Applying functions to a data frame’s variables

Working with data frames

Altering data frames

Sorting the contents of a data frame

Data frame tips

Importing and Exporting Data

Accessing .json data files

Storing data in .json files

Loading data files into data frames

Saving data frames into data files

Cleaning Up Data

Cleaning up numeric data

Cleaning up text data

Formatting and Transforming Data

Formatting numeric data

Formatting text data

Importance of data types

Applying Data Transformations to Numeric Data

Normalization

Discretization (binning) and binarization

Binary to continuous (binary classification only)

Applying data transformations to text data

Case normalization

Vectorization

Preliminary Evaluation of Features

Regression

Classification

Feature evaluation tips

Summary

Chapter Challenge

CHAPTER 7: Exploring Datasets

Listening to the Data

Packages used in this chapter

Computing Basic Statistics and Correlations

Variable summary

Correlations among variables

Comparability between two variables

Plots

Grammar of graphics

Preparing data for visualization

Box plots

Bar plots

Line plots

Scatter plots

Basic scatter plots

Scatter plots using the output of t-SNE algorithm

Histograms

Exporting a plot to a file

Hypothesis Testing

Testing basics

Types of errors

Sensitivity and specificity

Significance and power of a test

Kruskal-Wallis tests

T-tests

Chi-square tests

Other Tests

Statistical Testing Tips

Case Study: Exploring the OnlineNewsPopularity Dataset

Variable stats

Visualization

Hypotheses

T-SNE magic

Conclusions

Summary

Chapter Challenge

CHAPTER 8: Manipulating the Fabric of the Data Space

Principal Components Analysis (PCA)

Applying PCA in Julia

Independent Components Analysis (ICA): most popular alternative of PCA

Feature Evaluation and Selection

Overview of the methodology

Using Julia for feature evaluation and selection using cosine similarity

Using Julia for feature evaluation and selection using DID

Pros and cons of the feature evaluation and selection approach

Other Dimensionality Reduction Techniques

Overview of the alternative dimensionality reduction methods

Genetic algorithms

Discernibility-based approach

When to use a sophisticated dimensionality reduction method

Summary

Chapter Challenge

CHAPTER 9: Sampling Data and Evaluating Results

Sampling Techniques

Basic sampling

Stratified sampling

Performance Metrics for Classification

Confusion matrix

Accuracy metrics

Basic accuracy

Weighted accuracy

Precision and recall metrics

F1 metric

Misclassification cost

Defining the cost matrix

Calculating the total misclassification cost

Receiver Operating Characteristic (ROC) Curve and related metrics

ROC Curve

AUC Metric

Gini Coefficient

Performance Metrics for Regression

MSE Metric and its variant, RMSE

SSE Metric

Other metrics

K-fold Cross Validation (KFCV)

Applying KFCV in Julia

KFCV tips

Summary

Chapter Challenge

CHAPTER 10: Unsupervised Machine Learning

Unsupervised Learning Basics

Clustering types

Distance metrics

Grouping Data with K-means

K-means using Julia

K-means tips

Density and the DBSCAN Approach

DBSCAN algorithm

Applying DBSCAN in Julia

Hierarchical Clustering

Applying hierarchical clustering in Julia

When to use hierarchical clustering

Validation Metrics for Clustering

Silhouettes

Clustering validation metrics tips

Effective Clustering Tips

Dealing with high dimensionality

Normalization

Visualization tips

Summary

Chapter Challenge

CHAPTER 11: Supervised Machine Learning

Decision Trees

Implementing decision trees in Julia

Decision tree tips

Regression Trees

Implementing regression trees in Julia

Regression tree tips

Random Forests

Implementing random forests in Julia for classification

Implementing random forests in Julia for regression

Random forest tips

Basic Neural Networks

Implementing neural networks in Julia

Neural network tips

Extreme Learning Machines

Implementing ELMs in Julia

ELM tips

Statistical Models for Regression Analysis

Implementing statistical regression in Julia

Statistical regression tips

Other Supervised Learning Systems

Boosted trees

Support vector machines

Transductive systems

Deep learning systems

Bayesian networks

Summary

Chapter Challenge

CHAPTER 12: Graph Analysis

Importance of Graphs

Custom Dataset

Statistics of a Graph

Cycle Detection

Julia the cycle detective

Connected Components

Cliques

Shortest Path in a Graph

Minimum Spanning Trees

Julia the MST botanist

Saving and loading graphs from a file

Graph Analysis and Julia’s Role in it

Summary

Chapter Challenge

CHAPTER 13: Reaching the Next Level

Julia Community

Sites to interact with other Julians

Code repositories

Videos

News

Practice What You’ve Learned

Some features to get you started

Some thoughts on this project

Final Thoughts about Your Experience with Julia in Data Science

Refining your Julia programming skills

Contributing to the Julia project

Future of Julia in data science

APPENDIX A: Downloading and Installing Julia and IJulia

APPENDIX B: Useful Websites Related to Julia

APPENDIX C: Packages Used in This Book

APPENDIX D: Bridging Julia with Other Platforms

Bridging Julia with R

Running a Julia script in R

Running an R script in Julia

Bridging Julia with Python

Running a Julia script in Python

Running a Python script in Julia

APPENDIX E: Parallelization in Julia

APPENDIX F: Answers to Chapter Challenges

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

Chapter 12

Chapter 13

Index

Master the Julia language to solve business-critical data science challenges. After covering the importance of Julia to the data science community and several essential data science principles, we start with the basics, including how to install Julia and its powerful libraries. Many examples illustrate how to leverage each Julia command, dataset, and function.

Specialized script packages are introduced and described. We provide hands-on problems representative of those commonly encountered throughout the data science pipeline and guide you in using Julia to solve them with published datasets. Many of these scenarios make use of existing packages and built-in functions, as we cover:

  • An overview of the data science pipeline along with an example illustrating the key points, implemented in Julia
  • Options for Julia IDEs
  • Programming structures and functions
  • Engineering tasks, such as importing, cleaning, formatting and storing data, as well as performing data preprocessing
  • Data visualization and some simple yet powerful statistics for data exploration purposes
  • Dimensionality reduction and feature evaluation
  • Machine learning methods, ranging from unsupervised (different types of clustering) to supervised ones (decision trees, random forests, basic neural networks, regression trees, and Extreme Learning Machines)
  • Graph analysis, including pinpointing the connections among the various entities and how they can be mined for useful insights

Each chapter concludes with a series of questions and exercises to reinforce what you have learned. The last chapter of the book guides you in creating a data science application from scratch using Julia.

