Data Without Labels: Practical unsupervised machine learning

Author: Vaibhav Verdhan
Release year: 2025
Publisher: Manning Publications Co.
Page count: 354
File size: 4.8 MB
File type: PDF
Added by: Aleks-5

Data Without Labels....1

Praise for Data Without Labels....3

brief contents....8

contents....9

foreword....16

preface....18

acknowledgments....20

about this book....22

Who should read this book....22

How this book is organized: A road map....23

About the code....23

liveBook discussion forum....23

about the author....25

about the cover illustration....26

Part 1 Basics....27

1 Introduction to machine learning....29

1.1 Technical toolkit....30

1.2 Data, data types, data management, and quality....31

1.2.1 What is data?....31

1.2.2 Various types of data....32

1.2.3 Data quality....35

1.2.4 Data engineering and management....37

1.3 Data analysis, ML, AI, and business intelligence....38

1.4 Nuts and bolts of ML....40

1.5 Types of ML algorithms....43

1.5.1 Supervised learning....44

1.5.2 Unsupervised algorithms....50

1.5.3 Semisupervised algorithms....54

1.5.4 Reinforcement learning....54

1.6 Concluding thoughts....55

Summary....56

2 Clustering techniques....58

2.1 Technical toolkit....59

2.2 Clustering....60

2.3 Centroid-based clustering....63

2.3.1 K-means clustering....65

2.3.2 Measuring the accuracy of clustering....68

2.3.3 Finding the optimum value of k....69

2.3.4 Pros and cons of k-means clustering....70

2.3.5 K-means clustering implementation using Python....72

2.4 Connectivity-based clustering....76

2.4.1 Types of hierarchical clustering....78

2.4.2 Linkage criterion for distance measurement....79

2.4.3 Optimal number of clusters....80

2.4.4 Pros and cons of hierarchical clustering....82

2.4.5 Hierarchical clustering case study using Python....83

2.5 Density-based clustering....86

2.5.1 Neighborhood and density....86

2.5.2 DBSCAN clustering....88

2.6 Case study using clustering....93

2.6.1 Business context....94

2.6.2 Dataset for the analysis....95

2.6.3 Suggested solutions....96

2.6.4 Solution for the problem....96

2.7 Common challenges faced in clustering....98

2.8 Concluding thoughts....100

2.9 Practical next steps and suggested readings....100

Summary....101

3 Dimensionality reduction....103

3.1 Technical toolkit....104

3.2 The curse of dimensionality....104

3.3 Dimension reduction methods....108

3.3.1 Mathematical foundation....108

3.4 Manual methods of dimensionality reduction....108

3.4.1 Manual feature selection....109

3.4.2 Correlation coefficient....110

3.4.3 Algorithm-based methods for reducing dimensions....111

3.5 Principal component analysis....111

3.5.1 Eigenvalue decomposition....116

3.5.2 Python solution using PCA....117

3.6 Singular value decomposition....123

3.6.1 Python solution using SVD....124

3.7 Pros and cons of dimensionality reduction....127

3.8 Case study for dimension reduction....129

3.9 Concluding thoughts....132

3.10 Practical next steps and suggested readings....132

Summary....133

Part 2 Intermediate level....135

4 Association rules....137

4.1 Technical toolkit....138

4.2 Association rule overview....138

4.3 The building blocks of association rules....140

4.3.1 Support, confidence, lift, and conviction....141

4.4 Apriori algorithm....145

4.4.1 Python implementation....147

4.4.2 Challenges with the Apriori algorithm....151

4.5 Equivalence class clustering and bottom-up lattice traversal....152

4.5.1 Python implementation....155

4.6 F-P algorithm....156

4.7 Sequence rule mining....163

4.7.1 Sequential Pattern Discovery Using Equivalence....164

4.8 Case study for association rules....168

4.9 Concluding thoughts....171

4.10 Practical next steps and suggested readings....173

Summary....173

5 Clustering....175

5.1 Technical toolkit....176

5.2 Clustering: A brief recap....176

5.3 Spectral clustering....177

5.3.1 Building blocks of spectral clustering....179

5.3.2 The process of spectral clustering....182

5.4 Python implementation of spectral clustering....184

5.5 Fuzzy clustering....186

5.5.1 Types of fuzzy clustering....187

5.5.2 Python implementation of FCM....190

5.6 Gaussian mixture model....193

5.6.1 EM technique....195

5.6.2 Python implementation of GMM....197

5.7 Concluding thoughts....200

5.8 Practical next steps and suggested readings....200

Summary....201

6 Dimensionality reduction....202

6.1 Technical toolkit....203

6.2 Multidimensional scaling....203

6.2.1 Classic MDS....205

6.2.2 Nonmetric MDS....206

6.3 Python implementation of MDS....210

6.4 t-distributed stochastic neighbor embedding....215

6.4.1 Cauchy distribution....217

6.4.2 Python implementation of t-SNE....219

6.5 Uniform manifold approximation and projection....222

6.5.1 Working with UMAP....223

6.5.2 Using UMAP....223

6.5.3 Key points of UMAP....224

6.6 Case study....224

6.7 Concluding thoughts....226

6.8 Practical next steps and suggested readings....226

Summary....227

7 Unsupervised learning for text data....228

7.1 Technical toolkit....229

7.2 Text data is everywhere....229

7.3 Use cases of text data....230

7.4 Challenges with text data....231

7.5 Preprocessing the text data....233

7.6 Data cleaning....233

7.7 Extracting features from the text dataset....235

7.8 Tokenization....236

7.9 BOW approach....237

7.10 Term frequency and inverse document frequency....239

7.11 Language models....240

7.12 Text cleaning using Python....242

7.13 Word embeddings....245

7.14 Word2Vec and GloVe....247

7.15 Sentiment analysis case study with Python implementation....248

7.16 Text clustering using Python....254

7.17 GenAI for text data....256

7.18 Concluding thoughts....256

7.19 Practical next steps and suggested readings....257

Summary....258

Part 3 Advanced concepts....259

8 Deep learning: The foundational concepts....261

8.1 Technical toolkit....262

8.1.1 Deep learning: What is it? What does it do?....262

8.2 Building blocks of a neural network....264

8.2.1 Neural networks for solutions....264

8.2.2 Artificial neurons and perceptrons....265

8.2.3 Different layers in a network....267

8.2.4 Activation functions....269

8.2.5 Hyperparameters....271

8.2.6 Optimization functions....272

8.3 How does deep learning work in a supervised manner?....274

8.3.1 Supervised learning algorithms....274

8.3.2 Step 1: Feed-forward propagation....274

8.3.3 Step 2: Adding the loss function....275

8.3.4 Step 3: Calculating the error....276

8.4 Backpropagation....276

8.4.1 The mathematics behind backpropagation....277

8.4.2 Step 4: Optimization....279

8.5 How deep learning works in an unsupervised manner....279

8.6 Convolutional neural networks....280

8.6.1 Key concepts of CNN....280

8.6.2 Use of CNN....282

8.7 Recurrent neural networks....282

8.7.1 Key concepts of RNN....282

8.8 Boltzmann learning rule....284

8.8.1 Concepts of the Boltzmann learning rule....284

8.8.2 Key points....285

8.9 Deep belief networks....285

8.9.1 Key points of DBN....285

8.10 Popular deep learning libraries....287

8.10.1 Python code for Keras and TF....288

8.11 Concluding thoughts....289

8.12 Practical next steps and suggested readings....290

Summary....291

9 Autoencoders....293

9.1 Technical toolkit....293

9.2 Feature learning....294

9.3 Introducing autoencoders....294

9.4 Components of autoencoders....295

9.5 Training of autoencoders....296

9.6 Application of autoencoders....297

9.7 Types of autoencoders....297

9.8 Python implementation of autoencoders....301

9.9 Concluding thoughts....303

9.10 Practical next steps and suggested readings....303

Summary....304

10 Generative adversarial networks, generative AI, and ChatGPT....305

10.1 AI: A transformation....305

10.2 GenAI and its significance....306

10.3 Discriminative models and GenAI....308

10.4 Generative adversarial networks....309

10.4.1 The generator network....309

10.4.2 The discriminator network....310

10.4.3 Adversarial training....311

10.4.4 Variants and applications of GANs....312

10.4.5 BERT, GPT-3, and others....312

10.5 ChatGPT and its details....313

10.5.1 Key features of ChatGPT....313

10.5.2 Applications of ChatGPT....313

10.6 Integration of GenAI....314

10.7 Concluding thoughts....315

10.8 Practical next steps and suggested readings....316

Summary....316

11 End-to-end model deployment....317

11.1 The machine learning modeling process....318

11.2 Business problem definition....318

11.3 Data discovery and feasibility analysis....320

11.4 Data cleaning and preparation....321

11.5 Duplicate values in the data....321

11.6 Categorical variables....322

11.7 Missing values in dataset....323

11.8 Outliers present in the data....325

11.9 Exploratory data analysis....325

11.10 Model development and business approval....326

11.11 Model deployment....326

11.12 Purpose of model deployment....326

11.13 Types of model deployment....327

11.14 Considerations while deploying the model....328

11.15 Documentation....329

11.16 Model maintenance and refresh....329

11.17 Concluding thoughts....330

11.18 Practical next steps and suggested readings....331

Summary....331

appendix A Mathematical foundations....333

A.1 List of clustering algorithms....333

A.1.1 Partitioning-based algorithms....333

A.1.2 Hierarchical clustering....333

A.1.3 Density-based algorithms....333

A.1.4 Grid-based algorithms....334

A.1.5 Model-based algorithms....334

A.1.6 Spectral clustering....334

A.1.7 Graph-based clustering....334

A.1.8 Subspace and high-dimensional clustering....335

A.1.9 Fuzzy and soft clustering....335

A.1.10 Constraint-based clustering....335

A.1.11 Evolutionary and genetic clustering....335

A.1.12 Neural network-based clustering....336

A.1.13 Other algorithms....336

A.2 What is a centroid?....336

A.3 L1 vs. L2 norm....336

A.4 Different scaling techniques used in the industry....336

A.5 Time complexity O(n)....337

A.6 How to install packages in Python....338

A.7 Correlation....338

A.7.1 Correlation coefficient....339

A.7.2 Uses of correlation....339

A.7.3 Important considerations....339

A.8 Time-series analysis....340

A.9 Mathematical foundation for data representation....340

A.9.1 Scalar and vector....341

A.9.2 Standard deviation and variance....341

A.9.3 Covariance and correlation....342

A.9.4 Matrix decomposition, eigenvectors, and eigenvalues....343

A.9.5 Special matrices....344

A.10 Hyperparameters vs. parameters....344

index....345

A....345

B....345

C....346

D....346

E....347

F....347

G....348

H....348

I....348

J....348

K....349

L....349

M....349

N....350

O....350

P....350

Q....351

R....351

S....351

T....351

U....352

V....352

W....352

X....352

Y....352

Z....352

Data Without Labels - back....354

Discover all-practical implementations of the key algorithms and models for handling unlabeled data. Full of case studies demonstrating how to apply each technique to real-world problems.

In Data Without Labels you’ll learn:

  • Fundamental building blocks and concepts of machine learning and unsupervised learning
  • Data cleaning for structured and unstructured data like text and images
  • Clustering algorithms like k-means, hierarchical clustering, DBSCAN, Gaussian mixture models, and spectral clustering
  • Dimensionality reduction methods like principal component analysis (PCA), SVD, multidimensional scaling, and t-SNE
  • Association rule algorithms like Apriori, ECLAT, and SPADE
  • Unsupervised time-series clustering, Gaussian mixture models, and statistical methods
  • Building neural networks such as GANs and autoencoders
  • Working with Python tools and libraries like scikit-learn, NumPy, pandas, Matplotlib, Seaborn, Keras, TensorFlow, and Flask
  • How to interpret the results of unsupervised learning
  • Choosing the right algorithm for your problem
  • Deploying unsupervised learning to production
  • Maintenance and refresh of an ML solution
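To give a flavor of the toolkit the list above names, here is a minimal sketch (synthetic data, not an example from the book) that combines two of the core techniques covered: reducing unlabeled points with PCA and then clustering them with k-means via scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Three well-separated 5-D blobs as a synthetic stand-in for unlabeled data
rng = np.random.default_rng(42)
centers = rng.normal(scale=10, size=(3, 5))
X = np.vstack([c + rng.normal(size=(50, 5)) for c in centers])

# Reduce to 2 dimensions with PCA, then cluster with k-means
X_2d = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

print(X_2d.shape)                    # (150, 2)
print(sorted(set(labels.tolist())))  # three cluster labels: [0, 1, 2]
```

Note that k-means makes no use of the true blob assignments; the labels it produces come purely from the geometry of the data, which is the essence of the unsupervised setting.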

Data Without Labels introduces mathematical techniques, key algorithms, and Python implementations that will help you build machine learning models for unannotated data. You’ll discover hands-off and unsupervised machine learning approaches that can still untangle raw, real-world datasets and support sound strategic decisions for your business.

Don’t get bogged down in theory—the book bridges the gap between complex math and practical Python implementations, covering end-to-end model development all the way through to production deployment. You’ll discover the business use cases for machine learning and unsupervised learning, and access insightful research papers to complete your knowledge.
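As one example of the bridge from math to Python the book promises, the building blocks of association rules (support, confidence, and lift, covered in chapter 4) can be computed in a few lines of plain Python. The toy basket data below is hypothetical, not taken from the book.

```python
# Toy transaction data: each set is one shopping basket (hypothetical data)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Metrics for the rule {bread} -> {milk}
s_ab = support({"bread", "milk"})       # 3/5 = 0.6
confidence = s_ab / support({"bread"})  # 0.6 / 0.8 = 0.75
lift = confidence / support({"milk"})   # 0.75 / 0.8 = 0.9375

print(s_ab, confidence, lift)  # 0.6 0.75 0.9375
```

A lift below 1, as here, suggests the two items co-occur slightly less often than independence would predict; algorithms like Apriori automate this computation across all candidate itemsets.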

about the technology

Generative AI, predictive algorithms, fraud detection, and many other analysis tasks rely on cheap and plentiful unlabeled data. Machine learning on data without labels—or unsupervised learning—turns raw text, images, and numbers into insights about your customers, accurate computer vision, and high-quality datasets for training AI models. This book will show you how.

about the book

Data Without Labels is a comprehensive guide to unsupervised learning, offering a deep dive into its mathematical foundations, algorithms, and practical applications. It presents practical examples from retail, aviation, and banking using fully annotated Python code. You’ll explore core techniques like clustering and dimensionality reduction along with advanced topics like autoencoders and GANs. As you go, you’ll learn where to apply unsupervised learning in business applications and discover how to develop your own machine learning models end-to-end.

what's inside

  • Master unsupervised learning algorithms
  • Real-world business applications
  • Curate AI training datasets
  • Explore applications of autoencoders and GANs

about the reader

Intended for data science professionals. Assumes knowledge of Python and basic machine learning.

