Data Clustering with Python: From Theory to Implementation

Author: Gan Guojun
Publication date: 2026
Publisher: CRC Press, an imprint of Taylor & Francis Group, LLC
Pages: 260
File size: 4.3 MB
File type: PDF
Added by: codelibs

Cover....1

Half Title....2

Series Page....3

Title Page....4

Copyright Page....5

Dedication....6

Contents....8

Preface....12

I. Python Programming Preliminaries....14

1. Python Programming 101....16

1.1. Installation....16

1.2. Variables and data types....20

1.3. Data structures....24

1.4. Operators....28

1.5. Control statements and loops....34

1.6. Functions....37

1.7. File IO....38

1.8. Error handling....41

1.9. Object-oriented programming....42

1.10. Code Optimization....43

1.11. Summary....45

2. The NumPy Library....46

2.1. Arrays....46

2.2. Array indexing and slicing....49

2.3. Views and copies....50

2.4. Array operations....52

2.5. Functions....53

2.6. Matrices....56

2.7. File IO....59

2.8. Code optimization....60

2.9. Summary....62

3. The Pandas Library....63

3.1. Pandas series....63

3.2. Pandas data frames....66

3.3. Views and copies....69

3.4. Data manipulation....71

3.5. File IO....76

3.6. Summary....78

4. The Matplotlib Library....79

4.1. Overview....79

4.2. Basic plotting....81

4.3. Subplots....84

4.4. File IO....86

4.5. Summary....86

II. Data Clustering in Python....88

5. Introduction to Data Clustering....90

5.1. History of Clustering....90

5.2. Data Clustering Process....92

5.3. Clusters....94

5.4. Data Types....95

5.5. Dissimilarity and Similarity Measures....97

5.5.1. Measures for Continuous Data....97

5.5.2. Measures for Discrete Data....98

5.5.3. Measures for Mixed-type Data....99

5.6. Hierarchical Clustering Algorithms....100

5.6.1. Agglomerative Hierarchical Algorithms....100

5.6.2. Divisive Hierarchical Algorithms....103

5.6.3. Other Hierarchical Algorithms....103

5.6.4. Dendrograms....103

5.7. Partitional Clustering Algorithms....104

5.7.1. Center-Based Clustering Algorithms....106

5.7.2. Search-based Clustering Algorithms....107

5.7.3. Graph-Based Clustering Algorithms....107

5.7.4. Grid-Based Clustering Algorithms....108

5.7.5. Density-Based Clustering Algorithms....108

5.7.6. Model-Based Clustering Algorithms....109

5.7.7. Subspace Clustering Algorithms....110

5.7.8. Neural Network-Based Clustering Algorithms....110

5.7.9. Fuzzy Clustering Algorithms....111

5.8. Cluster Validity....111

5.9. Clustering Applications....112

5.10. Literature on Data Clustering....112

5.11. Summary....116

6. Agglomerative Hierarchical Algorithms....117

6.1. Description of the Algorithm....117

6.2. Implementation....119

6.2.1. The Single Linkage Algorithm....121

6.2.2. The Complete Linkage Algorithm....122

6.2.3. The Group Average Algorithm....123

6.2.4. The Weighted Group Average Algorithm....124

6.2.5. The Centroid Algorithm....125

6.2.6. The Median Algorithm....126

6.2.7. Ward’s Algorithm....127

6.3. Examples....129

6.4. Summary....135

7. A Divisive Hierarchical Clustering Algorithm....137

7.1. Description of the Algorithm....137

7.2. Implementation....138

7.3. Examples....140

7.4. Summary....142

8. The k-means Algorithm....143

8.1. Description of the Algorithm....143

8.2. Implementation....144

8.3. Examples....146

8.4. Summary....150

9. The c-means Algorithm....152

9.1. Description of the Algorithm....152

9.2. Implementation....153

9.3. Examples....155

9.4. Summary....160

10. The k-prototypes Algorithm....161

10.1. Description of the Algorithm....161

10.2. Implementation....162

10.3. Examples....165

10.4. Summary....169

11. The Genetic k-modes Algorithm....170

11.1. Description of the Algorithm....170

11.2. Implementation....172

11.3. Examples....174

11.4. Summary....176

12. The FSC Algorithm....177

12.1. Description of the Algorithm....177

12.2. Implementation....179

12.3. Examples....181

12.4. Summary....185

13. The Gaussian Mixture Algorithm....187

13.1. Description of the Algorithm....187

13.2. Implementation....190

13.3. Examples....191

13.4. Summary....195

14. The KMTD Algorithm....196

14.1. Description of the Algorithm....196

14.2. Implementation....199

14.3. Examples....201

14.4. Summary....204

15. The Probability Propagation Algorithm....205

15.1. Description of the Algorithm....205

15.2. Implementation....207

15.3. Examples....208

15.4. Summary....212

16. A Spectral Clustering Algorithm....213

16.1. Description of the Algorithm....213

16.2. Implementation....214

16.3. Examples....215

16.4. Summary....222

17. A Mean-Shift Algorithm....223

17.1. Description of the Algorithm....223

17.2. Implementation....225

17.3. Examples....227

17.4. Summary....234

Bibliography....236

Index....258

Data clustering, an interdisciplinary field with diverse applications, has gained increasing popularity since its origins in the 1950s. Over the past six decades, researchers from various fields have proposed numerous clustering algorithms. In 2011, I wrote a book on implementing clustering algorithms in C++ using object-oriented programming. While C++ offers efficiency, its steep learning curve makes it less ideal for rapid prototyping. Python has surged in popularity in the years since and has been the most widely used programming language since 2022. Its simplicity and extensive scientific libraries make it an excellent choice for implementing clustering algorithms.

Features:

  • Introduction to Python programming fundamentals
  • Overview of key concepts in data clustering
  • Implementation of popular clustering algorithms in Python
  • Practical examples of applying clustering algorithms to datasets
  • Access to associated Python code on GitHub

This book extends my previous work by implementing clustering algorithms in Python. Unlike the object-oriented approach in C++, this book uses a procedural programming style, as Python allows many clustering algorithms to be implemented concisely. The book is divided into two parts: the first introduces Python and key libraries like NumPy, Pandas, and Matplotlib, while the second covers clustering algorithms, including hierarchical and partitional methods. Each chapter includes theoretical explanations, Python implementations, and practical examples, with comparisons to scikit-learn where applicable.

This book is ideal for anyone interested in clustering algorithms, with no prior Python experience required.
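
To give a sense of the procedural style the description mentions, here is a minimal sketch of k-means (the subject of Chapter 8) written as a single plain function. It is not taken from the book or its GitHub repository: the function name `kmeans`, its parameters, and the toy data are all hypothetical, and it uses only the standard library rather than the NumPy-based code the book presumably relies on.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples.

    `centers` holds the initial cluster centers (an assumption of this
    sketch: the caller picks them). Returns (labels, centers).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

The whole algorithm fits in one function with no classes, which is the kind of conciseness the description attributes to Python. A production implementation would also need a strategy for choosing initial centers and for handling empty clusters more carefully.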

