Contents
Preface
I. Python Programming Preliminaries
1. Python Programming 101
1.1. Installation
1.2. Variables and data types
1.3. Data structures
1.4. Operators
1.5. Control statements and loops
1.6. Functions
1.7. File IO
1.8. Error handling
1.9. Object-oriented programming
1.10. Code Optimization
1.11. Summary
2. The NumPy Library
2.1. Arrays
2.2. Array indexing and slicing
2.3. Views and copies
2.4. Array operations
2.5. Functions
2.6. Matrices
2.7. File IO
2.8. Code optimization
2.9. Summary
3. The Pandas Library
3.1. Pandas series
3.2. Pandas data frames
3.3. Views and copies
3.4. Data manipulation
3.5. File IO
3.6. Summary
4. The Matplotlib Library
4.1. Overview
4.2. Basic plotting
4.3. Subplots
4.4. File IO
4.5. Summary
II. Data Clustering in Python
5. Introduction to Data Clustering
5.1. History of Clustering
5.2. Data Clustering Process
5.3. Clusters
5.4. Data Types
5.5. Dissimilarity and Similarity Measures
5.5.1. Measures for Continuous Data
5.5.2. Measures for Discrete Data
5.5.3. Measures for Mixed-type Data
5.6. Hierarchical Clustering Algorithms
5.6.1. Agglomerative Hierarchical Algorithms
5.6.2. Divisive Hierarchical Algorithms
5.6.3. Other Hierarchical Algorithms
5.6.4. Dendrograms
5.7. Partitional Clustering Algorithms
5.7.1. Center-Based Clustering Algorithms
5.7.2. Search-based Clustering Algorithms
5.7.3. Graph-Based Clustering Algorithms
5.7.4. Grid-Based Clustering Algorithms
5.7.5. Density-Based Clustering Algorithms
5.7.6. Model-Based Clustering Algorithms
5.7.7. Subspace Clustering Algorithms
5.7.8. Neural Network-Based Clustering Algorithms
5.7.9. Fuzzy Clustering Algorithms
5.8. Cluster Validity
5.9. Clustering Applications
5.10. Literature on Data Clustering
5.11. Summary
6. Agglomerative Hierarchical Algorithms
6.1. Description of the Algorithm
6.2. Implementation
6.2.1. The Single Linkage Algorithm
6.2.2. The Complete Linkage Algorithm
6.2.3. The Group Average Algorithm
6.2.4. The Weighted Group Average Algorithm
6.2.5. The Centroid Algorithm
6.2.6. The Median Algorithm
6.2.7. Ward’s Algorithm
6.3. Examples
6.4. Summary
7. A Divisive Hierarchical Clustering Algorithm
7.1. Description of the Algorithm
7.2. Implementation
7.3. Examples
7.4. Summary
8. The k-means Algorithm
8.1. Description of the Algorithm
8.2. Implementation
8.3. Examples
8.4. Summary
9. The c-means Algorithm
9.1. Description of the Algorithm
9.2. Implementation
9.3. Examples
9.4. Summary
10. The k-prototypes Algorithm
10.1. Description of the Algorithm
10.2. Implementation
10.3. Examples
10.4. Summary
11. The Genetic k-modes Algorithm
11.1. Description of the Algorithm
11.2. Implementation
11.3. Examples
11.4. Summary
12. The FSC Algorithm
12.1. Description of the Algorithm
12.2. Implementation
12.3. Examples
12.4. Summary
13. The Gaussian Mixture Algorithm
13.1. Description of the Algorithm
13.2. Implementation
13.3. Examples
13.4. Summary
14. The KMTD Algorithm
14.1. Description of the Algorithm
14.2. Implementation
14.3. Examples
14.4. Summary
15. The Probability Propagation Algorithm
15.1. Description of the Algorithm
15.2. Implementation
15.3. Examples
15.4. Summary
16. A Spectral Clustering Algorithm
16.1. Description of the Algorithm
16.2. Implementation
16.3. Examples
16.4. Summary
17. A Mean-Shift Algorithm
17.1. Description of the Algorithm
17.2. Implementation
17.3. Examples
17.4. Summary
Bibliography
Index
Preface
Data clustering, an interdisciplinary field with diverse applications, has grown steadily in popularity since its origins in the 1950s. Over the past six decades, researchers from various fields have proposed numerous clustering algorithms. In 2011, I wrote a book on implementing clustering algorithms in C++ using object-oriented programming. While C++ offers efficiency, its steep learning curve makes it less suitable for rapid prototyping. In the years since, Python has surged in popularity and, by some rankings, has been the most widely used programming language since 2022. Its simplicity and extensive scientific libraries make it an excellent choice for implementing clustering algorithms.
This book extends my previous work by implementing clustering algorithms in Python. Unlike the earlier book, which took an object-oriented approach in C++, this one uses a procedural programming style, because Python allows many clustering algorithms to be implemented concisely. The book is divided into two parts: the first introduces Python and key libraries such as NumPy, Pandas, and Matplotlib, while the second covers clustering algorithms, including hierarchical and partitional methods. Each chapter in the second part includes a theoretical explanation, a Python implementation, and practical examples, with comparisons to scikit-learn where applicable.
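To give a flavor of what a concise, procedural implementation can look like, here is a minimal sketch of k-means written as a single NumPy function. It is an illustration only, not the code from Chapter 8: the function name `kmeans`, its parameters, and the synthetic two-blob data are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain procedural k-means; an illustrative sketch, not the book's code."""
    rng = np.random.default_rng(seed)
    # Pick k distinct records as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Squared Euclidean distance from every record to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members;
        # keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated synthetic blobs (made-up data for illustration).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, centers = kmeans(X, k=2)
print(centers)
```

The whole algorithm fits in one function with no classes, which is the style the preface describes. On data like this, the resulting partition could be checked against scikit-learn's `KMeans`.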
This book is intended for anyone interested in clustering algorithms; no prior Python experience is required.