Preface
About this Book
Contents
1: Introduction to Python
1.1 Overview of Python Programming
1.1.1 Python Is an Interpreted Programming Language
1.1.2 The Evolution of Python Programming
1.2 Importance of Python in Biomedical Data Analysis
1.2.1 Python in Genomic Data Analysis
1.2.2 Python in Drug Discovery and Development
1.2.3 Python in Clinical Data Analysis
1.2.4 Python in Image Analysis
1.3 The Python Environment
1.3.1 Key Characteristics of a Python Environment
1.3.2 Python Installation
1.3.3 Integrated Development Environments (IDEs) for Python
1.3.4 Virtual Environments for Python
1.3.5 Creating a Virtual Environment Using venv
1.3.6 Creating a Virtual Environment Using Anaconda
1.3.7 Installing Python Libraries
1.3.8 Verifying the Setup
1.3.9 Troubleshooting Common Setup Issues
1.4 Exercises and Questions
References
2: Python Basics
2.1 Basic Syntax and Operations
2.1.1 Printing Output
2.1.2 Comments
2.1.3 Arithmetic Operations
2.1.4 Assignment Operators
2.1.5 Comparison Operators
2.2 Variables and Data Types
2.2.1 Introduction to Variables
2.2.2 Basic Data Types
2.2.3 Basics of String Manipulations
2.2.4 Boolean Expressions and Logical Operators
2.2.5 Type Conversion
2.2.6 Complex Data Types
2.2.7 Lists
2.2.8 List Operations
2.2.9 Slicing a List
2.2.10 Iterating Over Lists
2.2.11 Tuples
2.2.12 Sets
2.2.13 Basic Set Operations
2.2.14 Methods to Modify Sets
2.2.15 Dictionaries
2.2.16 Dictionary Methods
2.2.17 Iterating over Dictionaries
2.3 Control Structures: Loops and Conditionals
2.3.1 Introduction to Control Structures
2.3.2 Usage of Conditional Statements
2.3.3 Looping Structures
2.3.4 Use of Control Flow Tools
2.3.5 Comprehensions
2.3.6 Error Handling in Control Structures
2.4 Exercises and Questions
References
3: Working with Biomedical Data: Basic Data Handling
3.1 Overview of Biomedical Data Types
3.2 Data Handling Challenges
3.3 Understanding Data Formats
3.3.1 Common Biomedical Data Formats
3.3.2 Proprietary Formats
3.4 Data Import Techniques
3.4.1 Reading Flat Files
3.4.2 Working with Hierarchical Data
3.4.3 Importing Large Datasets
3.4.4 Importing Medical Imaging Data
3.4.5 Reading Genomic Data
3.4.6 Reading Transcriptomic Data
3.5 Data Export Techniques
3.5.1 Exporting to Flat Files
3.5.2 Exporting Hierarchical Data
3.5.3 Handling Large-Scale Data Export
3.5.4 Exporting Processed Imaging Data
3.5.5 Exporting Genomic Data
3.6 Exercises and Questions
References
4: Biomedical Data Preprocessing
4.1 Overview of Common Issues with Biomedical Datasets
4.1.1 Complexity and Heterogeneity
4.1.2 Missing and Incomplete Data
4.1.3 Outliers and Noise
4.1.4 Inconsistencies and Errors
4.2 Understanding Data Quality
4.2.1 Defining Data Quality in a Biomedical Context
4.2.2 Common Data Quality Issues
4.2.3 Addressing Data Quality Issues
4.3 Handling Missing Data
4.3.1 Type of Missing Data
4.3.2 Identifying Missing Data
4.3.3 Techniques for Handling Missing Data
4.3.4 Impact of Missing Data on Biomedical Analysis
4.4 Data Transformation
4.4.1 Normalization and Standardization
4.4.2 Encoding Categorical Variables
4.4.3 Handling Data Distribution
4.4.4 Feature Creation
4.4.5 Binning and Discretization
4.4.6 Dimensionality Reduction
4.4.7 Time-Series and Frequency Transformation
4.5 Dealing with Outliers
4.5.1 Outlier Detection Methods
4.5.2 Statistical Methods for Outlier Detection
4.5.3 Distance-Based Methods for Outlier Detection
4.5.4 Model-Based Methods for Outlier Detection
4.5.5 Cluster-Based Methods for Outlier Detection
4.5.6 Proximity-Based Methods for Outlier Detection
4.5.7 Strategies to Manage Outliers
4.5.8 Consequences of Not Handling Outliers
4.6 Data Integration
4.6.1 Importance
4.6.2 Challenges
4.6.3 Methods
4.6.4 Addressing Inconsistencies and Duplicate Data
4.6.5 Metadata Management
4.7 Ensuring Data Privacy and Security
4.7.1 Data Privacy Considerations
4.7.2 Data Security Measures
4.7.3 Anonymization Techniques
4.8 Exercises and Questions
References
5: Basic Biomedical Data Exploration Techniques
5.1 Data Exploration
5.1.1 Data Exploration in Biomedical Research
5.1.2 The Data Exploration Process
5.2 Understanding the Dataset
5.2.1 Identifying the Type of Data: Quantitative Versus Qualitative
5.2.2 Basic Dataset Properties: Shape, Size, and Dimensionality
5.2.3 Recognizing Different Types of Biomedical Datasets
5.3 Data Inspection and Cleaning
5.3.1 Basic Data Inspection: Head, Tail, and Random Samples
5.3.2 Identifying and Handling Missing Values
5.3.3 Detecting and Correcting Errors or Anomalies in the Data
5.3.4 Data Type Conversions
5.4 Dealing with Categorical Data
5.4.1 Understanding Categorical Data
5.4.2 Tools for Summarizing and Analyzing Categorical Data
5.4.3 Importance of Categorical Data in Biomedical Context
5.4.4 Challenges in Handling Categorical Data
5.4.5 Significance in Drug Discovery and Sensitivity Prediction
5.5 Initial Data Exploration in Practice
5.5.1 Systematic Approach to Data Exploration
5.5.2 Common Pitfalls
5.5.3 Transitioning to Advanced Analysis
5.5.4 Role of Initial Exploration in Guiding Research
5.6 Preparing for Advanced Analysis
5.6.1 Summarizing Insights from Basic Exploration
5.6.2 Identifying Areas for Deeper Analysis
5.6.3 Formulating Hypotheses
5.6.4 Transition to Advanced Statistical Methods and Machine Learning
5.6.5 Conceptual and Technical Preparation
5.6.6 Embracing Opportunities in Advanced Data Analysis
5.7 Exercises and Questions
References
6: Data Visualization in Biomedicine
6.1 The Role of Data Visualization in Biomedicine
6.2 Basic Data Visualization Tools in Python
6.2.1 Matplotlib
6.2.2 Seaborn
6.2.3 Plotly and Bokeh
6.3 Creating Basic Plots
6.3.1 Line Graphs
6.3.2 Histograms
6.3.3 Scatter Plots
6.3.4 Box Plots
6.3.5 Correlation Matrices, Heatmaps, and Clustering
6.3.6 Bar Graphs
6.3.7 Pie Plots
6.4 Complexities in Visualization of Biomedical Data
6.4.1 Effective Data Handling
6.4.2 Acceptability of Visualization
6.4.3 Confidentiality with Sensitive Data
6.5 Data Visualization in Biomedical Communication
6.5.1 Presenting Data to Nonexperts
6.5.2 Communication and Documentation
6.5.3 Simplifying Clinical Discussions
6.6 Exercises and Questions
References
7: Statistical Analysis in Biomedicine
7.1 Statistics in Biomedical Data Analysis
7.1.1 Overview of Biomedical Data Characteristics
7.1.2 Importance of Statistical Analysis in Biomedical Research
7.1.3 Challenges in Biomedical Data Analysis
7.2 Fundamental Statistical Concepts
7.2.1 Descriptive Statistics
7.2.2 Probability Distributions
7.2.3 Hypothesis Testing
7.2.4 Confidence Intervals
7.3 Statistics in Exploratory Data Analysis
7.3.1 Descriptive Statistics and Data Visualization Techniques in EDA
7.3.2 Identifying Patterns and Anomalies
7.3.3 Preliminary Data Screening Methods
7.4 Advanced Statistical Methods
7.4.1 Regression Analysis
7.4.2 Multivariate Analysis
7.4.3 Survival Analysis
7.5 Statistical Methods in Genomics and Proteomics
7.5.1 Analysis of Gene Expression Data
7.5.2 Proteomic Data Analysis
7.5.3 Bioinformatics Tools for Statistical Analysis
7.6 Exercises and Questions
References
8: Machine Learning in Biomedicine
8.1 Basic Concepts of Machine Learning
8.1.1 Machine Learning: Brief Classifications
8.2 Supervised Learning
8.2.1 Linear Regression
8.2.2 Logistic Regression
8.2.3 Linear Discriminant Analysis (LDA)
8.2.4 Decision Trees
8.2.5 Ensemble Methods
8.2.6 Support Vector Machines
8.2.7 Naive Bayes
8.2.8 Bayesian Networks
8.2.9 Gaussian Process
8.2.10 k-Nearest Neighbors (k-NN)
8.2.11 Deep Learning
8.3 Unsupervised Learning
8.3.1 Clustering
8.3.2 Dimensionality Reduction
8.3.3 Association Rule Learning
8.3.4 Anomaly Detection
8.3.5 Generative Models
8.3.6 Topic Modeling
8.3.7 Self-Supervised Learning
8.4 Semi-supervised Learning
8.5 Reinforcement Learning
8.6 Data Preprocessing for Machine Learning Models
8.6.1 Data Cleaning
8.6.2 Data Normalization
8.6.3 Feature Engineering
8.6.4 Data Augmentation
8.7 Technique to Optimize Model Parameters
8.7.1 Grid Search
8.7.2 Random Search
8.7.3 Bayesian Optimization
8.7.4 Gradient-Based Optimization
8.7.5 Evolutionary Algorithms
8.7.6 Optuna
8.8 Techniques for Evaluating a Machine Learning Model
8.8.1 Training and Testing Split
8.8.2 Cross-Validation
8.8.3 Confusion Matrix
8.8.4 Accuracy
8.8.5 Precision (Positive Predictive Value)
8.8.6 Sensitivity or Recall
8.8.7 F1 Score
8.8.8 Negative Predictive Value (NPV)
8.8.9 Balanced Accuracy
8.8.10 Matthews Correlation Coefficient (MCC)
8.8.11 ROC Curve and AUC
8.8.12 Machine Learning Model Building and Evaluation
8.9 Implementing Simple Machine Learning Models in Python
8.9.1 Key Python Libraries for Machine Learning in Biomedicine
8.10 Implementing Machine Learning Models in Python
8.10.1 Predicting Drug Sensitivity Using LogisticRegression
8.10.2 Predicting Drug Sensitivity Using RandomForest
8.10.3 Predicting Drug Sensitivity Using AlphaML
8.11 Exercises and Questions
References
9: Image Processing in Biomedical Research
9.1 Overview of Image Processing
9.1.1 Importance and Impact of Image Analysis
9.2 Fundamentals of Image Processing
9.2.1 Image Representation
9.2.2 Image Preprocessing
9.2.3 Image Enhancement Techniques
9.2.4 Image Segmentation
9.2.5 Feature Extraction and Pattern Recognition
9.2.6 Challenges in Biomedical Image Processing
9.3 Python Libraries for Image Analysis
9.3.1 OpenCV
9.3.2 scikit-image
9.3.3 SimpleITK in Biomedical Image Processing
9.3.4 Mahotas in Biomedical Image Processing
9.3.5 Napari in Biomedical Image Processing
9.3.6 Other Python Applications in Biomedical Image Processing
9.4 Exercises and Questions
References
10: Genomic Data Analysis
10.1 Introduction to Genomic Data Analysis
10.1.1 Genomic Data and Its Importance
10.1.2 Key Types of Genomic Data
10.2 Overview of Python Tools for Genomic Data Analysis
10.2.1 Why Python for Genomic Data Analysis?
10.2.2 Python Libraries for Genomic Data
10.3 Introduction to DNA Sequence Data
10.3.1 FASTA Format: Structure and Parsing DNA Sequences
10.3.2 Working with DNA Sequences
10.3.3 Sequence Alignment
10.3.4 DNA Motif and Pattern Searching
10.3.5 Identifying Genes in Genomic Sequences
10.4 Variant Calling and SNP Analysis
10.4.1 Variants and Single Nucleotide Polymorphisms (SNPs)
10.4.2 Variant Calling Workflow
10.4.3 Working with VCF Files in Python
10.4.4 Annotation of Genomic Variants
10.4.5 Visualization of Genomic Variants
10.5 Epigenomics Data Analysis
10.5.1 Key Epigenetic Mechanisms
10.5.2 Common File Formats and Handling Epigenomic Data
10.5.3 Peak Calling in ChIP-seq Data
10.5.4 Analysis of Differential Methylation Patterns
10.5.5 Identifying Differentially Methylated Regions in Cancer
10.6 Population Genomics Analysis
10.6.1 Population Genomics Data
10.6.2 Working with Population Genomic Data
10.6.3 Population Structure and Phylogenetic Analysis
10.6.4 Computing Population Genetics Statistics
10.6.5 Analyzing Genetic Diversity in Human Populations
10.7 Visualization Techniques for Genomic Data
10.7.1 Heatmaps for Gene Expression Analysis
10.7.2 Genome Browser-Like Visualization
10.7.3 Plotting Genomic Variants and Association Studies
10.7.4 3D Visualization of Genomic Data
10.8 Integration with Bioinformatics Tools and Pipelines
10.8.1 Bioinformatics Pipelines
10.8.2 Workflow Management Tools
10.9 Exercises and Questions
References
11: Pharmacokinetics and Pharmacodynamics Analysis
11.1 Pharmacokinetics and Pharmacodynamics
11.1.1 Pharmacokinetics
11.1.2 Pharmacodynamics
11.2 Data Collection and Types of Data
11.2.1 Sources of PKPD Data
11.2.2 Types of Data
11.2.3 Time-Course Data in PKPD
11.3 Python Libraries for PKPD Analysis
11.3.1 NumPy and SciPy for Mathematical Modeling
11.4 Pharmacokinetic Modeling Using Python
11.4.1 Compartmental Models
11.4.2 Absorption, Distribution, Metabolism, and Excretion (ADME) Modeling
11.4.3 First-Order and Zero-Order Kinetics
11.4.4 Implementation of PK Models
11.5 Pharmacodynamic Modeling Using Python
11.5.1 Dose-Response Curves
11.5.2 Linear Versus Nonlinear Models in PD
11.6 Integrated PKPD Modeling Using Python
11.6.1 Linking PK and PD Models
11.6.2 Effect Compartment Models
11.6.3 PKPD Modeling of a Hypothetical Drug
11.7 Parameter Estimation in PKPD Models
11.7.1 Nonlinear Least Squares Regression for Parameter Estimation
11.7.2 Maximum Likelihood Estimation (MLE) in PKPD
11.7.3 Bootstrapping and Confidence Intervals in PKPD Models
11.8 Sensitivity Analysis and Model Validation in PKPD
11.8.1 Sensitivity Analysis of PKPD Parameters
11.8.2 Goodness-of-Fit Tests for Model Validation
11.9 Applications of PKPD Modeling in Drug Development
11.9.1 PKPD in Dose Optimization
11.9.2 Predicting Drug Efficacy and Toxicity
11.9.3 PKPD in Personalized Medicine
11.9.4 Using PKPD to Inform Clinical Trial Design
11.10 Exercises and Questions
References
12: Natural Language Processing (NLP) Basics
12.1 Introduction to NLP
12.1.1 NLP in Biomedical Research
12.1.2 Applications of NLP in Biomedical Research
12.2 Understanding Biomedical Text Data
12.2.1 Types of Biomedical Text Data
12.2.2 Challenges in Biomedical Text Processing
12.2.3 Common Data Formats in Biomedical Text Mining
12.3 Basic Concepts in NLP
12.3.1 Tokenization and Text Preprocessing
12.3.2 Stemming and Lemmatization in Biomedical Texts
12.3.3 Removing Stopwords and Punctuation
12.3.4 Handling Abbreviations and Acronyms
12.3.5 Sentence Splitting and Paragraph Segmentation
12.3.6 Normalization of Text
12.4 Python Libraries for NLP in Biomedicine
12.4.1 Natural Language Toolkit (NLTK)
12.4.2 spaCy for Biomedical Text Processing
12.4.3 scikit-learn for Text Classification
12.4.4 Bio-Specific Libraries: scispaCy, Biopython, and PyMedTermino
12.5 NER in Biomedical Texts
12.5.1 Biomedical NER: Identifying Genes, Proteins, Diseases, and Drugs
12.5.2 Using scispaCy for Biomedical NER
12.6 Text Classification and Categorization
12.6.1 Classifying Biomedical Documents
12.6.2 Supervised Learning for Text Classification
12.7 Information Extraction and Relation Extraction
12.7.1 Extracting Key Entities and Relationships from Biomedical Texts
12.7.2 Rule-Based Versus Machine Learning Approaches for Relation Extraction
12.7.3 Extracting Drug-Drug Interactions from Literature
12.7.4 Benefits of Relation Extraction in Biomedical Research
12.8 Topic Modeling and Text Mining
12.8.1 Latent Dirichlet Allocation (LDA) for Topic Modeling in Biomedical Texts
12.8.2 Clustering Biomedical Research Papers by Topic
12.9 Sentiment Analysis in Biomedical Research
12.9.1 Sentiment Analysis in Biomedical Texts
12.9.2 Challenges of Sentiment Analysis in Scientific Texts
12.9.3 Analyzing Sentiments in Patient Reviews of Drugs
12.10 NLP for Literature-Based Discovery (LBD)
12.10.1 Overview of LBD
12.10.2 Tools and Approaches for LBD in Biomedical Texts
12.11 Ethical and Regulatory Considerations in Biomedical NLP
12.12 Exercises and Questions
References
13: Single-Cell RNA Sequencing Data Analysis
13.1 Basics of scRNA-seq
13.1.1 The scRNA-seq Technology
13.1.2 Importance and Applications in Biomedical Research
13.1.3 Differences Between Bulk and scRNA-seq
13.2 Experimental Design and Data Acquisition
13.2.1 Experimental Workflow for scRNA-seq
13.2.2 Key Considerations in scRNA-seq Experimentation
13.3 Processing Raw scRNA-seq Data
13.3.1 FASTA and FASTQ Files: Structure and Content
13.3.2 Quality Control and Preprocessing of Raw Data
13.3.3 Mapping Reads to a Reference Genome
13.3.4 Python Tools for Preprocessing and Alignment
13.4 Quality Control and Filtering of scRNA-seq Data
13.4.1 Identifying and Removing Low-Quality Cells
13.4.2 Detection of Doublets and Empty Drops
13.4.3 Normalization and Scaling of Gene Expression Data
13.5 Dimensionality Reduction Techniques
13.5.1 PCA for scRNA-seq Data
13.5.2 t-SNE for scRNA-seq Data
13.5.3 UMAP for scRNA-seq Data
13.5.4 Visualizing Single-Cell Clusters
13.6 Clustering and Cell Type Identification
13.6.1 Clustering Methods
13.6.2 Cell Type Annotation and Marker Gene Detection
13.6.3 Python Tools for Clustering
13.7 Differential Gene Expression Analysis
13.7.1 Differential Expression in scRNA-seq
13.7.2 Identifying Differentially Expressed Genes Across Cell Populations
13.7.3 Statistical Approaches for DEG in scRNA-seq Analysis
13.7.4 Python Tools for Differential Expression
13.8 Trajectory Inference and Pseudotime Analysis
13.8.1 Developmental Trajectories
13.8.2 Pseudotime Estimation in Single-Cell Data
13.8.3 Python Tools for Trajectory Analysis
13.9 Integration of scRNA-seq Datasets
13.9.1 Challenges in Integrating Data Across Conditions and Studies
13.9.2 Batch Effect Correction Techniques
13.9.3 Integrating Data from Different Sequencing Technologies
13.9.4 Example Code for Integration
13.10 Advanced Analysis and Interpretation of scRNA-seq Data
13.10.1 Gene Regulatory Network Inference
13.10.2 Functional Enrichment Analysis (GO, KEGG)
13.10.3 Pathway Analysis and Cell-Cell Communication
13.11 Exercises and Questions
References
Appendices
Additional Resources and Reading
Advanced Python Textbooks
Specialized Bioinformatics Books
Online Documentation and Tutorials
Research Papers and Case Studies
Blogs and Websites
Online Course Platforms with Specialized Tracks
YouTube Channels and Podcasts
Professional and Academic Journals
Python in Emerging Technologies
Networking and Professional Development
Advanced Python Libraries for Biomedical Research
Glossary
This beginner-friendly book shows you how to use Python to explore and understand biomedical data. The ability to analyze and interpret complex datasets is a vital skill, especially in biomedicine, where data-driven insights can lead to major advances in health and medicine. Starting from scratch, the book introduces Python's fundamental syntax and guides you through its applications in real-world biomedical research.
The introduction to Python's syntax and core concepts is gentle enough to be accessible even if this is your first encounter with coding. You will discover that Python is more than just a tool: it becomes an essential partner in uncovering the stories within your data. Our primary aim is to equip you with a foundational understanding of Python, so that you can run pre-written programs effectively and create simple pipelines that execute a sequence of applications. You will work through practical examples and exercises inspired by real-world biomedical scenarios, which give you a realistic view of the challenges and successes you may encounter in your own data analysis tasks.
Whether you are taking your first steps into data analysis or looking to expand your current skills, this introductory guide is ideal for graduate students, emerging researchers, and professionals in the biomedical field who are new to programming or Python. Python Essentials for Biomedical Data Analysis serves as a valuable and inspiring resource throughout your journey, unlocking the expansive potential of Python in biomedical research.