Contents....6
About the Author....13
About the Technical Reviewers....14
Acknowledgements....16
Introduction....17
1 Introduction....18
1.1 Modern Computers and Their History....18
1.1.1 Today, Most Personal Computers Are Used Primarily As Communication Devices....18
1.1.2 A Brief History of the Computer Age....19
1.2 Our Modern Idea of Computers: A Theory of Computation....28
2 Digital Computation....32
2.1 Fundamentals of Computation I: Transistors, Logic Gates, and Moore's Law....32
2.1.1 Transistors and Moore's Law....32
2.1.2 The Transistor Is the Fundamental Physical Unit of Computation....33
2.1.3 Transistors Are Organized into Logic Gates....34
2.1.4 Moore's Law....35
2.1.5 The Transistor Budget....35
2.2 Fundamentals of Computation II: Bits, Boolean Logic, and the Digital Age of George Boole & Claude Shannon....37
2.2.1 George Boole....37
2.2.2 Claude Shannon....38
2.3 Fundamentals of Computation III: Data, Instructions, & Pointers....39
2.4 Code and Data: Cycles of Fetch, Decode, and Execute, Over and Over....41
3 Operating Systems....44
3.1 Operating Systems and Linux....44
3.1.1 A Brief History of Operating Systems, with an Emphasis on UNIX....45
3.2 The UNIX/Linux Filesystem....46
3.3 The Memory Manager and Process Scheduler....48
3.4 Working with Datafiles in Linux....49
3.5 An Introduction to the Process Scheduler....50
4 Introduction to Bash....51
4.1 The Bash Shell....51
4.2 Changing the Bash Prompt....52
4.3 Navigating Bash History and the LINUX Filesystem....53
4.4 Files in LINUX....56
4.4.1 Clobbering a File....58
4.4.2 Changing File Permissions....59
Using the chmod Command....59
With Base-8 Numbers (Octals)....60
4.4.3 Setting the Session noclobber Flag....61
4.5 Character Encoding and Text File Formats....61
4.6 The UNIX Philosophy: An Introduction....64
4.7 The Bash Interpreter....64
4.7.1 Variables....65
4.7.2 The Bash Environment....66
Writing Bash Scripts Using Bash Environment Variables....67
4.7.3 The Bash Interpreter....67
4.7.4 The Symbol Table and Variables....67
4.8 Some Bash Tools for Working with Datafiles....68
4.9 The General-Purpose Bash Commands time and watch....70
4.10 The Bash Pipeline....71
4.11 Some Advice for Writing Bash Scripts and Pipelines....73
5 Bash: Combining Commands and Variables to Make Pipelines and Scripts to Process Data....77
5.1 Introduction to vim and the Bash script: How to Make an Executable Script That Runs On the Command Line & Takes Arguments....77
5.1.1 Linting Our Executable Script....86
5.1.2 Review....88
5.2 tr Command: Translate or Delete Characters....88
5.2.1 A Bash Pipeline Spellchecker....88
Can We Do Better?....95
5.2.2 A DOS to Linux Files Converter....97
5.3 Extracting and Analyzing Data from Datafiles Using Bash Pipelines....98
5.4 gawk: How Many Named Stars Are There?....104
5.5 Getting Resultsets from Bash Queries....108
5.5.1 Advanced Subject: Format Printing....110
5.6 How to Make an Executable Script That Prompts Users for Input....110
5.7 Datetime in Bash....112
5.8 Introduction to Regular Expressions Using grep -E....114
5.9 Words You Can Make on a Calculator....120
5.10 Regular Expressions, sed, & tr: Reformatting Records In a Dataset....121
5.11 Finding Approximate Matches with agrep....124
5.12 Write Once, Run Everywhere: Embedding Our Executable Scripts in Pipelines & Invoking Them in Other Scripts....125
6 Algorithms and Coding....130
6.1 An Introduction to Algorithms....130
6.1.1 Recipes as Algorithms....133
6.1.2 Definition of an Algorithm from The Art of Computer Programming....133
6.1.3 Control Flow....134
6.1.4 Euclid GCD Algorithm....138
6.1.5 The Little Hummer Card Trick....139
6.1.6 The Bubblesort Algorithm....140
6.1.7 Notes on the Mystical History of Algorithms....144
6.2 An Introduction to Programming Style....145
6.2.1 Richard Hamming on Programming Style, As Told by Brian Kernighan....146
7 Floating-Point Numbers....148
7.1 Floating-Point Numbers....148
7.2 Working with Floats: Rules of Thumb....151
7.3 Arbitrary Precision Math with mpmath....152
7.4 Improving the Accuracy of Floating-Point Calculations with Herbie....153
8 Introduction to Python....156
8.1 Python Primer....156
8.1.1 Python Comes With Built-Ins: Built-In Functions, Built-In Data Types and Collections, and Built-In Modules....157
8.1.2 Python Comes with a Built-In Error-Reporting Module Called the Traceback....158
8.1.3 The IPython Interactive Shell I....159
8.1.4 In Python, Everything Is an Object....162
9 Using Python As a Calculator....165
9.1 Datetime I: Duration of the COVID-19 Pandemic....165
9.2 Datetime II: Solving Date Problems....167
9.3 Heat Loss....168
9.4 Find the GC Content Percentage of a Nucleotide String....169
9.5 Stoichiometry with SymPy....171
9.6 Ideal Gas Law....175
9.7 Work Performed by Expanding Gas....176
9.8 Acceleration of Sun on Earth....177
9.9 Sound Level....178
9.10 SymPy on Jupyter Notebooks....179
9.10.1 The Jupyter Notebook....180
9.11 Falling Bodies....182
9.12 Spherical Trigonometry in Navigation....185
9.13 Climate Data....187
9.14 Digital Signals....193
9.15 Image-Driven Data Analysis: Flood Mitigation....196
10 Programming....202
10.1 Our First Program: Of Functions, Modules, Garbage Filters, Tests, & Docstrings....203
10.1.1 The Least Necessary to Write a Working Function....204
10.1.2 The Least Necessary to Write a Useful Function....205
10.1.3 Documenting Our Function for Coders and Users....206
10.1.4 Saving Our Function to a Module....208
10.1.5 Autoreloading Edited Module Content to an IPython Session....209
10.1.6 Writing Garbage Filters: Handling Bad Input....211
10.1.7 Automating Testing Our Code Using Unit Tests....213
10.1.8 Linting Our Code with Pylint....216
10.2 Introduction to Programming I: Implementing Algorithms....218
10.2.1 The IPython Interactive Shell II....221
10.2.2 Back to Python and Coding....222
10.3 Testing If Symbol in String Is Nucleotide....224
10.4 Euclid GCD....227
10.5 Converting Decimal Fractions into Binary....231
Garbage Filter: How to Assert That Only Positive Decimal Fractions Are Passed into Function....235
10.5.1 The More You Know: Brahmagupta (598–668 CE)....238
10.6 Introduction to Programming II: Unit Testing....238
10.6.1 Unit Tests and Our unittest_template.py File....238
Last but Not Least....242
10.7 Finding the Reverse Complement of a Nucleotide String....243
10.7.1 A Brief Introduction to the Python Dictionary....244
10.7.2 Developing Code by Test-Driven Development (TDD)....247
A Few Words on Our Approach to Programming....250
10.8 Counting Symbols in a String....251
10.8.1 Error-Trapping....252
10.9 IPython Magics....253
10.10 Introduction to Programming V: Good Programming Practices....254
10.11 Programs = Algorithms + Data Structures....256
10.12 Interlude: Ilayda Develops Her Stoichiometry Code....257
11 Functions....260
11.1 Subroutines: The Genesis of Functions in Programming Languages....260
11.2 Functions....261
11.3 Modules....262
11.4 Documenting Your Functions, Making Them Robust with Error-Trapping and Exception Handling, and Proofing Them....263
11.5 Positional vs. Named Arguments....263
11.6 Multiple Return Values from a Python Function....266
11.7 Lambdas....268
11.8 Matters of Style When Writing Functions....269
11.9 A Calculus Primer: Numeric Integration & Differentiation Using Python....269
11.9.1 Numerical Integration....270
11.9.2 The More You Know....271
11.9.3 Back to Numerical Integration....272
11.9.4 Using Python and Numpy for Numerical Integration....275
11.9.5 Numerical Differentiation....277
11.9.6 Numerical Calculus with SymPy....279
12 Software Design....282
12.1 Writing Programs: Top-Down Design Methodology....282
12.2 Writing Programs: Converting a Top-Down Design Into A Program of Subroutines....284
12.3 Writing Your Code Base As a Set of Files....286
12.4 Writing Programs: A Practical Perspective....287
13 Working with Datasets....290
13.1 Accessing the Tabular Contents of Datafiles in Python Using a List of Lists (LoL)....290
13.2 Nested Collections....293
13.3 Finding the Minimum and Maximum Values in an Unsorted Collection....294
13.4 Parsers....295
13.4.1 Revisiting fasta Parsers....295
13.5 Too Big To Handle: Pre-Processing Large Datasets, Extracting Only Needed Dimensions....300
13.6 One Record per Text File....301
14 Programming Efficiency....306
14.1 The Analysis of Algorithms....306
14.2 O(n): Finding a Value in an Unsorted Collection of Index/Value Pairs....308
14.3 O(nm): Nested Loops and Their Time Complexity....310
14.3.1 Illustrating Executing Nested Loops with Nested Dolls....310
14.3.2 The Output from Running Our Example Nested Loop....316
14.3.3 Can We Do Better? Algebra to the Rescue!....316
14.4 O(ln2 n): Binary Trees....317
14.5 Functional Equivalence and Profiling Code....320
14.5.1 Dictionary As Lookup Table vs. Conditional Testing....321
14.6 Finding Min and Max Values in an Unsorted Collection II....322
14.7 Multiple Passes Through Dataset vs. Single Pass....323
14.7.1 An Outline of the Problem and a Solution....323
14.7.2 Extending the Solution....327
15 Other Subjects....329
15.1 An Introduction to Graph Theory....329
15.1.1 Saving Our Python Collections by Pickling Them....334
15.2 Writing Python Scripts That Write Scripts....335
15.3 Interactive Scripts That Prompt Users for Input....339
15.4 The Python Half-Open Interval, Range Objects, and Slicing....340
15.4.1 The Python Half-Open Interval....340
15.4.2 The Range Object Revisited....341
15.5 Finding Intervals with Overlap....341
15.6 Finding Interval Overlap in Genomic Sequences....344
15.7 Slicing Lists and Strings....350
15.8 From Nucleotide String to Amino Acid Strings....351
15.9 Comprehension....354
15.9.1 Slicing LoL, Extracting Columns with Comprehension....355
15.10 The Sieve of Eratosthenes....356
15.11 Transposing a Matrix....357
15.11.1 Transposing Tabular Datasets in Python....359
15.12 Stacks and Queues....360
15.12.1 Stacks....360
15.12.2 Queues....361
15.12.3 Algorithm: The Josephus Problem....362
15.13 Recursion....365
15.13.1 A Few Recursive Functions....368
16 SciPy....371
16.1 matplotlib: Graphics with SciPy....371
16.2 NetworkX: Working with Graphs....378
16.3 NumPy: Foundational Library of SciPy....379
16.3.1 The ndarray....380
16.3.2 Linear Algebra Has Three Objects: Scalar, Vector, and Matrix....381
16.3.3 Single-Instruction Multiple Data Registers in CPUs....382
16.3.4 Universal Functions (ufuncs) and Vectorized Operations....382
16.3.5 Row and Column Vectors in Memory and Data Processing....383
16.4 Linear Algebra....384
16.4.1 Datasets As Matrices I....384
16.4.2 Making a Rotatable 3D Graph from Data in a Dataset....388
When I Heard the Learn'd Astronomer....390
16.4.3 Datasets As Matrices II: Partitioning a Matrix....391
16.5 Pandas: Working with Labeled Datasets in Pandas....394
16.6 Pandas: Using Masks to Query Recordsets....395
16.7 Pandas: Getting Statistics on Datasets....397
16.7.1 Using Seaborn....400
16.8 Pandas: Processing Datasets Programmatically....401
16.9 SymPy: Symbolic Python....403
17 Odds and Ends....406
17.1 Writing Programs: Writing, Rewriting, and Matters of Style....406
17.2 Python Sets....407
17.3 Datetime in Datasets....408
17.3.1 Filling In Missing Datetime Entries....408
17.4 Introduction to Parallel Programming....413
18 Writing a Large Project....417
18.1 Putting It All Together: Solving Triangles....417
18.2 Design Considerations....419
18.3 Input and Output....420
18.4 Organization of the Code....420
18.5 Test-Driven Development (TDD) and Unit Testing....421
18.6 Linting Our Code....421
18.7 Pencil to Paper: Our Top-Down Design....421
Now to Start Writing Code....422
18.8 Our First Unit Test....423
18.9 Our First Draft of solve_triangle.py....424
18.10 Running Our First Unit Test....425
18.11 Running solve_triangle Function the First Time....426
18.12 A Note on Structured Design....427
18.13 Adding Design Comments....427
18.14 Writing Our First Draft....430
18.15 Testing Our Code....432
18.16 The Garbage Filter: Writing Unit Tests and Coding....433
18.17 Finishing solve_triangle(), v1....436
18.18 Improving solve_triangle(), v1....436
18.19 TI-59 Calculator: Triangle Solution, Master Library ROM Module: A Different Approach....437
18.20 Losing the rnd Bool from the Argument List....438
18.21 Rethinking the Main Section of solve_triangle()....438
18.22 solve_triangle(), v2....441
18.23 What's Left to Do?....443
Suggested Reading....446
Chapter 1....446
Chapter 2....447
Chapter 3....450
Chapter 6....450
Chapter 7....451
References....452
Chapter 1....452
Chapter 2....452
Chapter 3....453
Chapter 4....453
Chapter 5....453
Chapter 6....454
Chapter 7....454
Chapter 9....454
Chapter 10....455
Chapter 11....455
Chapter 12....455
Chapter 14....455
Chapter 15....455
Chapter 16....456
Chapter 18....456
Index....457
Enhance your computational and programming skills using Bash and Python to improve productivity and efficiency in research projects. This book is an essential guide for STEM researchers. It is structured into several parts, each building on the previous ones to ensure a solid foundation in programming.
You’ll begin with the basics of digital computation and operating systems, then write pipelines and scripts in Bash, focusing on tools for working with datasets in text files. After introducing algorithms and floating-point numbers, the book transitions to Python, emphasizing SciPy libraries and built-in features like type hints and f-strings. IPython and Jupyter notebooks are integrated into the lessons throughout. Programming best practices, including documentation and unit testing, are taught alongside programming basics. As the target audience is STEM students and professionals, examples make heavy use of datasets and the SciPy software stack, especially NumPy, Matplotlib, Pandas, and SymPy.
Introduction to Programming for Researchers will foster a deeper understanding of computational tools and critical programming skills, empowering you to tackle complex datasets and enhance your research capabilities.
This book is for experienced researchers looking to improve their computational skills, students in the natural sciences and engineering, and scientists and engineers from various fields seeking to integrate programming skills into their research methodologies.