Introduction to Programming for Researchers: Learning Programming Fundamentals Through Dataset Processing in Bash and Python

Author: Derry James R.
Publication date: 2026
Publisher: Apress Media, LLC.
Pages: 464
File size: 11.8 MB
File type: PDF
Added by: codelibs

Contents....6

About the Author....13

About the Technical Reviewers....14

Acknowledgements....16

Introduction....17

1 Introduction....18

1.1 Modern Computers and Their History....18

1.1.1 Today, Most Personal Computers Are Used Primarily As Communication Devices....18

1.1.2 A Brief History of the Computer Age....19

1.2 Our Modern Idea of Computers: A Theory of Computation....28

2 Digital Computation....32

2.1 Fundamentals of Computation I: Transistors, Logic Gates, and Moore's Law....32

2.1.1 Transistors and Moore's Law....32

2.1.2 The Transistor Is the Fundamental Physical Unit of Computation....33

2.1.3 Transistors Are Organized into Logic Gates....34

2.1.4 Moore's Law....35

2.1.5 The Transistor Budget....35

2.2 Fundamentals of Computation II: Bits, Boolean Logic, and the Digital Age of George Boole & Claude Shannon....37

2.2.1 George Boole....37

2.2.2 Claude Shannon....38

2.3 Fundamentals of Computation III: Data, Instructions, & Pointers....39

2.4 Code and Data: Cycles of Fetch, Decode, and Execute, Over and Over....41

3 Operating Systems....44

3.1 Operating Systems and Linux....44

3.1.1 A Brief History of Operating Systems, with an Emphasis on UNIX....45

3.2 The UNIX/Linux Filesystem....46

3.3 The Memory Manager and Process Scheduler....48

3.4 Working with Datafiles in Linux....49

3.5 An Introduction to the Process Scheduler....50

4 Introduction to Bash....51

4.1 The Bash Shell....51

4.2 Changing the Bash Prompt....52

4.3 Navigating Bash History and the LINUX Filesystem....53

4.4 Files in LINUX....56

4.4.1 Clobbering a File....58

4.4.2 Changing File Permissions....59

Using the chmod Command....59

With Base-8 Numbers (Octals)....60

4.4.3 Setting the Session noclobber Flag....61

4.5 Character Encoding and Text File Formats....61

4.6 The UNIX Philosophy: An Introduction....64

4.7 The Bash Interpreter....64

4.7.1 Variables....65

4.7.2 The Bash Environment....66

Writing Bash Scripts Using Bash Environment Variables....67

4.7.3 The Bash Interpreter....67

4.7.4 The Symbol Table and Variables....67

4.8 Some Bash Tools for Working with Datafiles....68

4.9 The General-Purpose Bash Commands time and watch....70

4.10 The Bash Pipeline....71

4.11 Some Advice for Writing Bash Scripts and Pipelines....73

5 Bash: Combining Commands and Variables to Make Pipelines and Scripts to Process Data....77

5.1 Introduction to vim and the Bash script: How to Make an Executable Script That Runs On the Command Line & Takes Arguments....77

5.1.1 Linting Our Executable Script....86

5.1.2 Review....88

5.2 tr Command: Translate or Delete Characters....88

5.2.1 A Bash Pipeline Spellchecker....88

Can We Do Better?....95

5.2.2 A DOS to Linux Files Converter....97

5.3 Extracting and Analyzing Data from Datafiles Using Bash Pipelines....98

5.4 gawk: How Many Named Stars Are There?....104

5.5 Getting Resultsets from Bash Queries....108

5.5.1 Advanced Subject: Format Printing....110

5.6 How to Make an Executable Script That Prompts Users for Input....110

5.7 Datetime in Bash....112

5.8 Introduction to Regular Expressions Using grep -E....114

5.9 Words You Can Make on a Calculator....120

5.10 Regular Expressions, sed, & tr: Reformatting Records In a Dataset....121

5.11 Finding Approximate Matches with agrep....124

5.12 Write Once, Run Everywhere: Embedding Our Executable Scripts in Pipelines & Invoking Them in Other Scripts....125

6 Algorithms and Coding....130

6.1 An Introduction to Algorithms....130

6.1.1 Recipes as Algorithms....133

6.1.2 Definition of an Algorithm from The Art of Computer Programming....133

6.1.3 Control Flow....134

6.1.4 Euclid GCD Algorithm....138

6.1.5 The Little Hummer Card Trick....139

6.1.6 The Bubblesort Algorithm....140

6.1.7 Notes on the Mystical History of Algorithms....144

6.2 An Introduction to Programming Style....145

6.2.1 Richard Hamming on Programming Style, As Told by Brian Kernighan....146

7 Floating-Point Numbers....148

7.1 Floating-Point Numbers....148

7.2 Working with Floats: Rules of Thumb....151

7.3 Arbitrary Precision Math with mpmath....152

7.4 Improving the Accuracy of Floating-Point Calculations with Herbie....153

8 Introduction to Python....156

8.1 Python Primer....156

8.1.1 Python Comes With Built-Ins: Built-In Functions, Built-In Data Types and Collections, and Built-In Modules....157

8.1.2 Python Comes with a Built-In Error-Reporting Module Called the Traceback....158

8.1.3 The IPython Interactive Shell I....159

8.1.4 In Python, Everything Is an Object....162

9 Using Python As a Calculator....165

9.1 Datetime I: Duration of the COVID-19 Pandemic....165

9.2 Datetime II: Solving Date Problems....167

9.3 Heat Loss....168

9.4 Find the GC Content Percentage of a Nucleotide String....169

9.5 Stoichiometry with SymPy....171

9.6 Ideal Gas Law....175

9.7 Work Performed by Expanding Gas....176

9.8 Acceleration of Sun on Earth....177

9.9 Sound Level....178

9.10 SymPy on Jupyter Notebooks....179

9.10.1 The Jupyter Notebook....180

9.11 Falling Bodies....182

9.12 Spherical Trigonometry in Navigation....185

9.13 Climate Data....187

9.14 Digital Signals....193

9.15 Image-Driven Data Analysis: Flood Mitigation....196

10 Programming....202

10.1 Our First Program: Of Functions, Modules, Garbage Filters, Tests, & Docstrings....203

10.1.1 The Least Necessary to Write a Working Function....204

10.1.2 The Least Necessary to Write a Useful Function....205

10.1.3 Documenting Our Function for Coders and Users....206

10.1.4 Saving Our Function to a Module....208

10.1.5 Autoreloading Edited Module Content to an IPython Session....209

10.1.6 Writing Garbage Filters: Handling Bad Input....211

10.1.7 Automating Testing Our Code Using Unit Tests....213

10.1.8 Linting Our Code with Pylint....216

10.2 Introduction to Programming I: Implementing Algorithms....218

10.2.1 The IPython Interactive Shell II....221

10.2.2 Back to Python and Coding....222

10.3 Testing If Symbol in String Is Nucleotide....224

10.4 Euclid GCD....227

10.5 Converting Decimal Fractions into Binary....231

Garbage Filter: How to Assert That Only Positive Decimal Fractions Are Passed into Function....235

10.5.1 The More You Know: Brahmagupta (598–668 CE)....238

10.6 Introduction to Programming II: Unit Testing....238

10.6.1 Unit Tests and Our unittest_template.py File....238

Last but Not Least....242

10.7 Finding the Reverse Complement of a Nucleotide String....243

10.7.1 A Brief Introduction to the Python Dictionary....244

10.7.2 Developing Code by Test-Driven Development (TDD)....247

A Few Words on Our Approach to Programming....250

10.8 Counting Symbols in a String....251

10.8.1 Error-Trapping....252

10.9 IPython Magics....253

10.10 Introduction to Programming V: Good Programming Practices....254

10.11 Programs = Algorithms + Data Structures....256

10.12 Interlude: Ilayda Develops Her Stoichiometry Code....257

11 Functions....260

11.1 Subroutines: The Genesis of Functions in Programming Languages....260

11.2 Functions....261

11.3 Modules....262

11.4 Documenting Your Functions, Making Them Robust with Error-Trapping and Exception Handling, and Proofing Them....263

11.5 Positional vs. Named Arguments....263

11.6 Multiple Return Values from a Python Function....266

11.7 Lambdas....268

11.8 Matters of Style When Writing Functions....269

11.9 A Calculus Primer: Numeric Integration & Differentiation Using Python....269

11.9.1 Numerical Integration....270

11.9.2 The More You Know....271

11.9.3 Back to Numerical Integration....272

11.9.4 Using Python and Numpy for Numerical Integration....275

11.9.5 Numerical Differentiation....277

11.9.6 Numerical Calculus with SymPy....279

12 Software Design....282

12.1 Writing Programs: Top-Down Design Methodology....282

12.2 Writing Programs: Converting a Top-Down Design Into a Program of Subroutines....284

12.3 Writing Your Code Base As a Set of Files....286

12.4 Writing Programs: A Practical Perspective....287

13 Working with Datasets....290

13.1 Accessing the Tabular Contents of Datafiles in Python Using a List of Lists (LoL)....290

13.2 Nested Collections....293

13.3 Finding the Minimum and Maximum Values in an Unsorted Collection....294

13.4 Parsers....295

13.4.1 Revisiting fasta Parsers....295

13.5 Too Big To Handle: Pre-Processing Large Datasets, Extracting Only Needed Dimensions....300

13.6 One Record per Text File....301

14 Programming Efficiency....306

14.1 The Analysis of Algorithms....306

14.2 O(n): Finding a Value in an Unsorted Collection of Index/Value Pairs....308

14.3 O(nm): Nested Loops and Their Time Complexity....310

14.3.1 Illustrating Executing Nested Loops with Nested Dolls....310

14.3.2 The Output from Running Our Example Nested Loop....316

14.3.3 Can We Do Better? Algebra to the Rescue!....316

14.4 O(ln2 n): Binary Trees....317

14.5 Functional Equivalence and Profiling Code....320

14.5.1 Dictionary As Lookup Table vs. Conditional Testing....321

14.6 Finding Min and Max Values in an Unsorted Collection II....322

14.7 Multiple Passes Through Dataset vs. Single Pass....323

14.7.1 An Outline of the Problem and a Solution....323

14.7.2 Extending the Solution....327

15 Other Subjects....329

15.1 An Introduction to Graph Theory....329

15.1.1 Saving Our Python Collections by Pickling Them....334

15.2 Writing Python Scripts That Write Scripts....335

15.3 Interactive Scripts That Prompt Users for Input....339

15.4 The Python Half-Open Interval, Range Objects, and Slicing....340

15.4.1 The Python Half-Open Interval....340

15.4.2 The Range Object Revisited....341

15.5 Finding Intervals with Overlap....341

15.6 Finding Interval Overlap in Genomic Sequences....344

15.7 Slicing Lists and Strings....350

15.8 From Nucleotide String to Amino Acid Strings....351

15.9 Comprehension....354

15.9.1 Slicing LoL, Extracting Columns with Comprehension....355

15.10 The Sieve of Eratosthenes....356

15.11 Transposing a Matrix....357

15.11.1 Transposing Tabular Datasets in Python....359

15.12 Stacks and Queues....360

15.12.1 Stacks....360

15.12.2 Queues....361

15.12.3 Algorithm: The Josephus Problem....362

15.13 Recursion....365

15.13.1 A Few Recursive Functions....368

16 SciPy....371

16.1 matplotlib: Graphics with SciPy....371

16.2 NetworkX: Working with Graphs....378

16.3 NumPy: Foundational Library of SciPy....379

16.3.1 The ndarray....380

16.3.2 Linear Algebra Has Three Objects: Scalar, Vector, and Matrix....381

16.3.3 Single-Instruction Multiple Data Registers in CPUs....382

16.3.4 Universal Functions (ufuncs) and Vectorized Operations....382

16.3.5 Row and Column Vectors in Memory and Data Processing....383

16.4 Linear Algebra....384

16.4.1 Datasets As Matrices I....384

16.4.2 Making a Rotatable 3D Graph from Data in a Dataset....388

When I Heard the Learn'd Astronomer....390

16.4.3 Datasets As Matrices II: Partitioning a Matrix....391

16.5 Pandas: Working with Labeled Datasets in Pandas....394

16.6 Pandas: Using Masks to Query Recordsets....395

16.7 Pandas: Getting Statistics on Datasets....397

16.7.1 Using Seaborn....400

16.8 Pandas: Processing Datasets Programmatically....401

16.9 SymPy: Symbolic Python....403

17 Odds and Ends....406

17.1 Writing Programs: Writing, Rewriting, and Matters of Style....406

17.2 Python Sets....407

17.3 Datetime in Datasets....408

17.3.1 Filling In Missing Datetime Entries....408

17.4 Introduction to Parallel Programming....413

18 Writing a Large Project....417

18.1 Putting It All Together: Solving Triangles....417

18.2 Design Considerations....419

18.3 Input and Output....420

18.4 Organization of the Code....420

18.5 Test-Driven Development (TDD) and Unit Testing....421

18.6 Linting Our Code....421

18.7 Pencil to Paper: Our Top-Down Design....421

Now to Start Writing Code....422

18.8 Our First Unit Test....423

18.9 Our First Draft of solve_triangle.py....424

18.10 Running Our First Unit Test....425

18.11 Running solve_triangle Function the First Time....426

18.12 A Note on Structured Design....427

18.13 Adding Design Comments....427

18.14 Writing Our First Draft....430

18.15 Testing Our Code....432

18.16 The Garbage Filter: Writing Unit Tests and Coding....433

18.17 Finishing solve_triangle(), v1....436

18.18 Improving solve_triangle(), v1....436

18.19 TI-59 Calculator: Triangle Solution, Master Library ROM Module: A Different Approach....437

18.20 Losing the rnd Bool from the Argument List....438

18.21 Rethinking the Main Section of solve_triangle()....438

18.22 solve_triangle(), v2....441

18.23 What's Left to Do?....443

Suggested Reading....446

Chapter 1....446

Chapter 2....447

Chapter 3....450

Chapter 6....450

Chapter 7....451

References....452

Chapter 1....452

Chapter 2....452

Chapter 3....453

Chapter 4....453

Chapter 5....453

Chapter 6....454

Chapter 7....454

Chapter 9....454

Chapter 10....455

Chapter 11....455

Chapter 12....455

Chapter 14....455

Chapter 15....455

Chapter 16....456

Chapter 18....456

Index....457

Enhance your computational and programming skills using Bash and Python to improve productivity and efficiency in research projects. This book is an essential guide for STEM researchers, structured into several parts that each build on the previous ones to ensure a solid foundation in programming.

You’ll begin with the basics of digital computation and operating systems, then write pipelines and scripts in Bash, focusing on tools for working with datasets in text files. After introducing algorithms and floating-point numbers, the book transitions to Python, emphasizing SciPy libraries and built-in features like type hints and f-strings. IPython and Jupyter notebooks are integrated into the lessons throughout. Programming best practices, including documentation and unit testing, are taught alongside programming basics. As the target audience is STEM students and professionals, examples make heavy use of datasets and the SciPy software stack, especially NumPy, Matplotlib, Pandas, and SymPy.

Introduction to Programming for Researchers will foster a deeper understanding of computational tools and critical programming skills, empowering you to tackle complex datasets and enhance your research capabilities.
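
As a rough illustration of the Python practices the description names (type hints, f-strings, documentation, and unit testing), here is a minimal sketch built around the GC-content problem listed in the contents (section 9.4). It is not taken from the book: the names `gc_content`, `report`, and `TestGCContent` are hypothetical, and the book's own treatment may differ.

```python
import unittest


def gc_content(sequence: str) -> float:
    """Return the percentage of G and C symbols in a nucleotide string.

    Hypothetical example function; raises ValueError on empty input.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    gc_count = upper.count("G") + upper.count("C")
    return 100.0 * gc_count / len(upper)


def report(sequence: str) -> str:
    """Format the result with an f-string, rounded to one decimal place."""
    return f"{sequence}: {gc_content(sequence):.1f}% GC"


class TestGCContent(unittest.TestCase):
    """Unit tests covering one normal case and one bad-input case."""

    def test_half_gc(self) -> None:
        self.assertAlmostEqual(gc_content("ATGC"), 50.0)

    def test_empty_input_is_rejected(self) -> None:
        with self.assertRaises(ValueError):
            gc_content("")
```

Saved to a file, the tests could be run with `python -m unittest <module name>`; the type hints are annotations only and are not enforced at runtime.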

What You Will Learn

  • Apply programming skills to enhance research productivity and efficiency.
  • Write Bash pipelines and executable scripts.
  • Implement basic algorithms in Python, focusing on time efficiency and structured programming.

Who This Book Is For

Experienced researchers looking to improve their computational skills; students in the natural sciences and engineering; scientists and engineers from various fields seeking to integrate programming skills into their research methodologies.

