inside front cover
Build an Orchestrator in Go (From Scratch)
Copyright
dedication
contents
Front matter
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1 Introduction
1 What is an orchestrator?
1.1 Why implement an orchestrator from scratch?
1.2 The (not so) good ol’ days
1.3 What is a container, and how is it different from a virtual machine?
1.4 What is an orchestrator?
1.5 The components of an orchestration system
1.5.1 The task
1.5.2 The job
1.5.3 The scheduler
1.5.4 The manager
1.5.5 The worker
1.5.6 The cluster
1.5.7 Command-line interface
1.6 Meet Cube
1.7 What tools will we use?
1.8 A word about hardware
1.9 What we won’t be implementing or discussing
1.9.1 Distributed computing
1.9.2 Service discovery
1.9.3 High availability
1.9.4 Load balancing
1.9.5 Security
Summary
2 From mental model to skeleton code
2.1 The task skeleton
2.2 The worker skeleton
2.3 The manager skeleton
2.4 The scheduler skeleton
2.5 Other skeletons
2.6 Taking our skeletons for a spin
Summary
3 Hanging some flesh on the task skeleton
3.1 Docker: Starting, stopping, and inspecting containers from the command line
3.2 Docker: Starting, stopping, and inspecting containers from the API
3.3 Task configuration
3.4 Starting and stopping tasks
Summary
Part 2 Worker
4 Workers of the Cube, unite!
4.1 The Cube worker
4.2 Tasks and Docker
4.3 The role of the queue
4.4 The role of the DB
4.5 Counting tasks
4.6 Implementing the worker’s methods
4.6.1 Implementing the StopTask method
4.6.2 Implementing the StartTask method
4.6.3 An interlude on task state
4.6.4 Implementing the RunTask method
4.7 Putting it all together
Summary
5 An API for the worker
5.1 Overview of the worker API
5.2 Data format, requests, and responses
5.3 The API struct
5.4 Handling requests
5.5 Serving the API
5.6 Putting it all together
Summary
6 Metrics
6.1 What metrics should we collect?
6.2 Metrics available from the /proc filesystem
6.3 Collecting metrics with goprocinfo
6.4 Exposing the metrics on the API
6.5 Putting it all together
Summary
Part 3 Manager
7 The manager enters the room
7.1 The Cube manager
7.1.1 The components that make up the manager
7.2 The Manager struct
7.3 Implementing the manager’s methods
7.3.1 Implementing the SelectWorker method
7.3.2 Implementing the SendWork method
7.3.3 Implementing the UpdateTasks method
7.3.4 Adding a task to the manager
7.3.5 Creating a manager
7.4 An interlude on failures and resiliency
7.5 Putting it all together
Summary
8 An API for the manager
8.1 Overview of the manager API
8.2 Routes
8.3 Data format, requests, and responses
8.4 The API struct
8.5 Handling requests
8.6 Serving the API
8.7 A few refactorings to make our lives easier
8.8 Putting it all together
Summary
9 What could possibly go wrong?
9.1 Overview of our new scenario
9.2 Failure scenarios
9.2.1 Application startup failure
9.2.2 Application bugs
9.2.3 Task startup failures due to resource problems
9.2.4 Task failures due to Docker daemon crashes and restarts
9.2.5 Task failures due to machine crashes and restarts
9.2.6 Worker failures
9.2.7 Manager failures
9.3 Recovery options
9.3.1 Recovery from application failures
9.3.2 Recovering from environmental failures
9.3.3 Recovering from task-level failures
9.3.4 Recovering from worker failures
9.3.5 Recovering from manager failures
9.4 Implementing health checks
9.4.1 Inspecting a task on the worker
9.4.2 Implementing task updates on the worker
9.4.3 Healthchecks and restarts
9.5 Putting it all together
Summary
Part 4 Refactorings
10 Implementing a more sophisticated scheduler
10.1 The scheduling problem
10.2 Scheduling considerations
10.3 Scheduler interface
10.4 Adapting the round-robin scheduler to the scheduler interface
10.5 Using the new scheduler interface
10.5.1 Adding new fields to the Manager struct
10.5.2 Modifying the New helper function
10.6 Did you notice the bug?
10.7 Putting it all together
10.8 The E-PVM scheduler
10.8.1 The theory
10.8.2 In practice
10.9 Completing the Node implementation
10.10 Using the E-PVM scheduler
Summary
11 Implementing persistent storage for tasks
11.1 The storage problem
11.2 The Store interface
11.3 Implementing an in-memory store for tasks
11.4 Implementing an in-memory store for task events
11.5 Refactoring the manager to use the new in-memory stores
11.6 Refactoring the worker
11.7 Putting it all together
11.8 Introducing BoltDB
11.9 Implementing a persistent task store
11.10 Implementing a persistent task event store
11.11 Switching out the in-memory stores for permanent ones
Summary
Part 5 CLI
12 Building a command-line interface
12.1 The core components of CLIs
12.2 Introducing the Cobra framework
12.3 Setting up our Cobra application
12.4 Understanding the new main.go
12.5 Understanding root.go
12.6 Implementing the worker command
12.7 Implementing the manager command
12.8 Implementing the run command
12.9 Implementing the stop command
12.10 Implementing the status command
12.11 Implementing the node command
Summary
13 Now what?
13.1 Working on Kubernetes and related tooling
13.2 Manager-worker pattern and workflow systems
13.3 Manager-worker pattern and integration systems
13.4 In closing
Appendix. Environment setup
A.1 Installing Go
A.1.1 Installing on Linux
A.2 Project structure and initialization
index
inside back cover
Orchestration systems like Kubernetes can seem like a black box: you deploy to the cloud and it magically handles everything you need. That might seem perfect—until something goes wrong and you don’t know how to find and fix your problems. Build an Orchestrator in Go (From Scratch) reveals the inner workings of orchestration frameworks by guiding you through creating your own.
Build an Orchestrator in Go (From Scratch) explains each stage of creating an orchestrator with diagrams, step-by-step instructions, and detailed Go code samples. Don’t worry if you’re not a Go expert. The book’s code is optimized for simplicity and readability, and its key concepts are easy to implement in any language. You’ll learn the foundational principles of these frameworks, and even how to manage your orchestrator with a command-line interface.
Orchestration frameworks like Kubernetes and Nomad radically simplify managing containerized applications. Building an orchestrator from the ground up gives you deep insight into deploying and scaling containers, clusters, pods, and other components of modern distributed systems. This book guides you step by step as you create your own orchestrator—from scratch.
Build an Orchestrator in Go (From Scratch) gives you an inside-out perspective on orchestration frameworks and the low-level operation of distributed containerized applications. It takes you on a fascinating journey building a simple-but-useful orchestrator using the Docker API and Go SDK. As you go, you’ll get a guru-level understanding of Kubernetes, along with a pattern you can follow when you need to create your own custom orchestration solutions.
For software engineers, operations professionals, and SREs. This book’s simple Go code is accessible to all programmers.