Kafka in Action....1
brief contents....8
contents....10
foreword....16
preface....17
acknowledgments....19
about this book....21
Who should read this book?....21
How this book is organized: A roadmap....21
About the code....23
liveBook discussion forum....23
Other online resources....23
about the authors....24
about the cover illustration....25
Part 1 Getting started....26
1 Introduction to Kafka....28
1.1 What is Kafka?....29
1.2 Kafka usage....33
1.2.1 Kafka for the developer....33
1.2.2 Explaining Kafka to your manager....34
1.3 Kafka myths....35
1.3.1 Kafka only works with Hadoop....35
1.3.2 Kafka is the same as other message brokers....36
1.4 Kafka in the real world....36
1.4.1 Early examples....37
1.4.2 Later examples....38
1.4.3 When Kafka might not be the right fit....39
1.5 Online resources to get started....40
References....40
2 Getting to know Kafka....42
2.1 Producing and consuming a message....43
2.2 What are brokers?....43
2.3 Tour of Kafka....48
2.3.1 Producers and consumers....48
2.3.2 Topics overview....51
2.3.3 ZooKeeper usage....52
2.3.4 Kafka's high-level architecture....53
2.3.5 The commit log....54
2.4 Various source code packages and what they do....55
2.4.1 Kafka Streams....55
2.4.2 Kafka Connect....56
2.4.3 AdminClient package....57
2.4.4 ksqlDB....57
2.5 Confluent clients....58
2.6 Stream processing and terminology....61
2.6.1 Stream processing....62
2.6.2 What exactly-once means....63
References....64
Part 2 Applying Kafka....66
3 Designing a Kafka project....68
3.1 Designing a Kafka project....69
3.1.1 Taking over an existing data architecture....69
3.1.2 A first change....69
3.1.3 Built-in features....69
3.1.4 Data for our invoices....72
3.2 Sensor event design....74
3.2.1 Existing issues....74
3.2.2 Why Kafka is the right fit....76
3.2.3 Thought starters on our design....77
3.2.4 User data requirements....78
3.2.5 High-level plan for applying our questions....79
3.2.6 Reviewing our blueprint....82
3.3 Format of your data....82
3.3.1 Plan for data....83
3.3.2 Dependency setup....84
References....89
4 Producers: Sourcing data....91
4.1 An example....92
4.1.1 Producer notes....95
4.2 Producer options....95
4.2.1 Configuring the broker list....96
4.2.2 How to go fast (or go safer)....97
4.2.3 Timestamps....99
4.3 Generating code for our requirements....101
4.3.1 Client and broker versions....109
References....110
5 Consumers: Unlocking data....112
5.1 An example....113
5.1.1 Consumer options....114
5.1.2 Understanding our coordinates....117
5.2 How consumers interact....121
5.3 Tracking....121
5.3.1 Group coordinator....123
5.3.2 Partition assignment strategy....125
5.4 Marking our place....126
5.5 Reading from a compacted topic....128
5.6 Retrieving code for our factory requirements....128
5.6.1 Reading options....128
5.6.2 Requirements....130
References....133
6 Brokers....136
6.1 Introducing the broker....136
6.2 Role of ZooKeeper....137
6.3 Options at the broker level....138
6.3.1 Kafka's other logs: Application logs....140
6.3.2 Server log....140
6.3.3 Managing state....141
6.4 Partition replica leaders and their role....142
6.4.1 Losing data....144
6.5 Peeking into Kafka....145
6.5.1 Cluster maintenance....146
6.5.2 Adding a broker....147
6.5.3 Upgrading your cluster....147
6.5.4 Upgrading your clients....147
6.5.5 Backups....148
6.6 A note on stateful systems....148
6.7 Exercise....150
References....151
7 Topics and partitions....154
7.1 Topics....154
7.1.1 Topic-creation options....157
7.1.2 Replication factors....159
7.2 Partitions....159
7.2.1 Partition location....160
7.2.2 Viewing our logs....161
7.3 Testing with EmbeddedKafkaCluster....162
7.3.1 Using Kafka Testcontainers....163
7.4 Topic compaction....164
References....167
8 Kafka storage....169
8.1 How long to store data....170
8.2 Data movement....171
8.2.1 Keeping the original event....171
8.2.2 Moving away from a batch mindset....171
8.3 Tools....172
8.3.1 Apache Flume....172
8.3.2 Red Hat Debezium....174
8.3.3 Secor....174
8.3.4 Example use case for data storage....175
8.4 Bringing data back into Kafka....176
8.4.1 Tiered storage....177
8.5 Architectures with Kafka....177
8.5.1 Lambda architecture....178
8.5.2 Kappa architecture....179
8.6 Multiple cluster setups....180
8.6.1 Scaling by adding clusters....180
8.7 Cloud- and container-based storage options....180
8.7.1 Kubernetes clusters....181
References....181
9 Management: Tools and logging....183
9.1 Administration clients....184
9.1.1 Administration in code with AdminClient....184
9.1.2 kcat....186
9.1.3 Confluent REST Proxy API....187
9.2 Running Kafka as a systemd service....188
9.3 Logging....189
9.3.1 Kafka application logs....189
9.3.2 ZooKeeper logs....191
9.4 Firewalls....191
9.4.1 Advertised listeners....192
9.5 Metrics....192
9.5.1 JMX console....192
9.6 Tracing option....195
9.6.1 Producer logic....196
9.6.2 Consumer logic....197
9.6.3 Overriding clients....198
9.7 General monitoring tools....199
References....201
Part 3 Going further....204
10 Protecting Kafka....206
10.1 Security basics....208
10.1.1 Encryption with SSL....208
10.1.2 SSL between brokers and clients....209
10.1.3 SSL between brokers....212
10.2 Kerberos and the Simple Authentication and Security Layer (SASL)....212
10.3 Authorization in Kafka....214
10.3.1 Access control lists (ACLs)....214
10.3.2 Role-based access control (RBAC)....215
10.4 ZooKeeper....216
10.4.1 Kerberos setup....216
10.5 Quotas....216
10.5.1 Network bandwidth quota....217
10.5.2 Request rate quotas....218
10.6 Data at rest....219
10.6.1 Managed options....219
References....220
11 Schema registry....222
11.1 A proposed Kafka maturity model....223
11.1.1 Level 0....223
11.1.2 Level 1....224
11.1.3 Level 2....224
11.1.4 Level 3....225
11.2 The Schema Registry....225
11.2.1 Installing the Confluent Schema Registry....226
11.2.2 Registry configuration....226
11.3 Schema features....227
11.3.1 REST API....227
11.3.2 Client library....228
11.4 Compatibility rules....230
11.4.1 Validating schema modifications....230
11.5 Alternative to a schema registry....232
References....233
12 Stream processing with Kafka Streams and ksqlDB....234
12.1 Kafka Streams....235
12.1.1 KStreams API DSL....236
12.1.2 KTable API....240
12.1.3 GlobalKTable API....241
12.1.4 Processor API....241
12.1.5 Kafka Streams setup....243
12.2 ksqlDB: An event-streaming database....244
12.2.1 Queries....245
12.2.2 Local development....245
12.2.3 ksqlDB architecture....247
12.3 Going further....248
12.3.1 Kafka Improvement Proposals (KIPs)....248
12.3.2 Kafka projects you can explore....248
12.3.3 Community Slack channel....249
References....249
appendix A Installation....252
A.1 Operating system (OS) requirements....252
A.2 Kafka versions....252
A.3 Installing Kafka on your local machine....252
A.3.1 Prerequisite: Java....253
A.3.2 Prerequisite: ZooKeeper....253
A.3.3 Prerequisite: Kafka download....253
A.3.4 Starting a ZooKeeper server....254
A.3.5 Creating and configuring a cluster by hand....254
A.4 Confluent Platform....256
A.4.1 Confluent command line interface (CLI)....256
A.4.2 Docker....256
A.5 How to work with the book examples....257
A.5.1 Building from the command line....257
A.6 Troubleshooting....258
appendix B Client example....259
B.1 Python Kafka clients....259
B.1.1 Installing Python....259
B.1.2 Python producer example....259
B.1.3 Python consumer....260
B.2 Client testing....261
B.2.1 Unit testing in Java....261
B.2.2 Kafka Testcontainers....262
References....262
index....264
A....264
B....264
C....264
D....265
E....266
F....266
G....266
H....266
I....266
J....266
K....266
L....267
M....267
N....267
O....267
P....267
Q....268
R....268
S....268
T....268
U....269
V....269
W....269
Z....269
Kafka in Action - back....272
Kafka in Action is a fast-paced introduction to every aspect of working with Apache Kafka. Starting with an overview of Kafka’s core concepts, you’ll immediately learn how to set up and execute basic data movement tasks and how to produce and consume streams of events. Advancing quickly, you’ll soon be ready to use Kafka in your day-to-day workflow and start digging into even more advanced Kafka topics.
Think of Apache Kafka as a high-performance software bus that facilitates event streaming, logging, analytics, and other data pipeline tasks. With Kafka, you can easily build features like operational data monitoring and large-scale event processing into both large- and small-scale applications.
Kafka in Action introduces the core features of Kafka, along with relevant examples of how to use it in real applications. In it, you’ll explore the most common use cases, such as logging and managing streaming data. When you’re done, you’ll be ready to handle both basic developer- and admin-based tasks in a Kafka-focused team.
For intermediate Java developers or data engineers. No prior knowledge of Kafka required.