Kafka in Action

Authors: Gamov Viktor, Klein Dave, Scott Dylan

Publication date: 2022

Publisher: Manning Publications Co.

Number of pages: 272

File size: 2.6 MB

File type: PDF

Added by: codelibs

Kafka in Action....1

brief contents....8

contents....10

foreword....16

preface....17

acknowledgments....19

about this book....21

Who should read this book?....21

How this book is organized: A roadmap....21

About the code....23

liveBook discussion forum....23

Other online resources....23

about the authors....24

about the cover illustration....25

Part 1 Getting started....26

1 Introduction to Kafka....28

1.1 What is Kafka?....29

1.2 Kafka usage....33

1.2.1 Kafka for the developer....33

1.2.2 Explaining Kafka to your manager....34

1.3 Kafka myths....35

1.3.1 Kafka only works with Hadoop....35

1.3.2 Kafka is the same as other message brokers....36

1.4 Kafka in the real world....36

1.4.1 Early examples....37

1.4.2 Later examples....38

1.4.3 When Kafka might not be the right fit....39

1.5 Online resources to get started....40

References....40

2 Getting to know Kafka....42

2.1 Producing and consuming a message....43

2.2 What are brokers?....43

2.3 Tour of Kafka....48

2.3.1 Producers and consumers....48

2.3.2 Topics overview....51

2.3.3 ZooKeeper usage....52

2.3.4 Kafka's high-level architecture....53

2.3.5 The commit log....54

2.4 Various source code packages and what they do....55

2.4.1 Kafka Streams....55

2.4.2 Kafka Connect....56

2.4.3 AdminClient package....57

2.4.4 ksqlDB....57

2.5 Confluent clients....58

2.6 Stream processing and terminology....61

2.6.1 Stream processing....62

2.6.2 What exactly-once means....63

References....64

Part 2 Applying Kafka....66

3 Designing a Kafka project....68

3.1 Designing a Kafka project....69

3.1.1 Taking over an existing data architecture....69

3.1.2 A first change....69

3.1.3 Built-in features....69

3.1.4 Data for our invoices....72

3.2 Sensor event design....74

3.2.1 Existing issues....74

3.2.2 Why Kafka is the right fit....76

3.2.3 Thought starters on our design....77

3.2.4 User data requirements....78

3.2.5 High-level plan for applying our questions....79

3.2.6 Reviewing our blueprint....82

3.3 Format of your data....82

3.3.1 Plan for data....83

3.3.2 Dependency setup....84

References....89

4 Producers: Sourcing data....91

4.1 An example....92

4.1.1 Producer notes....95

4.2 Producer options....95

4.2.1 Configuring the broker list....96

4.2.2 How to go fast (or go safer)....97

4.2.3 Timestamps....99

4.3 Generating code for our requirements....101

4.3.1 Client and broker versions....109

References....110

5 Consumers: Unlocking data....112

5.1 An example....113

5.1.1 Consumer options....114

5.1.2 Understanding our coordinates....117

5.2 How consumers interact....121

5.3 Tracking....121

5.3.1 Group coordinator....123

5.3.2 Partition assignment strategy....125

5.4 Marking our place....126

5.5 Reading from a compacted topic....128

5.6 Retrieving code for our factory requirements....128

5.6.1 Reading options....128

5.6.2 Requirements....130

References....133

6 Brokers....136

6.1 Introducing the broker....136

6.2 Role of ZooKeeper....137

6.3 Options at the broker level....138

6.3.1 Kafka's other logs: Application logs....140

6.3.2 Server log....140

6.3.3 Managing state....141

6.4 Partition replica leaders and their role....142

6.4.1 Losing data....144

6.5 Peeking into Kafka....145

6.5.1 Cluster maintenance....146

6.5.2 Adding a broker....147

6.5.3 Upgrading your cluster....147

6.5.4 Upgrading your clients....147

6.5.5 Backups....148

6.6 A note on stateful systems....148

6.7 Exercise....150

References....151

7 Topics and partitions....154

7.1 Topics....154

7.1.1 Topic-creation options....157

7.1.2 Replication factors....159

7.2 Partitions....159

7.2.1 Partition location....160

7.2.2 Viewing our logs....161

7.3 Testing with EmbeddedKafkaCluster....162

7.3.1 Using Kafka Testcontainers....163

7.4 Topic compaction....164

References....167

8 Kafka storage....169

8.1 How long to store data....170

8.2 Data movement....171

8.2.1 Keeping the original event....171

8.2.2 Moving away from a batch mindset....171

8.3 Tools....172

8.3.1 Apache Flume....172

8.3.2 Red Hat Debezium....174

8.3.3 Secor....174

8.3.4 Example use case for data storage....175

8.4 Bringing data back into Kafka....176

8.4.1 Tiered storage....177

8.5 Architectures with Kafka....177

8.5.1 Lambda architecture....178

8.5.2 Kappa architecture....179

8.6 Multiple cluster setups....180

8.6.1 Scaling by adding clusters....180

8.7 Cloud- and container-based storage options....180

8.7.1 Kubernetes clusters....181

References....181

9 Management: Tools and logging....183

9.1 Administration clients....184

9.1.1 Administration in code with AdminClient....184

9.1.2 kcat....186

9.1.3 Confluent REST Proxy API....187

9.2 Running Kafka as a systemd service....188

9.3 Logging....189

9.3.1 Kafka application logs....189

9.3.2 ZooKeeper logs....191

9.4 Firewalls....191

9.4.1 Advertised listeners....192

9.5 Metrics....192

9.5.1 JMX console....192

9.6 Tracing option....195

9.6.1 Producer logic....196

9.6.2 Consumer logic....197

9.6.3 Overriding clients....198

9.7 General monitoring tools....199

References....201

Part 3 Going further....204

10 Protecting Kafka....206

10.1 Security basics....208

10.1.1 Encryption with SSL....208

10.1.2 SSL between brokers and clients....209

10.1.3 SSL between brokers....212

10.2 Kerberos and the Simple Authentication and Security Layer (SASL)....212

10.3 Authorization in Kafka....214

10.3.1 Access control lists (ACLs)....214

10.3.2 Role-based access control (RBAC)....215

10.4 ZooKeeper....216

10.4.1 Kerberos setup....216

10.5 Quotas....216

10.5.1 Network bandwidth quota....217

10.5.2 Request rate quotas....218

10.6 Data at rest....219

10.6.1 Managed options....219

References....220

11 Schema registry....222

11.1 A proposed Kafka maturity model....223

11.1.1 Level 0....223

11.1.2 Level 1....224

11.1.3 Level 2....224

11.1.4 Level 3....225

11.2 The Schema Registry....225

11.2.1 Installing the Confluent Schema Registry....226

11.2.2 Registry configuration....226

11.3 Schema features....227

11.3.1 REST API....227

11.3.2 Client library....228

11.4 Compatibility rules....230

11.4.1 Validating schema modifications....230

11.5 Alternative to a schema registry....232

References....233

12 Stream processing with Kafka Streams and ksqlDB....234

12.1 Kafka Streams....235

12.1.1 KStreams API DSL....236

12.1.2 KTable API....240

12.1.3 GlobalKTable API....241

12.1.4 Processor API....241

12.1.5 Kafka Streams setup....243

12.2 ksqlDB: An event-streaming database....244

12.2.1 Queries....245

12.2.2 Local development....245

12.2.3 ksqlDB architecture....247

12.3 Going further....248

12.3.1 Kafka Improvement Proposals (KIPs)....248

12.3.2 Kafka projects you can explore....248

12.3.3 Community Slack channel....249

References....249

appendix A Installation....252

A.1 Operating system (OS) requirements....252

A.2 Kafka versions....252

A.3 Installing Kafka on your local machine....252

A.3.1 Prerequisite: Java....253

A.3.2 Prerequisite: ZooKeeper....253

A.3.3 Prerequisite: Kafka download....253

A.3.4 Starting a ZooKeeper server....254

A.3.5 Creating and configuring a cluster by hand....254

A.4 Confluent Platform....256

A.4.1 Confluent command line interface (CLI)....256

A.4.2 Docker....256

A.5 How to work with the book examples....257

A.5.1 Building from the command line....257

A.6 Troubleshooting....258

appendix B Client example....259

B.1 Python Kafka clients....259

B.1.1 Installing Python....259

B.1.2 Python producer example....259

B.1.3 Python consumer....260

B.2 Client testing....261

B.2.1 Unit testing in Java....261

B.2.2 Kafka Testcontainers....262

References....262

index....264

Kafka in Action - back....272

Kafka in Action is a fast-paced introduction to every aspect of working with Apache Kafka. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to produce and consume streams of events. Advancing quickly, you’ll soon be ready to use Kafka in your day-to-day workflow, and start digging into even more advanced Kafka topics.

About the technology

Think of Apache Kafka as a high-performance software bus that facilitates event streaming, logging, analytics, and other data pipeline tasks. With Kafka, you can easily build features like operational data monitoring and large-scale event processing into both large- and small-scale applications.

About the book

Kafka in Action introduces the core features of Kafka, along with relevant examples of how to use it in real applications. In it, you’ll explore the most common use cases such as logging and managing streaming data. When you’re done, you’ll be ready to handle both basic developer- and admin-based tasks in a Kafka-focused team.

What's inside

Kafka as an event streaming platform
Kafka producers and consumers from Java applications
Kafka as part of a large data project

About the reader

For intermediate Java developers or data engineers. No prior knowledge of Kafka required.
