LLMs in Production....1
brief contents....8
contents....9
foreword....14
preface....15
acknowledgments....17
about the book....19
Who should read this book....19
How this book is organized....20
About the code....21
liveBook Discussion Forum....21
about the authors....22
about the cover illustration....23
1 Words’ awakening: Why large language models have captured attention....24
1.1 Large language models accelerating communication....26
1.2 Navigating the build-and-buy decision with LLMs....30
1.2.1 Buying: The beaten path....31
1.2.2 Building: The path less traveled....32
1.2.3 A word of warning: Embrace the future now....38
1.3 Debunking myths....39
Summary....42
2 Large language models: A deep dive into language modeling....43
2.1 Language modeling....44
2.1.1 Linguistic features....46
2.1.2 Semiotics....52
2.1.3 Multilingual NLP....55
2.2 Language modeling techniques....56
2.2.1 N-gram and corpus-based techniques....57
2.2.2 Bayesian techniques....59
2.2.3 Markov chains....63
2.2.4 Continuous language modeling....66
2.2.5 Embeddings....70
2.2.6 Multilayer perceptrons....72
2.2.7 Recurrent neural networks and long short-term memory networks....74
2.2.8 Attention....81
2.3 Attention is all you need....83
2.3.1 Encoders....84
2.3.2 Decoders....85
2.3.3 Transformers....87
2.4 Really big transformers....89
Summary....94
3 Large language model operations: Building a platform for LLMs....96
3.1 Introduction to large language model operations....96
3.2 Operations challenges with large language models....97
3.2.1 Long download times....97
3.2.2 Longer deploy times....98
3.2.3 Latency....99
3.2.4 Managing GPUs....100
3.2.5 Peculiarities of text data....100
3.2.6 Token limits create bottlenecks....101
3.2.7 Hallucinations cause confusion....103
3.2.8 Bias and ethical considerations....104
3.2.9 Security concerns....104
3.2.10 Controlling costs....107
3.3 LLMOps essentials....107
3.3.1 Compression....107
3.3.2 Distributed computing....116
3.4 LLM operations infrastructure....122
3.4.1 Data infrastructure....124
3.4.2 Experiment trackers....125
3.4.3 Model registry....126
3.4.4 Feature stores....127
3.4.5 Vector databases....128
3.4.6 Monitoring system....129
3.4.7 GPU-enabled workstations....130
3.4.8 Deployment service....131
Summary....132
4 Data engineering for large language models: Setting up for success....134
4.1 Models are the foundation....135
4.1.1 GPT....136
4.1.2 BLOOM....137
4.1.3 LLaMA....138
4.1.4 Wizard....138
4.1.5 Falcon....139
4.1.6 Vicuna....139
4.1.7 Dolly....139
4.1.8 OpenChat....140
4.2 Evaluating LLMs....141
4.2.1 Metrics for evaluating text....141
4.2.2 Industry benchmarks....144
4.2.3 Responsible AI benchmarks....149
4.2.4 Developing your own benchmark....151
4.2.5 Evaluating code generators....153
4.2.6 Evaluating model parameters....154
4.3 Data for LLMs....156
4.3.1 Datasets you should know....157
4.3.2 Data cleaning and preparation....161
4.4 Text processors....167
4.4.1 Tokenization....167
4.4.2 Embeddings....172
4.5 Preparing a Slack dataset....175
Summary....176
5 Training large language models: How to generate the generator....177
5.1 Multi-GPU environments....178
5.1.1 Setting up....178
5.1.2 Libraries....182
5.2 Basic training techniques....184
5.2.1 From scratch....185
5.2.2 Transfer learning (finetuning)....192
5.2.3 Prompting....197
5.3 Advanced training techniques....198
5.3.1 Prompt tuning....198
5.3.2 Finetuning with knowledge distillation....204
5.3.3 Reinforcement learning with human feedback....208
5.3.4 Mixture of experts....211
5.3.5 LoRA and PEFT....214
5.4 Training tips and tricks....219
5.4.1 Training data size notes....219
5.4.2 Efficient training....220
5.4.3 Local minima traps....221
5.4.4 Hyperparameter tuning tips....221
5.4.5 A note on operating systems....222
5.4.6 Activation function advice....222
Summary....223
6 Large language model services: A practical guide....224
6.1 Creating an LLM service....225
6.1.1 Model compilation....226
6.1.2 LLM storage strategies....232
6.1.3 Adaptive request batching....235
6.1.4 Flow control....235
6.1.5 Streaming responses....238
6.1.6 Feature store....239
6.1.7 Retrieval-augmented generation....242
6.1.8 LLM service libraries....246
6.2 Setting up infrastructure....247
6.2.1 Provisioning clusters....248
6.2.2 Autoscaling....250
6.2.3 Rolling updates....255
6.2.4 Inference graphs....257
6.2.5 Monitoring....260
6.3 Production challenges....263
6.3.1 Model updates and retraining....264
6.3.2 Load testing....264
6.3.3 Troubleshooting poor latency....268
6.3.4 Resource management....270
6.3.5 Cost engineering....271
6.3.6 Security....272
6.4 Deploying to the edge....274
Summary....276
7 Prompt engineering: Becoming an LLM whisperer....277
7.1 Prompting your model....278
7.1.1 Few-shot prompting....278
7.1.2 One-shot prompting....280
7.1.3 Zero-shot prompting....281
7.2 Prompt engineering basics....283
7.2.1 Anatomy of a prompt....284
7.2.2 Prompting hyperparameters....286
7.2.3 Scrounging the training data....288
7.3 Prompt engineering tooling....289
7.3.1 LangChain....289
7.3.2 Guidance....290
7.3.3 DSPy....293
7.3.4 Other tooling is available but . . .....294
7.4 Advanced prompt engineering techniques....294
7.4.1 Giving LLMs tools....294
7.4.2 ReAct....297
Summary....300
8 Large language model applications: Building an interactive experience....302
8.1 Building an application....303
8.1.1 Streaming on the frontend....304
8.1.2 Keeping a history....307
8.1.3 Chatbot interaction features....310
8.1.4 Token counting....313
8.1.5 RAG applied....314
8.2 Edge applications....316
8.3 LLM agents....319
Summary....327
9 Creating an LLM project: Reimplementing Llama 3....328
9.1 Implementing Meta’s Llama....329
9.1.1 Tokenization and configuration....329
9.1.2 Dataset, data loading, evaluation, and generation....332
9.1.3 Network architecture....337
9.2 Simple Llama....340
9.3 Making it better....344
9.3.1 Quantization....345
9.3.2 LoRA....346
9.3.3 Fully sharded data parallel–quantized LoRA....349
9.4 Deploy to a Hugging Face Hub Space....351
Summary....354
10 Creating a coding copilot project: This would have helped you earlier....355
10.1 Our model....356
10.2 Data is king....359
10.2.1 Our VectorDB....359
10.2.2 Our dataset....360
10.2.3 Using RAG....364
10.3 Build the VS Code extension....367
10.4 Lessons learned and next steps....374
Summary....377
11 Deploying an LLM on a Raspberry Pi: How low can you go?....378
11.1 Setting up your Raspberry Pi....379
11.1.1 Pi Imager....380
11.1.2 Connecting to Pi....382
11.1.3 Software installations and updates....386
11.2 Preparing the model....387
11.3 Serving the model....389
11.4 Improvements....391
11.4.1 Using a better interface....391
11.4.2 Changing quantization....392
11.4.3 Adding multimodality....393
11.4.4 Serving the model on Google Colab....397
Summary....400
12 Production, an ever-changing landscape: Things are just getting started....402
12.1 A thousand-foot view....403
12.2 The future of LLMs....404
12.2.1 Government and regulation....404
12.2.2 LLMs are getting bigger....409
12.2.3 Multimodal spaces....415
12.2.4 Datasets....416
12.2.5 Solving hallucination....417
12.2.6 New hardware....424
12.2.7 Agents will become useful....425
12.3 Final thoughts....429
Summary....430
appendix A History of linguistics....431
A.1 Ancient linguistics....431
A.2 Medieval linguistics....432
A.3 Renaissance and early modern linguistics....433
A.4 Early 20th-century linguistics....435
A.5 Mid-20th century and modern linguistics....437
appendix B Reinforcement learning with human feedback....439
appendix C Multimodal latent spaces....443
index....450
This practical book offers clear, example-rich explanations of how LLMs work, how you can interact with them, and how to integrate LLMs into your own applications. Find out what makes LLMs so different from traditional software and ML, discover best practices for working with them outside the lab, and avoid common pitfalls with advice from experienced practitioners.
Most business software is developed and improved iteratively, and can change significantly even after deployment. By contrast, because LLMs are expensive to create and difficult to modify, they require meticulous upfront planning, exacting data standards, and carefully executed technical implementation. Integrating LLMs into production applications affects every aspect of your operations plan, including the application lifecycle, data pipeline, compute cost, security, and more. Get it wrong, and you may have a costly failure on your hands.
LLMs in Production teaches you how to develop an LLMOps plan that can take an AI app smoothly from design to delivery. You’ll learn techniques for preparing an LLM dataset, cost-efficient training hacks like LoRA and RLHF, and industry benchmarks for model evaluation. Along the way, you’ll put your new skills to use in three exciting example projects: creating and training a custom LLM, building a VS Code AI coding extension, and deploying a small model to a Raspberry Pi.
For data scientists and ML engineers who know Python and the basics of cloud deployment.