Cover....1
Copyright....8
Table of Contents....9
Preface....15
Conventions Used in This Book....16
Using Code Examples....17
O’Reilly Online Learning....18
How to Contact Us....18
Acknowledgments....19
Chris....19
Antje....19
Shelbee....19
Chapter 1. Generative AI Use Cases, Fundamentals, and Project Life Cycle....21
Use Cases and Tasks....21
Foundation Models and Model Hubs....24
Generative AI Project Life Cycle....25
Generative AI on AWS....28
Why Generative AI on AWS?....31
Building Generative AI Applications on AWS....32
Summary....33
Chapter 2. Prompt Engineering and In-Context Learning....35
Prompts and Completions....35
Tokens....36
Prompt Engineering....36
Prompt Structure....38
Instruction....38
Context....38
In-Context Learning with Few-Shot Inference....40
Zero-Shot Inference....41
One-Shot Inference....41
Few-Shot Inference....42
In-Context Learning Gone Wrong....43
In-Context Learning Best Practices....43
Prompt-Engineering Best Practices....44
Inference Configuration Parameters....49
Summary....54
Chapter 3. Large-Language Foundation Models....55
Large-Language Foundation Models....56
Tokenizers....57
Embedding Vectors....58
Transformer Architecture....60
Inputs and Context Window....62
Embedding Layer....62
Encoder....62
Self-Attention....62
Decoder....64
Softmax Output....64
Types of Transformer-Based Foundation Models....66
Pretraining Datasets....68
Scaling Laws....69
Compute-Optimal Models....71
Summary....72
Chapter 4. Memory and Compute Optimizations....75
Memory Challenges....75
Data Types and Numerical Precision....78
Quantization....79
fp16....80
bfloat16....82
fp8....84
int8....84
Optimizing the Self-Attention Layers....86
FlashAttention....87
Grouped-Query Attention....87
Distributed Computing....88
Distributed Data Parallel....89
Fully Sharded Data Parallel....90
Performance Comparison of FSDP over DDP....92
Distributed Computing on AWS....94
Fully Sharded Data Parallel with Amazon SageMaker....95
AWS Neuron SDK and AWS Trainium....97
Summary....97
Chapter 5. Fine-Tuning and Evaluation....99
Instruction Fine-Tuning....100
Llama 2-Chat....100
Falcon-Chat....100
FLAN-T5....100
Instruction Dataset....101
Multitask Instruction Dataset....101
FLAN: Example Multitask Instruction Dataset....102
Prompt Template....103
Convert a Custom Dataset into an Instruction Dataset....104
Instruction Fine-Tuning....106
Amazon SageMaker Studio....107
Amazon SageMaker JumpStart....108
Amazon SageMaker Estimator for Hugging Face....109
Evaluation....110
Evaluation Metrics....111
Benchmarks and Datasets....112
Summary....114
Chapter 6. Parameter-Efficient Fine-Tuning....115
Full Fine-Tuning Versus PEFT....116
LoRA and QLoRA....118
LoRA Fundamentals....119
Rank....120
Target Modules and Layers....120
Applying LoRA....121
Merging LoRA Adapter with Original Model....123
Maintaining Separate LoRA Adapters....124
Full Fine-Tuning Versus LoRA Performance....124
QLoRA....125
Prompt Tuning and Soft Prompts....126
Summary....129
Chapter 7. Fine-Tuning with Reinforcement Learning from Human Feedback....131
Human Alignment: Helpful, Honest, and Harmless....132
Reinforcement Learning Overview....132
Train a Custom Reward Model....135
Collect Training Dataset with Human-in-the-Loop....135
Sample Instructions for Human Labelers....136
Using Amazon SageMaker Ground Truth for Human Annotations....136
Prepare Ranking Data to Train a Reward Model....138
Train the Reward Model....141
Existing Reward Model: Toxicity Detector by Meta....143
Fine-Tune with Reinforcement Learning from Human Feedback....144
Using the Reward Model with RLHF....145
Proximal Policy Optimization RL Algorithm....146
Perform RLHF Fine-Tuning with PPO....146
Mitigate Reward Hacking....148
Using Parameter-Efficient Fine-Tuning with RLHF....150
Evaluate RLHF Fine-Tuned Model....151
Qualitative Evaluation....151
Quantitative Evaluation....152
Load Evaluation Model....153
Define Evaluation-Metric Aggregation Function....153
Compare Evaluation Metrics Before and After....154
Summary....155
Chapter 8. Model Deployment Optimizations....157
Model Optimizations for Inference....157
Pruning....159
Post-Training Quantization with GPTQ....160
Distillation....162
Large Model Inference Container....164
AWS Inferentia: Purpose-Built Hardware for Inference....165
Model Update and Deployment Strategies....167
A/B Testing....168
Shadow Deployment....169
Metrics and Monitoring....171
Autoscaling....172
Autoscaling Policies....172
Define an Autoscaling Policy....173
Summary....174
Chapter 9. Context-Aware Reasoning Applications Using RAG and Agents....175
Large Language Model Limitations....176
Hallucination....177
Knowledge Cutoff....177
Retrieval-Augmented Generation....178
External Sources of Knowledge....179
RAG Workflow....180
Document Loading....181
Chunking....182
Document Retrieval and Reranking....183
Prompt Augmentation....184
RAG Orchestration and Implementation....185
Document Loading and Chunking....186
Embedding Vector Store and Retrieval....188
Retrieval Chains....191
Reranking with Maximum Marginal Relevance....193
Agents....194
ReAct Framework....196
Program-Aided Language Framework....198
Generative AI Applications....201
FMOps: Operationalizing the Generative AI Project Life Cycle....207
Experimentation Considerations....208
Development Considerations....210
Production Deployment Considerations....212
Summary....213
Chapter 10. Multimodal Foundation Models....215
Use Cases....216
Multimodal Prompt Engineering Best Practices....217
Image Generation and Enhancement....218
Image Generation....218
Image Editing and Enhancement....219
Inpainting, Outpainting, Depth-to-Image....224
Inpainting....224
Outpainting....226
Depth-to-Image....227
Image Captioning and Visual Question Answering....229
Image Captioning....231
Content Moderation....231
Visual Question Answering....231
Model Evaluation....236
Text-to-Image Generative Tasks....236
Forward Diffusion....239
Nonverbal Reasoning....239
Diffusion Architecture Fundamentals....241
Forward Diffusion....241
Reverse Diffusion....242
U-Net....243
Stable Diffusion 2 Architecture....244
Text Encoder....245
U-Net and Diffusion Process....246
Text Conditioning....248
Cross-Attention....248
Scheduler....249
Image Decoder....249
Stable Diffusion XL Architecture....250
U-Net and Cross-Attention....250
Refiner....250
Conditioning....251
Summary....253
Chapter 11. Controlled Generation and Fine-Tuning with Stable Diffusion....255
ControlNet....255
Fine-Tuning....260
DreamBooth....261
DreamBooth and PEFT-LoRA....263
Textual Inversion....265
Human Alignment with Reinforcement Learning from Human Feedback....269
Summary....272
Chapter 12. Amazon Bedrock: Managed Service for Generative AI....273
Bedrock Foundation Models....273
Amazon Titan Foundation Models....274
Stable Diffusion Foundation Models from Stability AI....274
Bedrock Inference APIs....274
Large Language Models....276
Generate SQL Code....277
Summarize Text....277
Embeddings....278
Fine-Tuning....281
Agents....284
Multimodal Models....287
Create Images from Text....287
Create Images from Images....289
Data Privacy and Network Security....290
Governance and Monitoring....292
Summary....292
Index....293
About the Authors....310
Colophon....310
Companies today are moving rapidly to integrate generative AI into their products and services. But there's a great deal of hype (and misunderstanding) about the impact and promise of this technology. With this book, Chris Fregly, Antje Barth, and Shelbee Eigenbrode from AWS help CTOs, ML practitioners, application developers, business analysts, data engineers, and data scientists find practical ways to use this exciting new technology.
You'll learn the generative AI project life cycle, including use case definition, model selection, model fine-tuning, retrieval-augmented generation, reinforcement learning from human feedback, and model quantization, optimization, and deployment. You'll also explore different types of models, including large language models (LLMs) and multimodal models such as Stable Diffusion for generating images and Flamingo/IDEFICS for answering questions about images.