Cover
FM
Table of Contents
Preface
Chapter 1: Getting Started with Generative AI
Technical requirements
Defining the terminology
The generative AI stack
Python and GenAI
OpenAI API
MongoDB with Vector Search
Important features of generative AI
Why use generative AI?
The ethics and risks of GenAI
Summary
Chapter 2: Building Blocks of Intelligent Applications
Technical requirements
Defining intelligent applications
The building blocks of intelligent applications
LLMs – reasoning engines for intelligent apps
Use cases for LLM reasoning engines
Diverse capabilities of LLMs
Multi-modal language models
A paradigm shift in AI development
Embedding models and vector databases – semantic long-term memory
Embedding models
Vector databases
Model hosting
Your (soon-to-be) intelligent app
Sample application – RAG chatbot
Implications of intelligent applications for software engineering
Summary
Part 1
Foundations of AI: LLMs, Embedding Models, Vector Databases, and Application Design
Chapter 3: Large Language Models
Technical requirements
Probabilistic framework
n-gram language models
Machine learning for language modelling
Artificial neural networks
Training an artificial neural network
ANNs for natural language processing
Tokenization
Embedding
Predicting probability distributions
Dealing with sequential data
Recurrent neural networks
Transformer architecture
LLMs in practice
The evolving field of LLMs
Prompting, fine-tuning, and RAG
Summary
Chapter 4: Embedding Models
Technical requirements
What is an embedding model?
How do embedding models differ from LLMs?
When to use embedding models versus LLMs
Types of embedding models
Choosing embedding models
Task requirements
Dataset characteristics
Computational resources
Vector representations
Embedding model leaderboards
Embedding models overview
Do you always need an embedding model?
Executing code from LangChain
Best practices
Summary
Chapter 5: Vector Databases
Technical requirements
What is a vector embedding?
Vector similarity
Exact versus approximate search
Measuring search
Graph connectivity
Navigable small worlds
How to search a navigable small world
Hierarchical navigable small worlds
The need for vector databases
How vector search enhances AI models
Case studies and real-world applications
Okta – natural language access request (semantic search)
One AI – language-based AI (RAG over business data)
Novo Nordisk – automatic clinical study generation (advanced RAG/RPA)
Vector search best practices
Data modeling
Deployment
Summary
Chapter 6: AI/ML Application Design
Technical requirements
Data modeling
Enriching data with embeddings
Considering search use cases
Data storage
Determining the type of database cluster
Determining IOPS
Determining RAM
Final cluster configuration
Performance and availability versus cost
Data flow
Handling static data sources
Storing operational data enriched with vector embeddings
Freshness and retention
Real-time updates
Data lifecycle
Adopting new embedding models
Security and RBAC
Best practices for AI/ML application design
Summary
Part 2
Building Your Python Application: Frameworks, Libraries, APIs, and Vector Search
Chapter 7: Useful Frameworks, Libraries, and APIs
Technical requirements
Python for AI/ML
AI/ML frameworks
LangChain
LangChain semantic search with score
Semantic search with pre-filtering
Implementing a basic RAG solution with LangChain
LangChain prompt templates and chains
Key Python libraries
pandas
PyMongoArrow
PyTorch
AI/ML APIs
OpenAI API
Hugging Face
Summary
Chapter 8: Implementing Vector Search in AI Applications
Technical requirements
Information retrieval with MongoDB Atlas Vector Search
Vector search tutorial in Python
Vector Search tutorial with LangChain
Building RAG architecture systems
Chunking or document-splitting strategies
Simple RAG
Advanced RAG
Summary
Part 3
Optimizing AI Applications: Scaling, Fine-Tuning, Troubleshooting, Monitoring, and Analytics
Chapter 9: LLM Output Evaluation
Technical requirements
What is LLM evaluation?
Component and end-to-end evaluations
Model benchmarking
Evaluation datasets
Defining a baseline
User feedback
Synthetic data
Evaluation metrics
Assertion-based metrics
Statistical metrics
LLM-as-a-judge evaluations
RAG metrics
Human review
Evaluations as guardrails
Summary
Chapter 10: Refining the Semantic Data Model to Improve Accuracy
Technical requirements
Embeddings
Experimenting with different embedding models
Fine-tuning embedding models
Embedding metadata
Formatting metadata
Including static metadata
Extracting metadata programmatically
Generating metadata with LLMs
Including metadata with query embedding and ingested content embeddings
Optimizing retrieval-augmented generation
Query mutation
Extracting query metadata for pre-filtering
Formatting ingested data
Advanced retrieval systems
Summary
Chapter 11: Common Failures of Generative AI
Technical requirements
Hallucinations
Causes of hallucinations
Implications of hallucinations
Sycophancy
Causes of sycophancy
Implications of sycophancy
Data leakage
Causes of data leakage
Implications of data leakage
Cost
Types of costs
Tokens
Performance issues in generative AI applications
Computational load
Model serving strategies
High I/O operations
Summary
Chapter 12: Correcting and Optimizing Your Generative AI Application
Technical requirements
Baselining
Training and evaluation datasets
Few-shot prompting
Retrieval and reranking
Late interaction strategies
Query rewriting
Testing and red teaming
Testing
Red teaming
Information post-processing
Other remedies
Summary
Appendix: Further Reading
Index
Other Books You May Enjoy
Master retrieval-augmented generation architecture, fine-tune your AI stack, and discover real-world use cases and best practices for creating powerful AI apps.
The era of generative AI is upon us, and this book serves as a roadmap to harness its full potential. With its help, you’ll learn the core components of the AI stack: large language models (LLMs), vector databases, and Python frameworks, and see how these technologies work together to create intelligent applications.
The chapters will help you discover best practices for data preparation, model selection, and fine-tuning, and teach you advanced techniques such as retrieval-augmented generation (RAG) to overcome common challenges such as hallucinations and data leakage. You’ll get a solid understanding of vector databases, implement effective vector search strategies, refine models for accuracy, and optimize performance. You’ll also identify and address AI failures so that your applications deliver reliable and valuable results. By evaluating and improving the output of LLMs, you’ll be able to enhance their performance and relevance.
By the end of this book, you’ll be well-equipped to build sophisticated AI applications that deliver real-world value.
This book is for software engineers and developers looking to build intelligent applications using generative AI. While the book is suitable for beginners, a basic understanding of Python programming is required to make the most of it.