Cover....1
Copyright....6
Table of Contents....7
Foreword....13
Preface....15
Objective and Approach....16
Prerequisites....17
Book Structure....17
How to Read This Book....19
Hardware and Software Requirements....20
Conventions Used in This Book....20
Using Code Examples....21
O'Reilly Online Learning....23
How to Contact Us....23
Acknowledgments....23
Part I. Developing AI Services....25
Chapter 1. Introduction....27
What Is Generative AI?....27
Why Generative AI Services Will Power Future Applications....30
Facilitating the Creative Process....31
Suggesting Contextually Relevant Solutions....33
Personalizing the User Experience....34
Minimizing Delay in Resolving Customer Queries....35
Acting as an Interface to Complex Systems....36
Automating Manual Administrative Tasks....37
Scaling and Democratizing Content Generation....37
How to Build a Generative AI Service....38
Why Build Generative AI Services with FastAPI?....39
What Prevents the Adoption of Generative AI Services....40
Overview of the Capstone Project....41
Summary....42
Chapter 2. Getting Started with FastAPI....43
Introduction to FastAPI....43
Setting Up Your Development Environment....44
Installing Python, FastAPI, and Required Packages....44
Creating a Simple FastAPI Web Server....45
FastAPI Features and Advantages....48
Inspired by Flask Routing Pattern....48
Handling Asynchronous and Synchronous Operations....48
Built-In Support for Background Tasks....49
Custom Middleware and CORS Support....49
Freedom to Customize Any Service Layer....50
Data Validation and Serialization....50
Rich Ecosystem of Plug-Ins....51
Automatic Documentation....52
Dependency Injection System....53
Lifespan Events....55
Security and Authentication Components....56
Bidirectional WebSocket, GraphQL, and Custom Response Support....56
Modern Python and IDE Integration with Sensible Defaults....57
FastAPI Project Structures....57
Flat Structure....58
Nested Structure....59
Modular Structure....60
Progressive Reorganization of Your FastAPI Project....62
Onion/Layered Application Design Pattern....63
Comparing FastAPI to Other Python Web Frameworks....68
FastAPI Limitations....71
Inefficient Model Memory Management....71
Limited Number of Threads....71
Restricted to Global Interpreter Lock....71
Lack of Support for Micro-Batch Processing Inference Requests....72
Cannot Efficiently Split AI Workloads Between CPU and GPU....72
Dependency Conflicts....73
Lack of Support for Resource-Intensive AI Workloads....73
Setting Up a Managed Python Environment and Tooling....74
Summary....76
Chapter 3. AI Integration and Model Serving....77
Serving Generative Models....78
Language Models....78
Audio Models....97
Vision Models....103
Video Models....111
3D Models....119
Strategies for Serving Generative AI Models....126
Be Model Agnostic: Swap Models on Every Request....126
Be Compute Efficient: Preload Models with the FastAPI Lifespan....128
Be Lean: Serve Models Externally....131
The Role of Middleware in Service Monitoring....135
Summary....138
Additional References....139
Chapter 4. Implementing Type-Safe AI Services....141
Introduction to Type Safety....142
Implementing Type Safety....145
Type Annotations....145
Using Annotated....148
Dataclasses....149
Pydantic Models....152
How to Use Pydantic....152
Compound Pydantic Models....153
Field Constraints and Validators....154
Custom Field and Model Validators....157
Computed Fields....159
Model Export and Serialization....160
Parsing Environment Variables with Pydantic....161
Dataclasses or Pydantic Models in FastAPI....163
Summary....169
Part II. Communicating with External Systems....171
Chapter 5. Achieving Concurrency in AI Workloads....173
Optimizing GenAI Services for Multiple Users....174
Optimizing for IO Tasks with Asynchronous Programming....181
Synchronous Versus Asynchronous (Async) Execution....182
Async Programming with Model Provider APIs....186
Event Loop and Thread Pool in FastAPI....190
Blocking the Main Server....192
Project: Talk to the Web (Web Scraper)....194
Project: Talk to Documents (RAG)....199
Optimizing Model Serving for Memory- and Compute-Bound AI Inference Tasks....218
Compute-Bound Operations....218
Externalizing Model Serving....219
Managing Long-Running AI Inference Tasks....229
Summary....231
Additional References....232
Chapter 6. Real-Time Communication with Generative Models....233
Web Communication Mechanisms....234
Regular/Short Polling....236
Long Polling....237
Server-Sent Events....238
WebSocket....240
Comparing Communication Mechanisms....246
Implementing SSE Endpoints....247
SSE with GET Request....250
SSE with POST Request....256
Implementing WS Endpoints....260
Streaming LLM Outputs with WebSocket....260
Handling WebSocket Exceptions....267
Designing APIs for Streaming....268
Summary....269
Chapter 7. Integrating Databases into AI Services....271
The Role of a Database....272
Database Systems....273
Project: Storing User Conversations with an LLM in a Relational Database....277
Defining ORM Models....279
Creating a Database Engine and Session Management....281
Implementing CRUD Endpoints....284
Repository and Services Design Pattern....288
Managing Database Schema Changes....294
Storing Data When Working with Real-Time Streams....298
Summary....301
Part III. Securing, Optimizing, Testing, and Deploying AI Services....303
Chapter 8. Authentication and Authorization....305
Authentication and Authorization....306
Authentication Methods....307
Basic Authentication....309
JSON Web Tokens (JWT) Authentication....313
Implementing OAuth Authentication....333
OAuth Authentication with GitHub....336
OAuth2 Flow Types....343
Authorization....346
Authorization Models....347
Role-Based Access Control....348
Relationship-Based Access Control....352
Attribute-Based Access Control....353
Hybrid Authorization Models....354
Summary....358
Chapter 9. Securing AI Services....359
Usage Moderation and Abuse Protection....359
Guardrails....362
Input Guardrails....363
Output Guardrails....367
Guardrail Thresholds....368
Implementing a Moderation Guardrail....368
API Rate Limiting and Throttling....371
Implementing Rate Limits in FastAPI....372
Throttling Real-Time Streams....377
Summary....379
Chapter 10. Optimizing AI Services....381
Optimization Techniques....381
Batch Processing....382
Caching....385
Model Quantization....400
Structured Outputs....405
Prompt Engineering....408
Fine-Tuning....416
Summary....420
Chapter 11. Testing AI Services....421
The Importance of Testing....422
Software Testing....423
Types of Tests....423
The Biggest Challenge in Testing Software....425
Planning Tests....426
Test Dimensions....428
Test Data....429
Test Phases....429
Test Environments....430
Testing Strategies....431
Challenges of Testing GenAI Services....434
Variability of Outputs (Flakiness)....434
Performance and Resource Constraints (Slow and Expensive)....434
Regression....435
Bias....436
Adversarial Attacks....436
Unbound Testing Coverage....437
Project: Implementing Tests for a RAG System....437
Unit Tests....438
Integration Testing....454
End-to-End Testing....463
Summary....468
Chapter 12. Deployment of AI Services....469
Deployment Options....469
Deploying to Virtual Machines....470
Deploying to Serverless Functions....472
Deploying to Managed App Platforms....476
Deploying with Containers....477
Containerization with Docker....479
Docker Architecture....479
Building Docker Images....480
Container Registries....482
Container Filesystem and Docker Layers....484
Docker Storage....486
Docker Networking....494
Enabling GPU Driver....501
Docker Compose....502
Enabling GPU Access in Docker Compose....506
Optimizing Docker Images....507
docker init....514
Summary....515
Afterword....517
Index....519
About the Author....530
Colophon....530
Ready to build production-grade applications with generative AI? This practical guide takes you through designing and deploying AI services using the FastAPI web framework. Learn how to integrate models that process text, images, audio, and video while seamlessly interacting with databases, filesystems, websites, and APIs. Whether you're a web developer, data scientist, or DevOps engineer, this book equips you with the tools to build scalable, real-time AI applications.
Author Alireza Parandeh provides clear explanations and hands-on examples covering authentication, concurrency, caching, and retrieval-augmented generation (RAG) with vector databases. You'll also explore best practices for testing AI outputs, optimizing performance, and securing microservices. With containerized deployment using Docker, you'll be ready to launch AI-powered applications confidently in the cloud.