Building Generative AI Services with FastAPI: A Practical Approach to Developing Context-Rich Generative AI Applications

Author: Alireza Parandeh


Release date: 2025

Publisher: O’Reilly Media, Inc.

Pages: 531

File size: 4.9 MB

File type: PDF


Cover....1

Table of Contents....7

Foreword....13

Preface....15

Objective and Approach....16

Prerequisites....17

Book Structure....17

How to Read This Book....19

Hardware and Software Requirements....20

Conventions Used in This Book....20

Using Code Examples....21

O’Reilly Online Learning....23

How to Contact Us....23

Acknowledgments....23

Part I. Developing AI Services....25

Chapter 1. Introduction....27

What Is Generative AI?....27

Why Generative AI Services Will Power Future Applications....30

Facilitating the Creative Process....31

Suggesting Contextually Relevant Solutions....33

Personalizing the User Experience....34

Minimizing Delay in Resolving Customer Queries....35

Acting as an Interface to Complex Systems....36

Automating Manual Administrative Tasks....37

Scaling and Democratizing Content Generation....37

How to Build a Generative AI Service....38

Why Build Generative AI Services with FastAPI?....39

What Prevents the Adoption of Generative AI Services....40

Overview of the Capstone Project....41

Summary....42

Chapter 2. Getting Started with FastAPI....43

Introduction to FastAPI....43

Setting Up Your Development Environment....44

Installing Python, FastAPI, and Required Packages....44

Creating a Simple FastAPI Web Server....45

FastAPI Features and Advantages....48

Inspired by Flask Routing Pattern....48

Handling Asynchronous and Synchronous Operations....48

Built-In Support for Background Tasks....49

Custom Middleware and CORS Support....49

Freedom to Customize Any Service Layer....50

Data Validation and Serialization....50

Rich Ecosystem of Plug-Ins....51

Automatic Documentation....52

Dependency Injection System....53

Lifespan Events....55

Security and Authentication Components....56

Bidirectional Web Socket, GraphQL, and Custom Response Support....56

Modern Python and IDE Integration with Sensible Defaults....57

FastAPI Project Structures....57

Flat Structure....58

Nested Structure....59

Modular Structure....60

Progressive Reorganization of Your FastAPI Project....62

Onion/Layered Application Design Pattern....63

Comparing FastAPI to Other Python Web Frameworks....68

FastAPI Limitations....71

Inefficient Model Memory Management....71

Limited Number of Threads....71

Restricted to Global Interpreter Lock....71

Lack of Support for Micro-Batch Processing Inference Requests....72

Cannot Efficiently Split AI Workloads Between CPU and GPU....72

Dependency Conflicts....73

Lack of Support for Resource-Intensive AI Workloads....73

Setting Up a Managed Python Environment and Tooling....74

Summary....76

Chapter 3. AI Integration and Model Serving....77

Serving Generative Models....78

Language Models....78

Audio Models....97

Vision Models....103

Video Models....111

3D Models....119

Strategies for Serving Generative AI Models....126

Be Model Agnostic: Swap Models on Every Request....126

Be Compute Efficient: Preload Models with the FastAPI Lifespan....128

Be Lean: Serve Models Externally....131

The Role of Middleware in Service Monitoring....135

Summary....138

Additional References....139

Chapter 4. Implementing Type-Safe AI Services....141

Introduction to Type Safety....142

Implementing Type Safety....145

Type Annotations....145

Using Annotated....148

Dataclasses....149

Pydantic Models....152

How to Use Pydantic....152

Compound Pydantic Models....153

Field Constraints and Validators....154

Custom Field and Model Validators....157

Computed Fields....159

Model Export and Serialization....160

Parsing Environment Variables with Pydantic....161

Dataclasses or Pydantic Models in FastAPI....163

Summary....169

Part II. Communicating with External Systems....171

Chapter 5. Achieving Concurrency in AI Workloads....173

Optimizing GenAI Services for Multiple Users....174

Optimizing for IO Tasks with Asynchronous Programming....181

Synchronous Versus Asynchronous (Async) Execution....182

Async Programming with Model Provider APIs....186

Event Loop and Thread Pool in FastAPI....190

Blocking the Main Server....192

Project: Talk to the Web (Web Scraper)....194

Project: Talk to Documents (RAG)....199

Optimizing Model Serving for Memory- and Compute-Bound AI Inference Tasks....218

Compute-Bound Operations....218

Externalizing Model Serving....219

Managing Long-Running AI Inference Tasks....229

Summary....231

Additional References....232

Chapter 6. Real-Time Communication with Generative Models....233

Web Communication Mechanisms....234

Regular/Short Polling....236

Long Polling....237

Server-Sent Events....238

WebSocket....240

Comparing Communication Mechanisms....246

Implementing SSE Endpoints....247

SSE with GET Request....250

SSE with POST Request....256

Implementing WS Endpoints....260

Streaming LLM Outputs with WebSocket....260

Handling WebSocket Exceptions....267

Designing APIs for Streaming....268

Summary....269

Chapter 7. Integrating Databases into AI Services....271

The Role of a Database....272

Database Systems....273

Project: Storing User Conversations with an LLM in a Relational Database....277

Defining ORM Models....279

Creating a Database Engine and Session Management....281

Implementing CRUD Endpoints....284

Repository and Services Design Pattern....288

Managing Database Schemas Changes....294

Storing Data When Working with Real-Time Streams....298

Summary....301

Part III. Securing, Optimizing, Testing, and Deploying AI Services....303

Chapter 8. Authentication and Authorization....305

Authentication and Authorization....306

Authentication Methods....307

Basic Authentication....309

JSON Web Tokens (JWT) Authentication....313

Implementing OAuth Authentication....333

OAuth Authentication with GitHub....336

OAuth2 Flow Types....343

Authorization....346

Authorization Models....347

Role-Based Access Control....348

Relationship-Based Access Control....352

Attribute-Based Access Control....353

Hybrid Authorization Models....354

Summary....358

Chapter 9. Securing AI Services....359

Usage Moderation and Abuse Protection....359

Guardrails....362

Input Guardrails....363

Output Guardrails....367

Guardrail Thresholds....368

Implementing a Moderation Guardrail....368

API Rate Limiting and Throttling....371

Implementing Rate Limits in FastAPI....372

Throttling Real-Time Streams....377

Summary....379

Chapter 10. Optimizing AI Services....381

Optimization Techniques....381

Batch Processing....382

Caching....385

Model Quantization....400

Structured Outputs....405

Prompt Engineering....408

Fine-Tuning....416

Summary....420

Chapter 11. Testing AI Services....421

The Importance of Testing....422

Software Testing....423

Types of Tests....423

The Biggest Challenge in Testing Software....425

Planning Tests....426

Test Dimensions....428

Test Data....429

Test Phases....429

Test Environments....430

Testing Strategies....431

Challenges of Testing GenAI Services....434

Variability of Outputs (Flakiness)....434

Performance and Resource Constraints (Slow and Expensive)....434

Regression....435

Bias....436

Adversarial Attacks....436

Unbound Testing Coverage....437

Project: Implementing Tests for a RAG System....437

Unit Tests....438

Integration Testing....454

End-to-End Testing....463

Summary....468

Chapter 12. Deployment of AI Services....469

Deployment Options....469

Deploying to Virtual Machines....470

Deploying to Serverless Functions....472

Deploying to Managed App Platforms....476

Deploying with Containers....477

Containerization with Docker....479

Docker Architecture....479

Building Docker Images....480

Container Registries....482

Container Filesystem and Docker Layers....484

Docker Storage....486

Docker Networking....494

Enabling GPU Driver....501

Docker Compose....502

Enabling GPU Access in Docker Compose....506

Optimizing Docker Images....507

docker init....514

Summary....515

Afterword....517

Index....519

About the Author....530

Colophon....530

Ready to build production-grade applications with generative AI? This practical guide takes you through designing and deploying AI services using the FastAPI web framework. Learn how to integrate models that process text, images, audio, and video while seamlessly interacting with databases, filesystems, websites, and APIs. Whether you're a web developer, data scientist, or DevOps engineer, this book equips you with the tools to build scalable, real-time AI applications.

Author Alireza Parandeh provides clear explanations and hands-on examples covering authentication, concurrency, caching, and retrieval-augmented generation (RAG) with vector databases. You'll also explore best practices for testing AI outputs, optimizing performance, and securing microservices. With containerized deployment using Docker, you'll be ready to launch AI-powered applications confidently in the cloud.

Build generative AI services that interact with databases, filesystems, websites, and APIs
Manage concurrency in AI workloads and handle long-running tasks
Stream AI-generated outputs in real time via WebSocket and server-sent events
Secure services with authentication, content filtering, throttling, and rate limiting
Optimize AI performance with caching, batch processing, and fine-tuning techniques
