Skip to main content

KB RAG System - Complete Architecture Documentation

Project Overviewโ€‹

A comprehensive RAG (Retrieval-Augmented Generation) knowledge base system with intelligent agent orchestration, combining semantic search, web search capabilities, and AI-powered question answering.

Core Features:

  • ๐Ÿ“š Document Indexing: Automatic conversion of MDX documents to vector embeddings
  • ๐Ÿ” Hybrid Search: Semantic + keyword search with re-ranking
  • ๐Ÿค– Agent Orchestration: Intelligent routing to specialized agents (Knowledge, Web Search, Hybrid)
  • ๐ŸŒ Web Search Fallback: Automatic web search when knowledge base is insufficient
  • ๐Ÿ› ๏ธ Tool Calling: Native support for Gemini function calling API
  • ๐Ÿ“– Citation Tracking: Every answer includes source document references
  • โšก Real-time Response: Optimized retrieval and generation pipeline

Table of Contentsโ€‹

  1. Technology Stack
  2. System Architecture
  3. Data Pipeline
  4. Agent Orchestration
  5. Tool System
  6. API Design
  7. Model Selection
  8. Deployment Guide
  9. Configuration
  10. Troubleshooting

Technology Stackโ€‹

Backend (Python)โ€‹

Core Framework:
- FastAPI 0.110+: High-performance web framework
- Pydantic 2.6+: Data validation and serialization
- Uvicorn: ASGI server

Data Layer:
- PostgreSQL 15+ with pgvector: Vector database
- psycopg 3: Database driver
- psycopg-pool: Connection pool management

AI/ML:
- Gemini API: Google Gemini 2.5 Flash for LLM
- Gemini Embeddings: models/embedding-001
- Tavily/Brave Search: Web search integration
- LangChain: Text processing and chunking

Data Processing:
- PyYAML: Configuration management
- httpx: HTTP client for external APIs
- tqdm: Progress bars

Testing:
- pytest 8.0+: Unit testing
- pytest-asyncio: Async test support

Frontend (TypeScript/Next.js)โ€‹

Framework:
- Docusaurus: Documentation site generator
- React 18+: UI framework
- TypeScript: Type safety

Build Tools:
- npm: Package management
- webpack: Module bundling

Infrastructureโ€‹

Database:
- PostgreSQL 15+
- pgvector extension
- Python 3.11+

Environment Management:
- Doppler: Environment variables and secrets
- uv: Python package management

System Architectureโ€‹

Overall Architectureโ€‹

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Frontend (Docusaurus) โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ AI Chat Widget (React) โ”‚ โ”‚
โ”‚ โ”‚ - User input โ”‚ โ”‚
โ”‚ โ”‚ - Display AI responses and citations โ”‚ โ”‚
โ”‚ โ”‚ - SSE streaming support โ”‚ โ”‚
โ”‚ โ”‚ - http://localhost:3001 โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ API Layer (FastAPI) โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ POST /ask Agent-orchestrated Q&A โ”‚ โ”‚
โ”‚ โ”‚ POST /search Semantic search โ”‚ โ”‚
โ”‚ โ”‚ GET /health Health check โ”‚ โ”‚
โ”‚ โ”‚ http://localhost:8000 โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Agent Orchestration Layer โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Agent Router (Question Classifier) โ”‚ โ”‚
โ”‚ โ”‚ - Keyword-based routing โ”‚ โ”‚
โ”‚ โ”‚ - Optional LLM classification โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Knowledge โ”‚ โ”‚ Web Search โ”‚ โ”‚ Hybrid โ”‚ โ”‚
โ”‚ โ”‚ Agent โ”‚ โ”‚ Agent โ”‚ โ”‚ Agent โ”‚ โ”‚
โ”‚ โ”‚ (RAG-based) โ”‚ โ”‚ (Tavily/API) โ”‚ โ”‚ (Combined) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Business Logic Layer โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ RAG Pipeline โ”‚ โ”‚ Retriever โ”‚ โ”‚
โ”‚ โ”‚ - Context build โ”‚ โ”‚ - Hybrid search โ”‚ โ”‚
โ”‚ โ”‚ - Answer gen โ”‚ โ”‚ - Re-ranking โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Tool System โ”‚ โ”‚
โ”‚ โ”‚ - Web search (Tavily/Brave) โ”‚ โ”‚
โ”‚ โ”‚ - Function calling interface โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Data Access Layer โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ VectorStore โ”‚ โ”‚ DocStore โ”‚ โ”‚
โ”‚ โ”‚ - pgvector โ”‚ โ”‚ - Document metadata โ”‚ โ”‚
โ”‚ โ”‚ - Similarity โ”‚ โ”‚ - Checksum tracking โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Storage Layer โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ PostgreSQL + pgvector โ”‚ โ”‚
โ”‚ โ”‚ - kb_documents: Document metadata โ”‚ โ”‚
โ”‚ โ”‚ - kb_chunks_gemini: Vector embeddings (768-dim) โ”‚ โ”‚
โ”‚ โ”‚ - kb_index_meta: Index signatures โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Data Pipeline (Offline Indexing) โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Stage 1: Document Cleaning โ”‚ โ”‚
โ”‚ โ”‚ - MDX โ†’ JSONL (JavaScript tools) โ”‚ โ”‚
โ”‚ โ”‚ - Remove runtime code โ”‚ โ”‚
โ”‚ โ”‚ - Generate checksums โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Stage 2: Vector Indexing โ”‚ โ”‚
โ”‚ โ”‚ - Text chunking โ”‚ โ”‚
โ”‚ โ”‚ - Embedding generation (Gemini) โ”‚ โ”‚
โ”‚ โ”‚ - Database storage โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Request Flow Diagramโ€‹

User Question
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Agent Router โ”‚
โ”‚ - Classify Q โ”‚
โ”‚ - Select Agent โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ–ผ โ–ผ โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚Know. โ”‚ โ”‚ Web โ”‚ โ”‚ Hybrid โ”‚
โ”‚Agent โ”‚ โ”‚ Agent โ”‚ โ”‚ Agent โ”‚
โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜
โ”‚ โ”‚ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Tavily/ โ”‚ โ”‚
โ”‚ โ”‚ Brave โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Retriever โ”‚
โ”‚ - Hybrid โ”‚
โ”‚ - Re-rank โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚Context Builderโ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ LLM Call โ”‚
โ”‚ (Gemini) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Response โ”‚
โ”‚ + Citations โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
Return to User

Data Pipelineโ€‹

Pipeline Stagesโ€‹

Stage 1: Document Cleaningโ€‹

Tool: JavaScript + mdx-clean

Input: MDX source files

docs/
โ”œโ”€โ”€ cs/
โ”‚ โ”œโ”€โ”€ algorithms/
โ”‚ โ””โ”€โ”€ ...
โ”œโ”€โ”€ ai/
โ”‚ โ”œโ”€โ”€ agents/
โ”‚ โ””โ”€โ”€ ...
โ””โ”€โ”€ ...

Processing Steps:

  1. Read MDX files: Parse frontmatter and content
  2. Remove runtime code:
    • Remove import/export statements
    • Remove JSX syntax
    • Preserve markdown content
  3. Transform special syntax:
    • TabItems โ†’ headings
    • Preserve code blocks
    • Preserve Mermaid diagrams
  4. Generate metadata:
    • Document ID
    • Title
    • Path
    • SHA-256 checksum (for incremental updates)
  5. Output JSONL: kb/data/cleaned/docs.jsonl

Output Format:

{
"id": "ai/agentops",
"path": "docs/ai/agents/agentops/index.mdx",
"title": "AgentOps and Security",
"checksum": "6c5bb14e0a5801d7fb4fb4431ef3e58e8c8cf6b19bab56970589111a4007625b",
"content": "# AgentOps and Security\n\nAgentOps combines...",
"frontmatter": {
"title": "AgentOps and Security",
"tags": ["agents", "security"]
}
}

CLI Commands:

# Run Stage 1 only
kb-build --stage clean

# Specify input/output
kb-build --stage clean --docs-dir ./docs --output kb/data/cleaned/custom.jsonl

Stage 2: Vector Indexingโ€‹

Tool: Python + Gemini Embeddings

Input: kb/data/cleaned/docs.jsonl

Processing Flow:

1. Text Chunking

# Strategy: MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter

Configuration:
- max_section_chars: 2000 # Max chars before recursive split
- chunk_size: 500 # Target chunk size
- chunk_overlap: 80 # Overlap between chunks

Preserve:
- Heading hierarchy (H1, H2, H3...)
- Section structure
- Paragraph content

2. Embedding Generation

# Using Gemini Embeddings API

Model: models/embedding-001
API: https://generativelanguage.googleapis.com/v1beta/

Batch requests:
- batch_size: 32 chunks/request
- Auto-retry mechanism
- Progress bar display

Output: 768-dimensional vectors

3. Database Storage

-- Document metadata table
CREATE TABLE kb_documents (
doc_id VARCHAR(255) PRIMARY KEY,
path VARCHAR(1024) NOT NULL,
title VARCHAR(512) NOT NULL,
version VARCHAR(64) DEFAULT 'latest',
checksum VARCHAR(64) NOT NULL,
chunk_ids JSONB DEFAULT '[]',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector embeddings table
CREATE TABLE kb_chunks_gemini (
id SERIAL PRIMARY KEY,
chunk_id VARCHAR(64) UNIQUE NOT NULL,
doc_id VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
heading_path JSONB DEFAULT '[]',
chunk_index INTEGER DEFAULT 0,
embedding vector(768), -- Gemini embedding dimension
created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector similarity index (IVFFlat)
CREATE INDEX kb_chunks_gemini_embedding_idx
ON kb_chunks_gemini
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Incremental Update Mechanism:

# Checksum-based incremental indexing

1. Calculate SHA-256 checksum of document content
2. Compare with database checksum
3. If same: skip
4. If different:
- Delete old chunks
- Re-chunk and embed
- Update database

CLI Commands:

# Run Stage 2 only
kb-build --stage build

# Force full rebuild
kb-build --stage build --force-rebuild

# Specify JSONL input
kb-build --stage build --output kb/data/cleaned/custom.jsonl

Performance Metricsโ€‹

MetricCurrent Value
Total Documents39 documents
Total Chunks3,043 chunks
Avg Chunk Size~500 chars
Index Time~2 minutes (56 docs)
Retrieval Latency~300ms

Agent Orchestrationโ€‹

Architecture Overviewโ€‹

The agent orchestration system provides intelligent question routing to specialized handlers:

User Question
โ†“
Agent Router (Question Classifier)
โ†“
โ”œโ”€โ†’ Knowledge Agent (RAG-based Q&A)
โ”‚ โ”œโ”€โ†’ Hybrid Search (semantic + keyword)
โ”‚ โ”œโ”€โ†’ Re-ranking
โ”‚ โ””โ”€โ†’ Generate Answer
โ”‚
โ”œโ”€โ†’ Web Search Agent (Online search)
โ”‚ โ”œโ”€โ†’ Tavily/Brave Search
โ”‚ โ””โ”€โ†’ Summarize Results
โ”‚
โ””โ”€โ†’ Hybrid Agent (Combined)
โ”œโ”€โ†’ Try RAG First
โ”œโ”€โ†’ If insufficient: Web Search
โ””โ”€โ†’ Merge and Answer

Agent Routerโ€‹

File: kb/agents/router.py

Routing Strategy:

  1. Keyword-based heuristics (default):

    • Knowledge: "how to", "explain", "architecture", "implementation"
    • Web Search: "latest", "news", "current", "price", "2025"
    • Hybrid: Fallback for uncertain cases
  2. Optional LLM classification:

    • Enable with use_llm_routing: true
    • Uses Gemini to classify question type
    • Provides more accurate routing

Route Rules:

ROUTE_RULES = {
"knowledge": {
"keywords": [
"how to", "how do", "explain", "what is", "what are",
"architecture", "implementation", "api", "design",
"pattern", "tutorial", "guide", "example",
],
"default_confidence": 0.6,
},
"web_search": {
"keywords": [
"latest", "news", "current", "recent", "today",
"price", "cost", "2025", "2024", "2023",
],
"default_confidence": 0.7,
},
}

Knowledge Agentโ€‹

File: kb/agents/knowledge_agent.py

Responsibilities:

  • RAG-based question answering
  • Hybrid search + re-ranking
  • Citation generation

Workflow:

1. Retrieve chunks using hybrid search
2. Check if results are sufficient (score > 0.4)
3. If insufficient: return with has_sufficient_knowledge=False
4. Generate answer using retrieved context
5. Return enriched with citations

Confidence Scoring:

  • Technical questions: 0.85
  • Generic questions: 0.55
  • Knowledge-specific keywords: +0.15

Web Search Agentโ€‹

File: kb/agents/web_search_agent.py

Responsibilities:

  • Real-time information retrieval
  • Web search via Tavily or Brave
  • Answer synthesis from search results

Workflow:

1. Perform web search
2. Format search results
3. Generate answer from results
4. Return with sources

Confidence Scoring:

  • Real-time keywords: 0.90
  • Current events: 0.75
  • Generic: 0.40

Hybrid Agentโ€‹

File: kb/agents/hybrid_agent.py

Responsibilities:

  • Combine knowledge base and web search
  • Automatic fallback
  • Merge information from both sources

Workflow:

1. Try knowledge base first
2. Check if results are sufficient (score > 0.3)
3. If sufficient: return KB results
4. If insufficient:
- Perform web search
- Merge both results
- Clearly distinguish sources
5. Return combined answer

Confidence Scoring:

  • All questions: 0.65 (safe fallback)

Agent Interfaceโ€‹

All agents implement the Agent base class:

class Agent(ABC):
@abstractmethod
async def handle(self, question: str, context: Dict) -> Dict:
"""Handle question and return response."""
pass

@abstractmethod
def can_handle(self, question: str) -> float:
"""Return confidence score (0.0 - 1.0)."""
pass

Tool Systemโ€‹

Tool Interfaceโ€‹

File: kb/tools/base.py

The tool system provides a pluggable interface for function calling:

class Tool(ABC):
@abstractmethod
def name(self) -> str:
"""Tool name for function calling."""
pass

@abstractmethod
def description(self) -> str:
"""Tool description for the LLM."""
pass

@abstractmethod
def parameters_schema(self) -> Dict[str, Any]:
"""JSON schema for parameters."""
pass

@abstractmethod
async def execute(self, **kwargs) -> str:
"""Execute tool and return result."""
pass

def to_function_declaration(self) -> Dict[str, Any]:
"""Convert to Gemini function declaration format."""
return {
"name": self.name(),
"description": self.description(),
"parameters": self.parameters_schema(),
}

Web Search Toolโ€‹

File: kb/tools/web_search.py

Supported Providers:

  • Tavily Search (primary, recommended)
  • Brave Search (alternative)

Features:

  • Async execution
  • Configurable max_results
  • Search depth options (basic/advanced)
  • LLM-friendly result formatting

Usage Example:

tool = WebSearchTool(
provider="tavily",
api_key=os.getenv("TAVILY_API_KEY"),
max_results=5,
search_depth="basic"
)

result = await tool.execute(
query="latest AI trends 2025",
max_results=5
)

Tool Calling Supportโ€‹

File: kb/llm/gemini.py

Gemini LLM now supports function calling:

async def generate_with_tools(
prompt: str,
tools: List[Any],
temperature: Optional[float] = None,
max_tokens: Optional[int] = None,
max_tool_calls: int = 5,
) -> LLMResponse:
"""Generate with tool calling support.

Automatically handles tool call loops and collects results.
"""

Features:

  • Automatic tool execution
  • Multi-step tool calling
  • Conversation history management
  • Error handling with graceful degradation

API Designโ€‹

Endpoint Overviewโ€‹

EndpointMethodDescription
/GETAPI info and available endpoints
/healthGETHealth check
/searchPOSTSemantic search (no LLM)
/askPOSTAgent-orchestrated Q&A

API Detailsโ€‹

1. Root Endpoint /โ€‹

Request:

GET / HTTP/1.1

Response:

{
"name": "KB RAG API",
"version": "1.0.0",
"description": "RAG-based knowledge base with agent orchestration",
"endpoints": {
"health": "/health",
"search": "/search",
"ask": "/ask"
},
"features": [
"agent_orchestration",
"hybrid_search",
"web_search_fallback",
"tool_calling"
]
}

2. Health Check /healthโ€‹

Request:

GET /health HTTP/1.1

Response:

{
"status": "healthy",
"timestamp": "2025-02-05T10:30:00Z",
"components": {
"database": "healthy",
"llm": "healthy",
"web_search": "healthy"
}
}

Request:

POST /search HTTP/1.1
Content-Type: application/json

{
"query": "What is AgentOps?",
"k": 5
}

Response:

[
{
"chunk_id": "82cd0834...",
"doc_id": "docs:ai/prompt-engineering/09-agent-orchestration.mdx",
"content": "Each agent has ONE primary role...",
"heading_path": ["Best Practices Summary", "2. Clear Agent Boundaries"],
"chunk_index": 201,
"score": 0.708,
"document": {
"title": "9 Agent Orchestration",
"path": "docs/ai/prompt-engineering/09-agent-orchestration.mdx"
}
}
]

4. Agent-Orchestrated Q&A /askโ€‹

Request:

POST /ask HTTP/1.1
Content-Type: application/json

{
"question": "What are the agent orchestration patterns?",
"top_k": 5
}

Response (Knowledge Agent):

{
"answer": "Based on the knowledge base, the agent orchestration patterns include the Sequential Pattern and Supervisor + Workers pattern...",
"citations": [
{
"id": 1,
"chunk_id": "42fd61e4...",
"doc_id": "docs:design-patterns",
"title": "3. Design Patterns",
"path": "https://docs.yiw.me/docs/ai/agents/design-patterns",
"heading_path": ["3. Agent Design Patterns", "3.2 Multi-Agent Patterns", "Pattern 8: Sequential Pattern"],
"score": 0.759
}
],
"has_sufficient_knowledge": true,
"model": "gemini-2.5-flash",
"tokens_used": 930,
"retrieval_time_ms": 242,
"generation_time_ms": 657,
"agent_type": "knowledge"
}

Response (Web Search Agent):

{
"answer": "Based on web search results, the latest agent orchestration patterns in 2025 include...",
"citations": [],
"has_sufficient_knowledge": true,
"model": "gemini-2.5-flash",
"tokens_used": 856,
"retrieval_time_ms": 1200,
"generation_time_ms": 543,
"agent_type": "web_search"
}

Response (Hybrid Agent):

{
"answer": "Based on the knowledge base and web search:\n\n**From Knowledge Base:**\nTraditional agent patterns include...\n\n**From Web Search:**\nLatest 2025 approaches add...",
"citations": [
{
"id": 1,
"chunk_id": "abc123...",
"doc_id": "docs:ai/agents",
"title": "Agent Patterns",
"path": "https://docs.yiw.me/docs/ai/agents",
"heading_path": ["Introduction"],
"score": 0.512
}
],
"has_sufficient_knowledge": true,
"model": "gemini-2.5-flash",
"tokens_used": 1456,
"retrieval_time_ms": 1442,
"generation_time_ms": 657,
"agent_type": "hybrid"
}

Error Handlingโ€‹

Error Response Format:

{
"detail": "Ask request failed: Web search API error: 401 - Invalid API key"
}

HTTP Status Codes:

StatusDescription
200Success
400Bad request (invalid parameters)
401Unauthorized (missing/invalid API key)
429Rate limit exceeded
500Internal server error
503Service unavailable (LLM/downstream API error)

Request/Response Schemasโ€‹

AskRequest:

class AskRequest(BaseModel):
question: str = Field(..., min_length=1, max_length=500)
top_k: int = Field(default=10, ge=1, le=20)

AskResponse:

class AskResponse(BaseModel):
answer: str
citations: List[Citation]
has_sufficient_knowledge: bool
model: str
tokens_used: Optional[int]
retrieval_time_ms: int
generation_time_ms: int

Citation:

class Citation(BaseModel):
id: int
chunk_id: str
doc_id: str
title: str
path: str
heading_path: List[str]
score: float

Model Selectionโ€‹

Gemini Model Comparisonโ€‹

ModelStatusUse CaseRecommended
Gemini 2.5 Flashโœ… StableProduction (Default)โญโญโญโญโญ
Gemini 2.5 Proโœ… StableHigh-quality reasoningโญโญโญโญ
Gemini 1.5 Flashโœ… StableBackup optionโญโญโญ
Gemini Flash Latestโœ… AvailableLatest stableโญโญโญโญ

Current Configurationโ€‹

LLM Model:

llm:
model: gemini-2.5-flash # Current default
temperature: 0.3 # Low for factual accuracy
max_tokens: 1024

Embedding Model:

embedding:
model: models/embedding-001 # 768-dimensional vectors

Performance Comparisonโ€‹

ModelLatencyCostQualityStability
Gemini 2.5 Flash~600msLowโญโญโญโญHigh
Gemini 2.5 Pro~1200msMediumโญโญโญโญโญHigh

Configurationโ€‹

Complete Configuration Fileโ€‹

File: kb/config.yaml

# Input/Output paths
docs_dir: docs
output_jsonl: kb/data/cleaned/docs.jsonl

# Chunking configuration
chunking:
max_section_chars: 2000
chunk_size: 500
chunk_overlap: 80

# Embedding configuration
embedding:
provider: gemini
model: models/embedding-001

# Gemini API configuration
gemini:
api_key: ${GEMINI_API_KEY:-}

# Storage configuration
storage:
database_url: ${DATABASE_URL:-postgresql://user:password@localhost:5432/kb}

# Vector store configuration
vector_store:
table_name: kb_chunks_gemini
batch_size: 32

# LLM configuration
llm:
provider: gemini
model: gemini-2.5-flash
api_key: ${GEMINI_API_KEY:-}
temperature: 0.3
max_tokens: 1024

# RAG configuration
rag:
retrieval:
top_k: 10
score_threshold: 0.6
max_chunks_per_doc: 3
use_hybrid_search: true
use_reranking: true
hybrid_alpha: 0.7

context:
max_length: 4000
include_headings: true

generation:
temperature: 0.3
max_tokens: 1024

# Agent orchestration (NEW)
agent_orchestration:
enabled: true # Enable agent system
fallback_to_web: true # Enable web search fallback
web_fallback_threshold: 0.3 # If max score < 0.3, use web
use_llm_routing: false # Use LLM for routing (default: keyword only)

# Web search configuration (NEW)
web_search:
provider: tavily # tavily or brave
api_key: ${TAVILY_API_KEY:-} # From Doppler
max_results: 5
search_depth: basic # basic or advanced
timeout: 30

# Docusaurus configuration
docusaurus:
site_url: "https://docs.yiw.me"

Environment Variablesโ€‹

Required:

# Database
DATABASE_URL=postgresql://user:password@host:port/dbname

# Gemini API
GEMINI_API_KEY=your-gemini-api-key

# Web Search (one or both)
TAVILY_API_KEY=tvly-your-key
BRAVE_API_KEY=your-brave-key

Managing Secrets with Dopplerโ€‹

# Login to Doppler
doppler login

# Set secrets
doppler secrets set GEMINI_API_KEY "your-key"
doppler secrets set DATABASE_URL "postgresql://..."
doppler secrets set TAVILY_API_KEY "tvly-..."

# Run with Doppler
doppler run -- uv run uvicorn kb.api.app:create_app --reload

Deployment Guideโ€‹

Prerequisitesโ€‹

  • Python 3.11+
  • PostgreSQL 15+ with pgvector
  • Doppler CLI (for secrets management)
  • Node.js 18+ (for frontend)

Installationโ€‹

1. Clone and Setupโ€‹

git clone https://github.com/YiWang24/AiDIY.git
cd AiDIY

2. Install Backend Dependenciesโ€‹

cd kb
pip install -e .

3. Configure Environmentโ€‹

# Using Doppler
doppler login

# Set required secrets
doppler secrets set GEMINI_API_KEY "your-gemini-api-key"
doppler secrets set DATABASE_URL "postgresql://user:password@host:port/dbname"
doppler secrets set TAVILY_API_KEY "tvly-your-key"

4. Initialize Databaseโ€‹

-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create database (if needed)
CREATE DATABASE kb_db;

5. Run Data Pipelineโ€‹

# Run complete pipeline
doppler run -- uv run python -m kb.cli --stage all

# Force rebuild
doppler run -- uv run python -m kb.cli --stage all --force-rebuild

6. Start API Serverโ€‹

# Development
doppler run -- uv run uvicorn kb.api.app:create_app \
--host 0.0.0.0 \
--port 8000 \
--reload

# Production
doppler run -- gunicorn kb.api.app:create_app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 120

7. Frontend Setupโ€‹

cd ..
npm install
npm start
# Visit http://localhost:3001

Docker Deploymentโ€‹

Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY kb/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy code
COPY kb/ kb/

# Expose port
EXPOSE 8000

# Start service
CMD ["uvicorn", "kb.api.app:create_app", "--host", "0.0.0.0", "--port", "8000"]

docker-compose.yml:

version: '3.8'

services:
kb-api:
build: ./kb
environment:
- GEMINI_API_KEY=${GEMINI_API_KEY}
- DATABASE_URL=${DATABASE_URL}
- TAVILY_API_KEY=${TAVILY_API_KEY}
ports:
- "8000:8000"
restart: always
depends_on:
- postgres

postgres:
image: pgvector/pgvector:pg16
environment:
- POSTGRES_DB=kb_db
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
restart: always

volumes:
postgres_data:

Troubleshootingโ€‹

Common Issuesโ€‹

1. "No such table: kb_chunks_gemini"โ€‹

Cause: Tables not created

Solution:

# Ensure pgvector extension
psql -d kb_db -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Re-run pipeline
doppler run -- uv run python -m kb.cli --stage all

2. "Web search failed: API key required"โ€‹

Cause: Missing web search API key

Solution:

# Check if API key is set
doppler secrets get TAVILY_API_KEY

# Set the key
doppler secrets set TAVILY_API_KEY "tvly-your-key"

3. "Empty retrieval results"โ€‹

Cause: score_threshold too high

Solution:

# Lower threshold in config.yaml
rag:
retrieval:
score_threshold: 0.4 # Lower from 0.6

4. Agent routing not workingโ€‹

Cause: Agent orchestration disabled

Solution:

# Enable in config.yaml
rag:
agent_orchestration:
enabled: true

Debug Modeโ€‹

# Enable debug logging
LOG_LEVEL=DEBUG doppler run -- uv run uvicorn kb.api.app:create_app --reload

Health Checksโ€‹

# Check API health
curl http://localhost:8000/health

# Test search endpoint
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "test", "k": 5}'

# Test agent endpoint
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is RAG?", "top_k": 5}'

Performance Optimizationโ€‹

Database Optimizationโ€‹

-- Create vector index (if not exists)
CREATE INDEX IF NOT EXISTS kb_chunks_gemini_embedding_idx
ON kb_chunks_gemini
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Create document ID index
CREATE INDEX IF NOT EXISTS kb_chunks_gemini_doc_id_idx
ON kb_chunks_gemini(doc_id);

Connection Poolingโ€‹

# Configure pool size
ConnectionPool(
conninfo=database_url,
min_size=1,
max_size=10, # Adjust based on concurrency
open=False
)

Batch Processingโ€‹

# Optimize batch sizes
vector_store:
batch_size: 32 # Embedding batch size

web_search:
max_results: 5 # Limit search results

Testingโ€‹

Unit Testsโ€‹

cd kb
pytest tests/

Integration Testsโ€‹

# Test with real database
pytest tests/integration/ --integration

Manual Testingโ€‹

# Test knowledge agent
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "How do I implement RAG architecture?", "top_k": 5}'

# Test web search agent
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What are the latest AI trends in 2025?", "top_k": 5}'

# Test hybrid agent
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is the current price of GPT-4 API?", "top_k": 5}'

Roadmapโ€‹

Phase 1: Completed โœ…โ€‹

  • Basic RAG system
  • Document indexing pipeline
  • Semantic search
  • AI Q&A
  • Frontend integration
  • Agent orchestration system
  • Web search integration
  • Tool calling support
  • Hybrid search + re-ranking

Phase 2: In Progress ๐Ÿšงโ€‹

  • Streaming responses (SSE)
  • Multi-turn conversation memory
  • User feedback mechanism
  • Analytics dashboard

Phase 3: Planned ๐Ÿ“‹โ€‹

  • Multi-modal support (images, charts)
  • Pluggable LLM engines
  • A/B testing framework
  • Advanced filtering strategies

Phase 4: Future ๐Ÿ”ฎโ€‹

  • Custom tool plugins
  • Adaptive retrieval strategies
  • Knowledge graph enhancement
  • Multi-language support

Referencesโ€‹


Changelogโ€‹

DateVersionChanges
2025-02-05v2.0.0Agent orchestration system, web search integration, tool calling, hybrid search
2025-01-XXv1.0.0Initial RAG system

Document Maintenance: Regularly updated to reflect architecture changes

Feedback: GitHub Issues