KB RAG System - Complete Architecture Documentation
Project Overviewโ
A comprehensive RAG (Retrieval-Augmented Generation) knowledge base system with intelligent agent orchestration, combining semantic search, web search capabilities, and AI-powered question answering.
Core Features:
- ๐ Document Indexing: Automatic conversion of MDX documents to vector embeddings
- ๐ Hybrid Search: Semantic + keyword search with re-ranking
- ๐ค Agent Orchestration: Intelligent routing to specialized agents (Knowledge, Web Search, Hybrid)
- ๐ Web Search Fallback: Automatic web search when knowledge base is insufficient
- ๐ ๏ธ Tool Calling: Native support for Gemini function calling API
- ๐ Citation Tracking: Every answer includes source document references
- โก Real-time Response: Optimized retrieval and generation pipeline
Table of Contentsโ
- Technology Stack
- System Architecture
- Data Pipeline
- Agent Orchestration
- Tool System
- API Design
- Model Selection
- Deployment Guide
- Configuration
- Troubleshooting
Technology Stackโ
Backend (Python)โ
Core Framework:
- FastAPI 0.110+: High-performance web framework
- Pydantic 2.6+: Data validation and serialization
- Uvicorn: ASGI server
Data Layer:
- PostgreSQL 15+ with pgvector: Vector database
- psycopg 3: Database driver
- psycopg-pool: Connection pool management
AI/ML:
- Gemini API: Google Gemini 2.5 Flash for LLM
- Gemini Embeddings: models/embedding-001
- Tavily/Brave Search: Web search integration
- LangChain: Text processing and chunking
Data Processing:
- PyYAML: Configuration management
- httpx: HTTP client for external APIs
- tqdm: Progress bars
Testing:
- pytest 8.0+: Unit testing
- pytest-asyncio: Async test support
Frontend (TypeScript/Next.js)โ
Framework:
- Docusaurus: Documentation site generator
- React 18+: UI framework
- TypeScript: Type safety
Build Tools:
- npm: Package management
- webpack: Module bundling
Infrastructureโ
Database:
- PostgreSQL 15+
- pgvector extension
- Python 3.11+
Environment Management:
- Doppler: Environment variables and secrets
- uv: Python package management
System Architectureโ
Overall Architectureโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Frontend (Docusaurus) โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ AI Chat Widget (React) โ โ
โ โ - User input โ โ
โ โ - Display AI responses and citations โ โ
โ โ - SSE streaming support โ โ
โ โ - http://localhost:3001 โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ API Layer (FastAPI) โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ POST /ask Agent-orchestrated Q&A โ โ
โ โ POST /search Semantic search โ โ
โ โ GET /health Health check โ โ
โ โ http://localhost:8000 โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Agent Orchestration Layer โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Agent Router (Question Classifier) โ โ
โ โ - Keyword-based routing โ โ
โ โ - Optional LLM classification โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โ
โ โ Knowledge โ โ Web Search โ โ Hybrid โ โ
โ โ Agent โ โ Agent โ โ Agent โ โ
โ โ (RAG-based) โ โ (Tavily/API) โ โ (Combined) โ โ
โ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Business Logic Layer โ
โ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ RAG Pipeline โ โ Retriever โ โ
โ โ - Context build โ โ - Hybrid search โ โ
โ โ - Answer gen โ โ - Re-ranking โ โ
โ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Tool System โ โ
โ โ - Web search (Tavily/Brave) โ โ
โ โ - Function calling interface โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Data Access Layer โ
โ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ VectorStore โ โ DocStore โ โ
โ โ - pgvector โ โ - Document metadata โ โ
โ โ - Similarity โ โ - Checksum tracking โ โ
โ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Storage Layer โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ PostgreSQL + pgvector โ โ
โ โ - kb_documents: Document metadata โ โ
โ โ - kb_chunks_gemini: Vector embeddings (768-dim) โ โ
โ โ - kb_index_meta: Index signatures โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Data Pipeline (Offline Indexing) โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Stage 1: Document Cleaning โ โ
โ โ - MDX โ JSONL (JavaScript tools) โ โ
โ โ - Remove runtime code โ โ
โ โ - Generate checksums โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Stage 2: Vector Indexing โ โ
โ โ - Text chunking โ โ
โ โ - Embedding generation (Gemini) โ โ
โ โ - Database storage โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Request Flow Diagramโ
User Question
โ
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Agent Router โ
โ - Classify Q โ
โ - Select Agent โ
โโโโโโโโโโฌโโโโโโโโโโ
โ
โโโโโโดโโโโโฌโโโโโโโโโโโโโ
โผ โผ โผ
โโโโโโโโโ โโโโโโโโโโโ โโโโโโโโโโโ
โKnow. โ โ Web โ โ Hybrid โ
โAgent โ โ Agent โ โ Agent โ
โโโโโฌโโโโ โโโโโโฌโโโโโ โโโโโโฌโโโโโ
โ โ โ
โ โโโโโโโผโโโโโโ โ
โ โ Tavily/ โ โ
โ โ Brave โ โ
โ โโโโโโโโโโโโโ โ
โ โ
โโโโโโโโฌโโโโโโโโโโโโโโโ
โผ
โโโโโโโโโโโโโโโโ
โ Retriever โ
โ - Hybrid โ
โ - Re-rank โ
โโโโโโโโฌโโโโโโโโ
โผ
โโโโโโโโโโโโโโโโ
โContext Builderโ
โโโโโโโโฌโโโโโโโโ
โผ
โโโโโโโโโโโโโโโโ
โ LLM Call โ
โ (Gemini) โ
โโโโโโโโฌโโโโโโโโ
โผ
โโโโโโโโโโโโโโโโ
โ Response โ
โ + Citations โ
โโโโโโโโโโโโโโโโ
โ
โผ
Return to User
Data Pipelineโ
Pipeline Stagesโ
Stage 1: Document Cleaningโ
Tool: JavaScript + mdx-clean
Input: MDX source files
docs/
โโโ cs/
โ โโโ algorithms/
โ โโโ ...
โโโ ai/
โ โโโ agents/
โ โโโ ...
โโโ ...
Processing Steps:
- Read MDX files: Parse frontmatter and content
- Remove runtime code:
- Remove import/export statements
- Remove JSX syntax
- Preserve markdown content
- Transform special syntax:
- TabItems โ headings
- Preserve code blocks
- Preserve Mermaid diagrams
- Generate metadata:
- Document ID
- Title
- Path
- SHA-256 checksum (for incremental updates)
- Output JSONL:
kb/data/cleaned/docs.jsonl
Output Format:
{
"id": "ai/agentops",
"path": "docs/ai/agents/agentops/index.mdx",
"title": "AgentOps and Security",
"checksum": "6c5bb14e0a5801d7fb4fb4431ef3e58e8c8cf6b19bab56970589111a4007625b",
"content": "# AgentOps and Security\n\nAgentOps combines...",
"frontmatter": {
"title": "AgentOps and Security",
"tags": ["agents", "security"]
}
}
CLI Commands:
# Run Stage 1 only
kb-build --stage clean
# Specify input/output
kb-build --stage clean --docs-dir ./docs --output kb/data/cleaned/custom.jsonl
Stage 2: Vector Indexingโ
Tool: Python + Gemini Embeddings
Input: kb/data/cleaned/docs.jsonl
Processing Flow:
1. Text Chunking
# Strategy: MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter
Configuration:
- max_section_chars: 2000 # Max chars before recursive split
- chunk_size: 500 # Target chunk size
- chunk_overlap: 80 # Overlap between chunks
Preserve:
- Heading hierarchy (H1, H2, H3...)
- Section structure
- Paragraph content
2. Embedding Generation
# Using Gemini Embeddings API
Model: models/embedding-001
API: https://generativelanguage.googleapis.com/v1beta/
Batch requests:
- batch_size: 32 chunks/request
- Auto-retry mechanism
- Progress bar display
Output: 768-dimensional vectors
3. Database Storage
-- Document metadata table
CREATE TABLE kb_documents (
doc_id VARCHAR(255) PRIMARY KEY,
path VARCHAR(1024) NOT NULL,
title VARCHAR(512) NOT NULL,
version VARCHAR(64) DEFAULT 'latest',
checksum VARCHAR(64) NOT NULL,
chunk_ids JSONB DEFAULT '[]',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Vector embeddings table
CREATE TABLE kb_chunks_gemini (
id SERIAL PRIMARY KEY,
chunk_id VARCHAR(64) UNIQUE NOT NULL,
doc_id VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
heading_path JSONB DEFAULT '[]',
chunk_index INTEGER DEFAULT 0,
embedding vector(768), -- Gemini embedding dimension
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Vector similarity index (IVFFlat)
CREATE INDEX kb_chunks_gemini_embedding_idx
ON kb_chunks_gemini
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Incremental Update Mechanism:
# Checksum-based incremental indexing
1. Calculate SHA-256 checksum of document content
2. Compare with database checksum
3. If same: skip
4. If different:
- Delete old chunks
- Re-chunk and embed
- Update database
CLI Commands:
# Run Stage 2 only
kb-build --stage build
# Force full rebuild
kb-build --stage build --force-rebuild
# Specify JSONL input
kb-build --stage build --output kb/data/cleaned/custom.jsonl
Performance Metricsโ
| Metric | Current Value |
|---|---|
| Total Documents | 39 documents |
| Total Chunks | 3,043 chunks |
| Avg Chunk Size | ~500 chars |
| Index Time | ~2 minutes (56 docs) |
| Retrieval Latency | ~300ms |
Agent Orchestrationโ
Architecture Overviewโ
The agent orchestration system provides intelligent question routing to specialized handlers:
User Question
โ
Agent Router (Question Classifier)
โ
โโโ Knowledge Agent (RAG-based Q&A)
โ โโโ Hybrid Search (semantic + keyword)
โ โโโ Re-ranking
โ โโโ Generate Answer
โ
โโโ Web Search Agent (Online search)
โ โโโ Tavily/Brave Search
โ โโโ Summarize Results
โ
โโโ Hybrid Agent (Combined)
โโโ Try RAG First
โโโ If insufficient: Web Search
โโโ Merge and Answer
Agent Routerโ
File: kb/agents/router.py
Routing Strategy:
-
Keyword-based heuristics (default):
- Knowledge: "how to", "explain", "architecture", "implementation"
- Web Search: "latest", "news", "current", "price", "2025"
- Hybrid: Fallback for uncertain cases
-
Optional LLM classification:
- Enable with
use_llm_routing: true - Uses Gemini to classify question type
- Provides more accurate routing
- Enable with
Route Rules:
ROUTE_RULES = {
"knowledge": {
"keywords": [
"how to", "how do", "explain", "what is", "what are",
"architecture", "implementation", "api", "design",
"pattern", "tutorial", "guide", "example",
],
"default_confidence": 0.6,
},
"web_search": {
"keywords": [
"latest", "news", "current", "recent", "today",
"price", "cost", "2025", "2024", "2023",
],
"default_confidence": 0.7,
},
}
Knowledge Agentโ
File: kb/agents/knowledge_agent.py
Responsibilities:
- RAG-based question answering
- Hybrid search + re-ranking
- Citation generation
Workflow:
1. Retrieve chunks using hybrid search
2. Check if results are sufficient (score > 0.4)
3. If insufficient: return with has_sufficient_knowledge=False
4. Generate answer using retrieved context
5. Return enriched with citations
Confidence Scoring:
- Technical questions: 0.85
- Generic questions: 0.55
- Knowledge-specific keywords: +0.15
Web Search Agentโ
File: kb/agents/web_search_agent.py
Responsibilities:
- Real-time information retrieval
- Web search via Tavily or Brave
- Answer synthesis from search results
Workflow:
1. Perform web search
2. Format search results
3. Generate answer from results
4. Return with sources
Confidence Scoring:
- Real-time keywords: 0.90
- Current events: 0.75
- Generic: 0.40
Hybrid Agentโ
File: kb/agents/hybrid_agent.py
Responsibilities:
- Combine knowledge base and web search
- Automatic fallback
- Merge information from both sources
Workflow:
1. Try knowledge base first
2. Check if results are sufficient (score > 0.3)
3. If sufficient: return KB results
4. If insufficient:
- Perform web search
- Merge both results
- Clearly distinguish sources
5. Return combined answer
Confidence Scoring:
- All questions: 0.65 (safe fallback)
Agent Interfaceโ
All agents implement the Agent base class:
class Agent(ABC):
@abstractmethod
async def handle(self, question: str, context: Dict) -> Dict:
"""Handle question and return response."""
pass
@abstractmethod
def can_handle(self, question: str) -> float:
"""Return confidence score (0.0 - 1.0)."""
pass
Tool Systemโ
Tool Interfaceโ
File: kb/tools/base.py
The tool system provides a pluggable interface for function calling:
class Tool(ABC):
@abstractmethod
def name(self) -> str:
"""Tool name for function calling."""
pass
@abstractmethod
def description(self) -> str:
"""Tool description for the LLM."""
pass
@abstractmethod
def parameters_schema(self) -> Dict[str, Any]:
"""JSON schema for parameters."""
pass
@abstractmethod
async def execute(self, **kwargs) -> str:
"""Execute tool and return result."""
pass
def to_function_declaration(self) -> Dict[str, Any]:
"""Convert to Gemini function declaration format."""
return {
"name": self.name(),
"description": self.description(),
"parameters": self.parameters_schema(),
}
Web Search Toolโ
File: kb/tools/web_search.py
Supported Providers:
- Tavily Search (primary, recommended)
- Brave Search (alternative)
Features:
- Async execution
- Configurable max_results
- Search depth options (basic/advanced)
- LLM-friendly result formatting
Usage Example:
tool = WebSearchTool(
provider="tavily",
api_key=os.getenv("TAVILY_API_KEY"),
max_results=5,
search_depth="basic"
)
result = await tool.execute(
query="latest AI trends 2025",
max_results=5
)
Tool Calling Supportโ
File: kb/llm/gemini.py
Gemini LLM now supports function calling:
async def generate_with_tools(
prompt: str,
tools: List[Any],
temperature: Optional[float] = None,
max_tokens: Optional[int] = None,
max_tool_calls: int = 5,
) -> LLMResponse:
"""Generate with tool calling support.
Automatically handles tool call loops and collects results.
"""
Features:
- Automatic tool execution
- Multi-step tool calling
- Conversation history management
- Error handling with graceful degradation
API Designโ
Endpoint Overviewโ
| Endpoint | Method | Description |
|---|---|---|
/ | GET | API info and available endpoints |
/health | GET | Health check |
/search | POST | Semantic search (no LLM) |
/ask | POST | Agent-orchestrated Q&A |
API Detailsโ
1. Root Endpoint /โ
Request:
GET / HTTP/1.1
Response:
{
"name": "KB RAG API",
"version": "1.0.0",
"description": "RAG-based knowledge base with agent orchestration",
"endpoints": {
"health": "/health",
"search": "/search",
"ask": "/ask"
},
"features": [
"agent_orchestration",
"hybrid_search",
"web_search_fallback",
"tool_calling"
]
}
2. Health Check /healthโ
Request:
GET /health HTTP/1.1
Response:
{
"status": "healthy",
"timestamp": "2025-02-05T10:30:00Z",
"components": {
"database": "healthy",
"llm": "healthy",
"web_search": "healthy"
}
}
3. Semantic Search /searchโ
Request:
POST /search HTTP/1.1
Content-Type: application/json
{
"query": "What is AgentOps?",
"k": 5
}
Response:
[
{
"chunk_id": "82cd0834...",
"doc_id": "docs:ai/prompt-engineering/09-agent-orchestration.mdx",
"content": "Each agent has ONE primary role...",
"heading_path": ["Best Practices Summary", "2. Clear Agent Boundaries"],
"chunk_index": 201,
"score": 0.708,
"document": {
"title": "9 Agent Orchestration",
"path": "docs/ai/prompt-engineering/09-agent-orchestration.mdx"
}
}
]
4. Agent-Orchestrated Q&A /askโ
Request:
POST /ask HTTP/1.1
Content-Type: application/json
{
"question": "What are the agent orchestration patterns?",
"top_k": 5
}
Response (Knowledge Agent):
{
"answer": "Based on the knowledge base, the agent orchestration patterns include the Sequential Pattern and Supervisor + Workers pattern...",
"citations": [
{
"id": 1,
"chunk_id": "42fd61e4...",
"doc_id": "docs:design-patterns",
"title": "3. Design Patterns",
"path": "https://docs.yiw.me/docs/ai/agents/design-patterns",
"heading_path": ["3. Agent Design Patterns", "3.2 Multi-Agent Patterns", "Pattern 8: Sequential Pattern"],
"score": 0.759
}
],
"has_sufficient_knowledge": true,
"model": "gemini-2.5-flash",
"tokens_used": 930,
"retrieval_time_ms": 242,
"generation_time_ms": 657,
"agent_type": "knowledge"
}
Response (Web Search Agent):
{
"answer": "Based on web search results, the latest agent orchestration patterns in 2025 include...",
"citations": [],
"has_sufficient_knowledge": true,
"model": "gemini-2.5-flash",
"tokens_used": 856,
"retrieval_time_ms": 1200,
"generation_time_ms": 543,
"agent_type": "web_search"
}
Response (Hybrid Agent):
{
"answer": "Based on the knowledge base and web search:\n\n**From Knowledge Base:**\nTraditional agent patterns include...\n\n**From Web Search:**\nLatest 2025 approaches add...",
"citations": [
{
"id": 1,
"chunk_id": "abc123...",
"doc_id": "docs:ai/agents",
"title": "Agent Patterns",
"path": "https://docs.yiw.me/docs/ai/agents",
"heading_path": ["Introduction"],
"score": 0.512
}
],
"has_sufficient_knowledge": true,
"model": "gemini-2.5-flash",
"tokens_used": 1456,
"retrieval_time_ms": 1442,
"generation_time_ms": 657,
"agent_type": "hybrid"
}
Error Handlingโ
Error Response Format:
{
"detail": "Ask request failed: Web search API error: 401 - Invalid API key"
}
HTTP Status Codes:
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid parameters) |
| 401 | Unauthorized (missing/invalid API key) |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable (LLM/downstream API error) |
Request/Response Schemasโ
AskRequest:
class AskRequest(BaseModel):
question: str = Field(..., min_length=1, max_length=500)
top_k: int = Field(default=10, ge=1, le=20)
AskResponse:
class AskResponse(BaseModel):
answer: str
citations: List[Citation]
has_sufficient_knowledge: bool
model: str
tokens_used: Optional[int]
retrieval_time_ms: int
generation_time_ms: int
Citation:
class Citation(BaseModel):
id: int
chunk_id: str
doc_id: str
title: str
path: str
heading_path: List[str]
score: float
Model Selectionโ
Gemini Model Comparisonโ
| Model | Status | Use Case | Recommended |
|---|---|---|---|
| Gemini 2.5 Flash | โ Stable | Production (Default) | โญโญโญโญโญ |
| Gemini 2.5 Pro | โ Stable | High-quality reasoning | โญโญโญโญ |
| Gemini 1.5 Flash | โ Stable | Backup option | โญโญโญ |
| Gemini Flash Latest | โ Available | Latest stable | โญโญโญโญ |
Current Configurationโ
LLM Model:
llm:
model: gemini-2.5-flash # Current default
temperature: 0.3 # Low for factual accuracy
max_tokens: 1024
Embedding Model:
embedding:
model: models/embedding-001 # 768-dimensional vectors
Performance Comparisonโ
| Model | Latency | Cost | Quality | Stability |
|---|---|---|---|---|
| Gemini 2.5 Flash | ~600ms | Low | โญโญโญโญ | High |
| Gemini 2.5 Pro | ~1200ms | Medium | โญโญโญโญโญ | High |
Configurationโ
Complete Configuration Fileโ
File: kb/config.yaml
# Input/Output paths
docs_dir: docs
output_jsonl: kb/data/cleaned/docs.jsonl
# Chunking configuration
chunking:
max_section_chars: 2000
chunk_size: 500
chunk_overlap: 80
# Embedding configuration
embedding:
provider: gemini
model: models/embedding-001
# Gemini API configuration
gemini:
api_key: ${GEMINI_API_KEY:-}
# Storage configuration
storage:
database_url: ${DATABASE_URL:-postgresql://user:password@localhost:5432/kb}
# Vector store configuration
vector_store:
table_name: kb_chunks_gemini
batch_size: 32
# LLM configuration
llm:
provider: gemini
model: gemini-2.5-flash
api_key: ${GEMINI_API_KEY:-}
temperature: 0.3
max_tokens: 1024
# RAG configuration
rag:
retrieval:
top_k: 10
score_threshold: 0.6
max_chunks_per_doc: 3
use_hybrid_search: true
use_reranking: true
hybrid_alpha: 0.7
context:
max_length: 4000
include_headings: true
generation:
temperature: 0.3
max_tokens: 1024
# Agent orchestration (NEW)
agent_orchestration:
enabled: true # Enable agent system
fallback_to_web: true # Enable web search fallback
web_fallback_threshold: 0.3 # If max score < 0.3, use web
use_llm_routing: false # Use LLM for routing (default: keyword only)
# Web search configuration (NEW)
web_search:
provider: tavily # tavily or brave
api_key: ${TAVILY_API_KEY:-} # From Doppler
max_results: 5
search_depth: basic # basic or advanced
timeout: 30
# Docusaurus configuration
docusaurus:
site_url: "https://docs.yiw.me"
Environment Variablesโ
Required:
# Database
DATABASE_URL=postgresql://user:password@host:port/dbname
# Gemini API
GEMINI_API_KEY=your-gemini-api-key
# Web Search (one or both)
TAVILY_API_KEY=tvly-your-key
BRAVE_API_KEY=your-brave-key
Managing Secrets with Dopplerโ
# Login to Doppler
doppler login
# Set secrets
doppler secrets set GEMINI_API_KEY "your-key"
doppler secrets set DATABASE_URL "postgresql://..."
doppler secrets set TAVILY_API_KEY "tvly-..."
# Run with Doppler
doppler run -- uv run uvicorn kb.api.app:create_app --reload
Deployment Guideโ
Prerequisitesโ
- Python 3.11+
- PostgreSQL 15+ with pgvector
- Doppler CLI (for secrets management)
- Node.js 18+ (for frontend)
Installationโ
1. Clone and Setupโ
git clone https://github.com/YiWang24/AiDIY.git
cd AiDIY
2. Install Backend Dependenciesโ
cd kb
pip install -e .
3. Configure Environmentโ
# Using Doppler
doppler login
# Set required secrets
doppler secrets set GEMINI_API_KEY "your-gemini-api-key"
doppler secrets set DATABASE_URL "postgresql://user:password@host:port/dbname"
doppler secrets set TAVILY_API_KEY "tvly-your-key"
4. Initialize Databaseโ
-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create database (if needed)
CREATE DATABASE kb_db;
5. Run Data Pipelineโ
# Run complete pipeline
doppler run -- uv run python -m kb.cli --stage all
# Force rebuild
doppler run -- uv run python -m kb.cli --stage all --force-rebuild
6. Start API Serverโ
# Development
doppler run -- uv run uvicorn kb.api.app:create_app \
--host 0.0.0.0 \
--port 8000 \
--reload
# Production
doppler run -- gunicorn kb.api.app:create_app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 120
7. Frontend Setupโ
cd ..
npm install
npm start
# Visit http://localhost:3001
Docker Deploymentโ
Dockerfile:
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY kb/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy code
COPY kb/ kb/
# Expose port
EXPOSE 8000
# Start service
CMD ["uvicorn", "kb.api.app:create_app", "--host", "0.0.0.0", "--port", "8000"]
docker-compose.yml:
version: '3.8'
services:
kb-api:
build: ./kb
environment:
- GEMINI_API_KEY=${GEMINI_API_KEY}
- DATABASE_URL=${DATABASE_URL}
- TAVILY_API_KEY=${TAVILY_API_KEY}
ports:
- "8000:8000"
restart: always
depends_on:
- postgres
postgres:
image: pgvector/pgvector:pg16
environment:
- POSTGRES_DB=kb_db
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
restart: always
volumes:
postgres_data:
Troubleshootingโ
Common Issuesโ
1. "No such table: kb_chunks_gemini"โ
Cause: Tables not created
Solution:
# Ensure pgvector extension
psql -d kb_db -c "CREATE EXTENSION IF NOT EXISTS vector;"
# Re-run pipeline
doppler run -- uv run python -m kb.cli --stage all
2. "Web search failed: API key required"โ
Cause: Missing web search API key
Solution:
# Check if API key is set
doppler secrets get TAVILY_API_KEY
# Set the key
doppler secrets set TAVILY_API_KEY "tvly-your-key"
3. "Empty retrieval results"โ
Cause: score_threshold too high
Solution:
# Lower threshold in config.yaml
rag:
retrieval:
score_threshold: 0.4 # Lower from 0.6
4. Agent routing not workingโ
Cause: Agent orchestration disabled
Solution:
# Enable in config.yaml
rag:
agent_orchestration:
enabled: true
Debug Modeโ
# Enable debug logging
LOG_LEVEL=DEBUG doppler run -- uv run uvicorn kb.api.app:create_app --reload
Health Checksโ
# Check API health
curl http://localhost:8000/health
# Test search endpoint
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "test", "k": 5}'
# Test agent endpoint
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is RAG?", "top_k": 5}'
Performance Optimizationโ
Database Optimizationโ
-- Create vector index (if not exists)
CREATE INDEX IF NOT EXISTS kb_chunks_gemini_embedding_idx
ON kb_chunks_gemini
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Create document ID index
CREATE INDEX IF NOT EXISTS kb_chunks_gemini_doc_id_idx
ON kb_chunks_gemini(doc_id);
Connection Poolingโ
# Configure pool size
ConnectionPool(
conninfo=database_url,
min_size=1,
max_size=10, # Adjust based on concurrency
open=False
)
Batch Processingโ
# Optimize batch sizes
vector_store:
batch_size: 32 # Embedding batch size
web_search:
max_results: 5 # Limit search results
Testingโ
Unit Testsโ
cd kb
pytest tests/
Integration Testsโ
# Test with real database
pytest tests/integration/ --integration
Manual Testingโ
# Test knowledge agent
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "How do I implement RAG architecture?", "top_k": 5}'
# Test web search agent
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What are the latest AI trends in 2025?", "top_k": 5}'
# Test hybrid agent
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is the current price of GPT-4 API?", "top_k": 5}'
Roadmapโ
Phase 1: Completed โ โ
- Basic RAG system
- Document indexing pipeline
- Semantic search
- AI Q&A
- Frontend integration
- Agent orchestration system
- Web search integration
- Tool calling support
- Hybrid search + re-ranking
Phase 2: In Progress ๐งโ
- Streaming responses (SSE)
- Multi-turn conversation memory
- User feedback mechanism
- Analytics dashboard
Phase 3: Planned ๐โ
- Multi-modal support (images, charts)
- Pluggable LLM engines
- A/B testing framework
- Advanced filtering strategies
Phase 4: Future ๐ฎโ
- Custom tool plugins
- Adaptive retrieval strategies
- Knowledge graph enhancement
- Multi-language support
Referencesโ
- Gemini API Documentation
- pgvector Documentation
- LangChain Documentation
- FastAPI Documentation
- Tavily Search API
- Brave Search API
Changelogโ
| Date | Version | Changes |
|---|---|---|
| 2025-02-05 | v2.0.0 | Agent orchestration system, web search integration, tool calling, hybrid search |
| 2025-01-XX | v1.0.0 | Initial RAG system |
Document Maintenance: Regularly updated to reflect architecture changes
Feedback: GitHub Issues