Feat/rag: RAG Implementation and Hybrid Retrieval (FTS + pg_trgm + context_window)
Feature Request Template
Title (semantic)
Feat/rag: RAG Implementation and Hybrid Retrieval (FTS + pg_trgm + context_window)
Description
This merge request implements a Retrieval-Augmented Generation (RAG) system for the Corpus Server App with hybrid semantic retrieval capabilities. The implementation enables users to query records using natural language and receive relevant text segments with context.
Key Features Implemented
- Hybrid Retrieval Service: Combines PostgreSQL full-text search (tsvector) with trigram similarity for robust multilingual text matching (Telugu + English)
- Knowledge Graph Validation: MetadataIndexer service validates and optimizes knowledge graphs for storage in the extracted text table
- RAG API Endpoints: Four endpoints for semantic querying:
  - GET /{record_id}/knowledge-status - Check indexing status for a record
  - PUT /{record_id}/knowledge - Save/update knowledge graph and metadata
  - GET /{record_id}/knowledge-map - Retrieve knowledge graph with usage stats
  - POST /{record_id}/retrievals - Search within a record using hybrid retrieval
- Text Normalization Utils: TextNormalizer class handles extraction of text from various formats (dict/model, segments, transcription fields)
- Record Model Extensions: Added semantic_metadata, knowledge_graph, and indexed_version fields to support semantic indexing
- RAG Status Tracking: RAGStatus enum (no_text, ready_for_indexing, indexed, index_failed) with version-aware caching
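For reviewers, the status tracking can be pictured roughly as follows. This is a minimal sketch, not the code in this merge request: the enum values come from the description above, but the member names, the `derive_rag_status` helper, its parameters (`text_version`, `last_index_failed`), and the rule that an index counts as current only while `indexed_version` matches the text version are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class RAGStatus(str, Enum):
    # Values are taken from the MR description; member names are assumed.
    NO_TEXT = "no_text"
    READY_FOR_INDEXING = "ready_for_indexing"
    INDEXED = "indexed"
    INDEX_FAILED = "index_failed"


def derive_rag_status(
    text: Optional[str],
    text_version: int,
    indexed_version: Optional[int],
    last_index_failed: bool = False,
) -> RAGStatus:
    """Hypothetical helper: map a record's state to a RAGStatus.

    Assumption: the stored index is reused only while indexed_version
    equals the current text version; otherwise the record is re-queued.
    """
    if not text or not text.strip():
        return RAGStatus.NO_TEXT
    if last_index_failed:
        return RAGStatus.INDEX_FAILED
    if indexed_version is not None and indexed_version == text_version:
        return RAGStatus.INDEXED
    return RAGStatus.READY_FOR_INDEXING
```

Under these assumptions, a record whose text moved to version 3 while `indexed_version` is still 2 would report `ready_for_indexing` rather than serving the stale index.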
Adaptive Query Parameters
Query parameters (top_k, context_window, trigram_threshold) are dynamically computed based on query characteristics:
- Short queries (≤2 tokens): Larger top_k (12) and context_window (2)
- Queries with keywords like "explain", "describe": Increased context_window for detailed results
- Queries with "approx", "similar", "like": Lowered trigram_threshold for fuzzy matching
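The rules above could be expressed along these lines. This is a sketch under stated assumptions: only the short-query values (top_k of 12, context_window of 2) come from this description. The defaults, the increased context_window of 3, the lowered trigram_threshold of 0.2, and the names `QueryParams` and `compute_query_params` are hypothetical. The 0.3 default mirrors pg_trgm's stock similarity threshold.

```python
from dataclasses import dataclass

# Illustrative defaults; assumed, not taken from the merge request.
DEFAULT_TOP_K = 8
DEFAULT_CONTEXT_WINDOW = 1
DEFAULT_TRIGRAM_THRESHOLD = 0.3  # pg_trgm's default similarity threshold

DETAIL_KEYWORDS = {"explain", "describe"}
FUZZY_KEYWORDS = {"approx", "similar", "like"}


@dataclass(frozen=True)
class QueryParams:
    top_k: int
    context_window: int
    trigram_threshold: float


def compute_query_params(query: str) -> QueryParams:
    """Hypothetical helper: derive retrieval parameters from the query text."""
    # Whitespace tokenization keeps Telugu words intact; edge punctuation is trimmed.
    tokens = {tok.strip(".,?!:;\"'") for tok in query.lower().split()}
    token_count = len(query.split())

    top_k = DEFAULT_TOP_K
    context_window = DEFAULT_CONTEXT_WINDOW
    trigram_threshold = DEFAULT_TRIGRAM_THRESHOLD

    if token_count <= 2:
        # Values stated in the description for short queries.
        top_k = 12
        context_window = 2
    if tokens & DETAIL_KEYWORDS:
        context_window = max(context_window, 3)  # assumed "increased" value
    if tokens & FUZZY_KEYWORDS:
        trigram_threshold = 0.2  # assumed "lowered" value

    return QueryParams(top_k, context_window, trigram_threshold)
```

When several rules match, this sketch applies all of them and keeps the larger context_window. How the actual service resolves overlapping rules is not stated here.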
Fallback Strategy
Multi-tier retrieval with graceful degradation:
- Hybrid query with OR tsquery
- FTS-only fallback
- Token-wise similarity fallback
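One way to picture the tiered fallback is a loop that tries each tier in order and returns the first non-empty result. The sketch below is illustrative only: `build_or_tsquery`, `retrieve_with_fallback`, the `HYBRID_SQL` text, the `record_segments` table, its columns, and the 'simple' text search configuration are assumptions, not the code in this merge request. The SQL uses only standard PostgreSQL full-text functions and the pg_trgm `similarity` function.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Dict[str, object]  # hypothetical row shape
Tier = Tuple[str, Callable[[str], List[Hit]]]

# Characters with special meaning in to_tsquery input; dropped from tokens.
_TSQUERY_SPECIALS = str.maketrans("", "", "&|!():'<>*\\")


def build_or_tsquery(query: str) -> str:
    """Join whitespace-separated tokens with the tsquery OR operator."""
    tokens = [tok.translate(_TSQUERY_SPECIALS) for tok in query.split()]
    return " | ".join(tok for tok in tokens if tok)


# Tier 1 query shape (assumed table and column names, psycopg-style parameters).
HYBRID_SQL = """
SELECT segment_id, segment_text
FROM record_segments
WHERE record_id = %(record_id)s
  AND (
    to_tsvector('simple', segment_text) @@ to_tsquery('simple', %(or_tsquery)s)
    OR similarity(segment_text, %(query)s) >= %(trigram_threshold)s
  )
ORDER BY
  ts_rank(to_tsvector('simple', segment_text), to_tsquery('simple', %(or_tsquery)s))
  + similarity(segment_text, %(query)s) DESC
LIMIT %(top_k)s
"""


def retrieve_with_fallback(
    query: str, tiers: List[Tier]
) -> Tuple[Optional[str], List[Hit]]:
    """Run tiers in order; return the name and hits of the first that matches."""
    for name, run in tiers:
        hits = run(query)
        if hits:
            return name, hits
    return None, []
```

In this arrangement the tiers would be passed as `[("hybrid", ...), ("fts_only", ...), ("token_similarity", ...)]`, each wrapping its own database call. The token filter in `build_or_tsquery` is deliberately simple, and a production version would likely need stricter sanitizing.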
Checklist
- The feature has been fully implemented.
- Tests for the new feature are included and passing.
- User documentation/guides have been updated (if applicable).
- Impact on existing functionality has been considered.
Related Issue(s)
Closes #
Edited by Mukthanand Reddy M