Feat: RAG tool in Corpus App
Issue: Implementation of RAG in Corpus App using Qdrant
Overview
Implement a "Chat with your Documents" feature that allows users to perform semantic search and contextual conversation over uploaded PDF and text records. The system will leverage Qdrant as the vector database for high-performance retrieval.
Goals
- Automated Ingestion: Background tasks to extract text, chunk, and embed documents upon upload or manual sync.
- Semantic Retrieval: Fast and accurate search using vector embeddings and reranking.
- Privacy-First Generation: Chunks are retrieved by the server, but the actual LLM call is performed client-side using user-provided API keys (stored locally).
Technical Architecture
1. Infrastructure (Docker)
-
Qdrant: Add a self-hosted Qdrant service to
docker-compose.yml. -
Worker: Ensure Celery workers are configured with an
embeddingqueue.
2. Backend (FastAPI + Celery)
- Vector Database: Use Qdrant instead of a relational DB for embeddings.
-
Embedding Model:
sentence-transformers/all-MiniLM-L6-v2(running locally on the worker). -
Text Processing:
-
PDF Extraction:
PyMuPDF(fitz). -
Chunking:
tiktokenorlangchainrecursive character splitter.
-
PDF Extraction:
-
API Endpoints:
-
POST /rag/sync: Trigger embedding for a specificrecord_id. -
POST /rag/retrieve: Perform vector search in Qdrant + Reranking. -
GET /rag/status/{id}: Monitor processing progress.
-
3. Frontend (React)
-
Chat Interface: A dedicated page (
/rag) for conversational search. - Record Selection: Allow users to scope their chat to specific uploaded files.
- Client-Side LLM: Frontend logic to construct context-augmented prompts and call OpenAI/Gemini directly using local API keys.
Task Breakdown
Phase 1: Infrastructure & Infrastructure Setup
-
Add Qdrant to docker-compose.yml. -
Install dependencies: qdrant-client,sentence-transformers,pymupdf,tiktoken.
Phase 2: Ingestion Pipeline
-
Create Celery task embed_recordinapp/tasks/embedding.py. -
Implement text extraction for PDF/TXT. -
Implement chunking strategy (e.g., 500 tokens with 50 overlap). -
Implement embedding and Qdrant storage logic.
Phase 3: API & Retrieval
-
Implement /sync,/retrieve, and/statusendpoints inapp/api/v1/endpoints/rag.py. -
Implement reranking logic
Phase 4: Frontend Implementation
-
Build RagChatPage.tsxcomponent. -
Implement local API key management. -
Implement chat-to-retrieval-to-LLM flow.
Acceptance Criteria
-
Users can trigger embedding for any uploaded document. -
Retrieval returns top-K relevant chunks with high semantic similarity. -
The chat interface successfully generates answers based only on the provided context. -
All vectors are properly associated with user records to ensure data isolation.
Ingestion workflow:
Retrieval Workflow:
Generation workflow:
Edited by Praneeth Ashish


