Feat: RAG tool in Corpus App

Issue: Implementation of RAG in Corpus App using Qdrant

Overview

Implement a "Chat with your Documents" feature that allows users to perform semantic search and contextual conversation over uploaded PDF and text records. The system will leverage Qdrant as the vector database for high-performance retrieval.

Goals

Automated Ingestion: Background tasks to extract text, chunk, and embed documents upon upload or manual sync.
Semantic Retrieval: Fast and accurate search using vector embeddings and reranking.
Privacy-First Generation: Chunks are retrieved by the server, but the actual LLM call is performed client-side using user-provided API keys (stored locally).

Technical Architecture

1. Infrastructure (Docker)

Qdrant: Add a self-hosted Qdrant service to docker-compose.yml.
Worker: Ensure Celery workers are configured with an embedding queue.

2. Backend (FastAPI + Celery)

Vector Database: Use Qdrant instead of a relational DB for embeddings.
Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (running locally on the worker).
Text Processing:
- PDF Extraction: PyMuPDF (fitz).
- Chunking: tiktoken (token-based windows) or LangChain's recursive character text splitter.
API Endpoints:
- POST /rag/sync: Trigger embedding for a specific record_id.
- POST /rag/retrieve: Perform vector search in Qdrant + Reranking.
- GET /rag/status/{id}: Monitor processing progress.
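
The request and response shapes for these three endpoints could be sketched as follows. This is a minimal illustration using standard-library dataclasses; in the FastAPI app they would more likely be Pydantic models. All field names, the `SyncState` values, and the `progress` property are assumptions made for this sketch, not part of the spec.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum


class SyncState(str, Enum):
    # Hypothetical lifecycle states reported by GET /rag/status/{id}.
    QUEUED = "queued"
    EXTRACTING = "extracting"
    EMBEDDING = "embedding"
    DONE = "done"
    FAILED = "failed"


@dataclass
class SyncRequest:
    # Body of POST /rag/sync.
    record_id: str


@dataclass
class RetrieveRequest:
    # Body of POST /rag/retrieve.
    query: str
    # Empty list is taken to mean "all of the user's records" (assumption).
    record_ids: list[str] = field(default_factory=list)
    top_k: int = 5


@dataclass
class RetrievedChunk:
    # One item in the /rag/retrieve response.
    record_id: str
    chunk_index: int
    text: str
    score: float


@dataclass
class StatusResponse:
    # Response of GET /rag/status/{id}.
    record_id: str
    state: SyncState
    chunks_done: int = 0
    chunks_total: int = 0

    @property
    def progress(self) -> float:
        # Fraction in [0, 1]; 0.0 until the total chunk count is known.
        if self.chunks_total == 0:
            return 0.0
        return self.chunks_done / self.chunks_total
```

Reporting progress as chunks embedded over total chunks is one option; a coarser state-only status would also satisfy the endpoint description.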

3. Frontend (React)

Chat Interface: A dedicated page (/rag) for conversational search.
Record Selection: Allow users to scope their chat to specific uploaded files.
Client-Side LLM: Frontend logic to construct context-augmented prompts and call OpenAI/Gemini directly using local API keys.
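
The prompt-construction step could look roughly like the sketch below. It is written in Python for consistency with the backend examples; the production version would live in the React client. The function name `build_prompt`, the chunk dictionary keys, the instruction wording, and the `max_chars` budget are all assumptions for illustration.

```python
def build_prompt(question: str, chunks: list[dict], max_chars: int = 6000) -> str:
    """Assemble a context-augmented prompt from retrieved chunks.

    Each chunk is assumed to be a dict with "record_id" and "text" keys.
    """
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] (record {chunk['record_id']})\n{chunk['text'].strip()}"
        if used + len(block) > max_chars:
            break  # stay within a rough context budget
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts) if parts else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Numbering the chunks lets the model cite its sources as `[1]`, `[2]`, and so on, which the chat UI could map back to the originating records.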

Task Breakdown

Phase 1: Infrastructure Setup

Add Qdrant to docker-compose.yml.
Install dependencies: qdrant-client, sentence-transformers, pymupdf, tiktoken.

Phase 2: Ingestion Pipeline

Create Celery task embed_record in app/tasks/embedding.py.
Implement text extraction for PDF/TXT.
Implement chunking strategy (e.g., 500 tokens with 50 overlap).
Implement embedding and Qdrant storage logic.
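
A minimal sketch of the chunking step and of idempotent point IDs, assuming the text has already been tokenized (for example with tiktoken). The 500/50 defaults come from the example above; the `RAG_NAMESPACE` value and both function names are hypothetical. Deriving each point ID from the record ID and chunk index means a manual re-sync overwrites existing points rather than duplicating them, since Qdrant accepts UUIDs as point IDs.

```python
import uuid

# Hypothetical namespace so point IDs stay stable across re-syncs.
RAG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "corpus-app/rag")


def chunk_tokens(tokens: list, size: int = 500, overlap: int = 50) -> list:
    """Split a token sequence into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def point_id(record_id: str, chunk_index: int) -> str:
    """Deterministic UUID for a (record, chunk) pair."""
    return str(uuid.uuid5(RAG_NAMESPACE, f"{record_id}:{chunk_index}"))
```

With the defaults, a 1,000-token document yields three chunks starting at tokens 0, 450, and 900.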

Phase 3: API & Retrieval

Implement the /rag/sync, /rag/retrieve, and /rag/status/{id} endpoints in app/api/v1/endpoints/rag.py.
Implement reranking logic.
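
The two-stage retrieve-then-rerank flow can be illustrated in pure Python. In this sketch the in-memory cosine search stands in for the Qdrant query, and `lexical_overlap` is a deliberately simple placeholder scorer; the issue does not specify the reranker, and a cross-encoder model would be a likely real choice. All names and the chunk dictionary shape are assumptions.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_overlap(query: str, text: str) -> float:
    """Placeholder reranker: fraction of query words that appear in the chunk."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query, query_vec, chunks, top_k=3, candidates=20, rerank=lexical_overlap):
    """Stage 1: shortlist by vector similarity. Stage 2: reorder with `rerank`."""
    by_vector = sorted(
        chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True
    )
    shortlist = by_vector[:candidates]
    reranked = sorted(shortlist, key=lambda c: rerank(query, c["text"]), reverse=True)
    return reranked[:top_k]
```

Fetching more candidates than `top_k` in the first stage gives the reranker room to promote chunks that the embedding model scored slightly lower.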

Phase 4: Frontend Implementation

Build RagChatPage.tsx component.
Implement local API key management.
Implement chat-to-retrieval-to-LLM flow.

Acceptance Criteria

Users can trigger embedding for any uploaded document.
Retrieval returns the top-K chunks ranked by semantic similarity to the query, after reranking.
The chat interface successfully generates answers based only on the provided context.
All vectors are properly associated with user records to ensure data isolation.
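
One way to enforce the isolation criterion is to attach a payload filter to every Qdrant search. The sketch below builds the filter in Qdrant's REST JSON shape as a plain dictionary. The payload field names `user_id` and `record_id` and the function name are assumptions; they presume each point is stored with those payload fields at ingestion time.

```python
from typing import Optional


def build_scope_filter(user_id: str, record_ids: Optional[list] = None) -> dict:
    """Build a Qdrant filter limiting a search to one user's chunks.

    If `record_ids` is given, the search is further scoped to those records.
    """
    must = [{"key": "user_id", "match": {"value": user_id}}]
    if record_ids:
        must.append({"key": "record_id", "match": {"any": list(record_ids)}})
    return {"must": must}
```

The `user_id` should come from the authenticated session on the server rather than from the request body, so a client cannot widen its own scope.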

Ingestion workflow:

Retrieval workflow:

Generation workflow:

Edited Feb 20, 2026 by Praneeth Ashish
