Praneeth Ashish requested to merge feat/rag-chatbot-model into develop Feb 20, 2026

Implement RAG Backend with Qdrant for Corpus Chatbot

Adds asynchronous document embedding, Qdrant vector storage, and retrieval pipeline for document-grounded chat

Summary

This merge request introduces a Retrieval-Augmented Generation (RAG) pipeline in the corpus server to enable document-based chatbot functionality.

The backend now supports:

Asynchronous document embedding using Celery
Vector storage and similarity search using Qdrant
Retrieval of relevant document chunks for frontend-based LLM generation
User-managed API keys (LLM calls handled entirely on the client side)

The server is responsible only for embedding, storage, and retrieval — no LLM calls or key handling occur server-side.

Issues

Closes #95 (closed)

Motivation

The system currently has no semantic search or document-grounded conversation capability.

This change enables:

Users to query content from their uploaded documents
Scalable vector-based similarity retrieval
Secure architecture (no LLM provider keys stored or used on the server)
Easy integration with external LLM providers (OpenAI, Gemini, Anthropic, etc.)

This implements the core RAG feature requirement for the corpus application.

Technical Changes

New Features

Qdrant-based vector storage
Background embedding pipeline powered by Celery
RAG-specific API endpoints
Document chunking + embedding logic
Retrieval endpoint with optional reranking support

New API Endpoints

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | `/api/v1/rag/embed` | Start the embedding process (async) |
| POST | `/api/v1/rag/retrieve` | Retrieve relevant document chunks |
| GET | `/api/v1/rag/status/{record_id}` | Check the status of an embedding task |
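
For reviewers who want to exercise the routes by hand, here is a minimal sketch of how the three requests could be built with the Python standard library. The base URL and the JSON field names (`record_id` in the embed body, `query`, `top_k`) are assumptions for illustration only; the actual request schemas are defined in `app/api/v1/endpoints/rag.py` and may differ.

```python
import json
from urllib.request import Request

# Assumption: local development address; this MR does not state the host or port.
BASE_URL = "http://localhost:8000"


def embed_request(record_id: str) -> Request:
    """Build the POST that starts asynchronous embedding for one record."""
    # Hypothetical body shape: the real schema may use different field names.
    body = json.dumps({"record_id": record_id}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/v1/rag/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(record_id: str) -> Request:
    """Build the GET that checks the embedding task status for a record."""
    return Request(f"{BASE_URL}/api/v1/rag/status/{record_id}", method="GET")


def retrieve_request(query: str, top_k: int = 5) -> Request:
    """Build the POST that asks for the chunks most relevant to a query."""
    # Hypothetical body shape: `query` and `top_k` are illustrative names.
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/v1/rag/retrieve",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each returned object can be sent with `urllib.request.urlopen(req)` once the stack is running. No LLM key appears in any of these requests, which matches the server-side scope described above.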

Infrastructure

Added Qdrant service to docker-compose.yml
Configured Redis + Celery for background embedding jobs
Added required embedding and vector database dependencies

Files Added / Modified

New files

app/api/v1/endpoints/rag.py
app/tasks/embedding.py
app/services/qdrant_client.py
app/models/chunk_status.py

Modified files

docker-compose.yml
pyproject.toml
app/api/v1/api.py
app/core/config.py

Workflow Overview

Ingestion

User triggers POST /api/v1/rag/embed
Task is queued in Redis
Celery worker extracts text, chunks it, creates embeddings
Vectors are stored in Qdrant
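
To make the chunking step concrete, the sketch below splits extracted text into fixed-size, overlapping character windows. This is one common approach, not necessarily the strategy used in `app/tasks/embedding.py`; the function name, chunk size, and overlap are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`.

    Illustrative only: the sizes are assumptions, and the real pipeline may
    chunk by tokens, sentences, or pages instead of raw characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing slightly more vectors.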

Retrieval

User sends query to POST /api/v1/rag/retrieve
Server embeds the query text
Performs similarity search in Qdrant
Returns relevant chunks
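
Conceptually, the similarity search ranks stored chunk vectors against the query vector. The pure-Python sketch below is a stand-in for what Qdrant does at scale, not the Qdrant client API. It assumes cosine similarity; the distance metric configured for the collection is not stated in this MR.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vector: list[float],
    stored: list[tuple[str, list[float]]],
    k: int = 3,
) -> list[tuple[float, str]]:
    """Return the k (score, chunk_text) pairs most similar to the query.

    `stored` is a hypothetical in-memory list of (chunk_text, vector) pairs
    that stands in for the Qdrant collection.
    """
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return scored[:k]
```

Qdrant replaces this linear scan with an approximate nearest-neighbour index, which is what makes retrieval scale beyond a handful of documents.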

Generation

Handled completely on the frontend using the user's own LLM API key.

Acceptance Criteria

Documents can be successfully embedded
Vectors are correctly stored in Qdrant
Retrieval endpoint returns relevant chunks
No LLM API keys are stored or handled on the server
All new API routes are documented
End-to-end demo (upload → embed → retrieve → frontend chat) works

Testing

Tested successfully with PDF and TXT documents
Verified Celery embedding pipeline execution
Validated Qdrant similarity search results
Performed manual end-to-end testing via the frontend

Known Limitations

Currently supports only text-based documents
Reranking is optional, and its implementation can be improved further
No automatic re-embedding when source files are updated

Documentation

Updated README with RAG setup instructions
Added examples of required environment variables
Included sample API usage for embed and retrieve endpoints

How to Test Locally

docker-compose up

[ ] Upload a document (via existing upload flow)
[ ] Call POST /api/v1/rag/embed
[ ] Poll status with GET /api/v1/rag/status/{record_id} until ready
[ ] Query via POST /api/v1/rag/retrieve
[ ] Ask questions in the frontend using the retrieved context
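
The polling step in the checklist can be scripted. The sketch below is a minimal helper that takes any callable returning the current status string, so it does not depend on a particular HTTP client. The `"ready"` value follows the wording above; the `"failed"` value, the timeout, and the interval are assumptions, since the status payload is not documented in this description.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `get_status` until it reports "ready" or the timeout expires.

    Returns True when ready and False on timeout. Raises RuntimeError if the
    task reports "failed" (a hypothetical status value).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "ready":
            return True
        if status == "failed":  # assumption: the API exposes a failure state
            raise RuntimeError("embedding task failed")
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)  # wait before asking again
```

In practice `get_status` would wrap a call to GET /api/v1/rag/status/{record_id} and extract the status field from the response.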

Review Notes

Please focus on:

Correctness of the embedding pipeline
Proper Qdrant integration (collections, upsert, search)
API error handling and response consistency
Security — confirm no leakage or handling of LLM API keys server-side

Thank you for reviewing!
