Draft: feat/RAG tool in Corpus App

Praneeth Ashish requested to merge feat/rag-chatbot-model into develop

Implement RAG Backend with Qdrant for Corpus Chatbot

Adds asynchronous document embedding, Qdrant vector storage, and a retrieval pipeline for document-grounded chat.

Summary

This merge request introduces a Retrieval-Augmented Generation (RAG) pipeline in the corpus server to enable document-based chatbot functionality.

The backend now supports:

  • Asynchronous document embedding using Celery
  • Vector storage and similarity search using Qdrant
  • Retrieval of relevant document chunks for frontend-based LLM generation
  • User-managed API keys (LLM calls handled entirely on the client side)

The server is responsible only for embedding, storage, and retrieval — no LLM calls or key handling occur server-side.

Issues

Closes #95 (closed)

Motivation

Currently the system lacks semantic search and document-grounded conversation capabilities.

This change enables:

  • Users to query content from their uploaded documents
  • Scalable vector-based similarity retrieval
  • Secure architecture (no LLM provider keys stored or used on the server)
  • Easy integration with external LLM providers (OpenAI, Gemini, Anthropic, etc.)

This implements the core RAG feature requirement for the corpus application.

Technical Changes

New Features

  • Qdrant-based vector storage
  • Background embedding pipeline powered by Celery
  • RAG-specific API endpoints
  • Document chunking + embedding logic
  • Retrieval endpoint with optional reranking support

New API Endpoints

Method  Endpoint                         Purpose
POST    /api/v1/rag/embed                Start embedding process (async)
POST    /api/v1/rag/retrieve             Retrieve relevant document chunks
GET     /api/v1/rag/status/{record_id}   Check status of embedding task
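
For reviewers who want to exercise the three routes from a script, here is a minimal sketch of a client that builds the requests. The paths and methods come from the table above; the base URL, the JSON body fields (`record_id`, `query`, `top_k`), and the helper names are assumptions for illustration and may not match the actual request schemas in app/api/v1/endpoints/rag.py.

```python
import json
from typing import Optional
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: address of the local dev server


def build_request(method: str, path: str, payload: Optional[dict] = None) -> request.Request:
    """Build (but do not send) a JSON request against the RAG API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def embed_request(record_id: str) -> request.Request:
    # Assumption: the embed endpoint identifies the document by "record_id" in the body.
    return build_request("POST", "/api/v1/rag/embed", {"record_id": record_id})


def status_request(record_id: str) -> request.Request:
    # The record id is part of the path, as listed in the endpoint table.
    return build_request("GET", f"/api/v1/rag/status/{record_id}")


def retrieve_request(query: str, top_k: int = 5) -> request.Request:
    # Assumption: "query" and "top_k" are the body fields of the retrieve endpoint.
    return build_request("POST", "/api/v1/rag/retrieve", {"query": query, "top_k": top_k})
```

Each returned object can be sent with `urllib.request.urlopen(req)` once the stack is running; keeping construction separate from sending makes the payloads easy to inspect.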

Infrastructure

  • Added Qdrant service to docker-compose.yml
  • Configured Redis + Celery for background embedding jobs
  • Added required embedding and vector database dependencies

Files Added / Modified

New files

  • app/api/v1/endpoints/rag.py
  • app/tasks/embedding.py
  • app/services/qdrant_client.py
  • app/models/chunk_status.py

Modified files

  • docker-compose.yml
  • pyproject.toml
  • app/api/v1/api.py
  • app/core/config.py

Workflow Overview

Ingestion

  1. User triggers POST /api/v1/rag/embed
  2. Task is queued in Redis
  3. Celery worker extracts text, chunks it, creates embeddings
  4. Vectors are stored in Qdrant
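
The ingestion steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in app/tasks/embedding.py: the chunk size and overlap, the hash-based `toy_embed` placeholder (which is not a semantic embedding), and the in-memory `store` dictionary standing in for a Qdrant collection are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


def toy_embed(chunk: str, dim: int = 8) -> Vector:
    """Deterministic placeholder vector derived from a hash; a real model goes here."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def ingest(record_id: str, text: str, embed: Callable[[str], Vector],
           store: Dict[str, dict]) -> int:
    """Chunk a document, embed each chunk, and store vector plus payload by chunk id."""
    chunks = chunk_text(text)
    for i, chunk in enumerate(chunks):
        store[f"{record_id}:{i}"] = {
            "vector": embed(chunk),
            "text": chunk,
            "record_id": record_id,
        }
    return len(chunks)
```

Overlapping windows are one common way to avoid cutting a sentence's context at a chunk boundary; the actual chunking strategy in this MR may differ.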

Retrieval

  1. User sends query to POST /api/v1/rag/retrieve
  2. Server embeds the query text
  3. Performs similarity search in Qdrant
  4. Returns relevant chunks
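
To make the similarity-search step concrete, here is a small brute-force sketch using cosine similarity over an in-memory index. Qdrant performs this search server-side with its own index; the `search` function, the dictionary index, and the choice of cosine as the metric are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector: Vector, index: Dict[str, Vector],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the top_k best matches."""
    scored = [(chunk_id, cosine(query_vector, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The query must be embedded with the same model used at ingestion time; otherwise the scores are not comparable.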

Generation

Generation is handled entirely on the frontend using the user's own LLM API key.
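
As a rough illustration of what the client does with the retrieved chunks, the sketch below assembles a grounded prompt. The real frontend code is not part of this MR and is unlikely to be written in Python; the function name, prompt wording, and `max_chars` budget are hypothetical and only show the logic.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_chars: int = 4000) -> str:
    """Assemble a prompt that asks the model to answer only from the given chunks."""
    context_parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop once the context budget would be exceeded
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The resulting string would then be sent to the LLM provider directly from the client with the user's key, so the server never sees it.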

Acceptance Criteria

  • Documents can be successfully embedded
  • Vectors are correctly stored in Qdrant
  • Retrieval endpoint returns relevant chunks
  • No LLM API keys are stored or handled on the server
  • All new API routes are documented
  • End-to-end demo (upload → embed → retrieve → frontend chat) works

Testing

  • Tested successfully with PDF and TXT documents
  • Verified Celery embedding pipeline execution
  • Validated Qdrant similarity search results
  • Performed manual end-to-end testing via the frontend

Known Limitations

  • Currently supports only text-based documents
  • Reranking is optional and can be improved further
  • No automatic re-embedding when source files are updated

Documentation

  • Updated README with RAG setup instructions
  • Added examples of required environment variables
  • Included sample API usage for embed and retrieve endpoints

How to Test Locally

docker-compose up

  [ ] Upload a document (via existing upload flow)
  [ ] Call POST /api/v1/rag/embed
  [ ] Poll status with GET /api/v1/rag/status/{record_id} until ready
  [ ] Query via POST /api/v1/rag/retrieve
  [ ] Ask questions in the frontend using the retrieved context
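
The polling step in the checklist can be scripted with a small helper like the one below. It is a sketch under assumptions: the status strings `"ready"` and `"failed"` are guesses at what the status endpoint returns, and `fetch_status` is a hypothetical callable that you would implement as a GET to /api/v1/rag/status/{record_id}.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    ready: str = "ready",      # assumption: terminal success value
    failed: str = "failed",    # assumption: terminal failure value
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status repeatedly until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in (ready, failed):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"embedding still '{status}' after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.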

Review Notes

Please focus on:

  • Correctness of the embedding pipeline
  • Proper Qdrant integration (collections, upsert, search)
  • API error handling and response consistency
  • Security — confirm no leakage or handling of LLM API keys server-side

Thank you for reviewing!
