Draft: feat/RAG tool in Corpus App
Implement RAG Backend with Qdrant for Corpus Chatbot
Adds asynchronous document embedding, Qdrant vector storage, and retrieval pipeline for document-grounded chat
Summary
This merge request introduces a Retrieval-Augmented Generation (RAG) pipeline in the corpus server to enable document-based chatbot functionality.
The backend now supports:
- Asynchronous document embedding using Celery
- Vector storage and similarity search using Qdrant
- Retrieval of relevant document chunks for frontend-based LLM generation
- User-managed API keys (LLM calls handled entirely on the client side)
The server is responsible only for embedding, storage, and retrieval — no LLM calls or key handling occur server-side.
Issues
Closes #95
Motivation
Currently, the system lacks semantic search and document-grounded conversation capabilities.
This change enables:
- Users to query content from their uploaded documents
- Scalable vector-based similarity retrieval
- Secure architecture (no LLM provider keys stored or used on the server)
- Easy integration with external LLM providers (OpenAI, Gemini, Anthropic, etc.)
This implements the core RAG feature requirement for the corpus application.
Technical Changes
New Features
- Qdrant-based vector storage
- Background embedding pipeline powered by Celery
- RAG-specific API endpoints
- Document chunking + embedding logic
- Retrieval endpoint with optional reranking support
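The MR does not describe how the optional reranking step scores candidates. The sketch below shows only the general shape of a rerank pass: take the chunks returned by vector search, rescore each one against the query with a pluggable scorer, and keep the best few. The token-overlap scorer is a hypothetical stand-in for a stronger model such as a cross-encoder. The names `rerank`, `overlap_score` and `top_n` are illustrative, not the PR's code.

```python
import re
from typing import Callable, List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in scorer: fraction of query tokens found in the chunk."""
    q = _tokens(query)
    if not q:
        return 0.0
    return len(q & _tokens(chunk)) / len(q)


def rerank(
    query: str,
    candidates: List[str],
    scorer: Callable[[str, str], float] = overlap_score,
    top_n: int = 3,
) -> List[str]:
    """Re-order retrieved chunks by scorer(query, chunk), highest first.

    sorted() is stable, so chunks with equal scores keep their original
    vector-search order.
    """
    return sorted(candidates, key=lambda c: scorer(query, c), reverse=True)[:top_n]


print(rerank("refund policy",
             ["shipping times", "our refund policy is 30 days", "policy on returns"],
             top_n=2))
```

Keeping the scorer as a parameter means the rerank step can stay disabled or be swapped for a better model without changing the retrieval code around it.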
New API Endpoints
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/v1/rag/embed | Start embedding process (async) |
| POST | /api/v1/rag/retrieve | Retrieve relevant document chunks |
| GET | /api/v1/rag/status/{record_id} | Check status of embedding task |
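The table lists routes but not their request schemas. Purely as an illustration for reviewers, the request bodies might be modelled as below. Only `record_id` appears in this MR (in the status path). The fields `query`, `top_k` and `rerank`, and the classes `EmbedRequest` and `RetrieveRequest`, are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EmbedRequest:
    """Hypothetical body for POST /api/v1/rag/embed."""
    record_id: str  # ID of an already-uploaded document (assumed)


@dataclass
class RetrieveRequest:
    """Hypothetical body for POST /api/v1/rag/retrieve."""
    query: str
    record_id: Optional[str] = None  # restrict the search to one document (assumed)
    top_k: int = 5                   # number of chunks to return (assumed)
    rerank: bool = False             # toggle the optional reranking step (assumed)


def to_body(req) -> str:
    """Serialize a request dataclass to a JSON string."""
    return json.dumps(asdict(req))


print(to_body(EmbedRequest(record_id="doc-123")))
print(to_body(RetrieveRequest(query="What is the refund policy?", record_id="doc-123")))
```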
Infrastructure
- Added Qdrant service to docker-compose.yml
- Configured Redis + Celery for background embedding jobs
- Added required embedding and vector database dependencies
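The new services need connection settings in app/core/config.py. The MR does not list the variable names, so the sketch below invents them: `QDRANT_URL`, `QDRANT_COLLECTION`, `REDIS_URL`, `EMBEDDING_MODEL`, `RAG_CHUNK_SIZE` and `RAG_CHUNK_OVERLAP` are hypothetical. The defaults assume compose services named `qdrant` and `redis` on their standard ports (6333 for Qdrant's HTTP API, 6379 for Redis). The model name is only an example. The real config module may use a settings library; this version reads `os.environ` directly.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RagSettings:
    qdrant_url: str
    qdrant_collection: str
    redis_url: str
    embedding_model: str
    chunk_size: int
    chunk_overlap: int
    # Deliberately no LLM provider key: generation happens on the client.


def load_settings(env: Optional[Mapping[str, str]] = None) -> RagSettings:
    """Read RAG settings from environment variables (all names hypothetical)."""
    env = os.environ if env is None else env
    return RagSettings(
        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),  # assumed compose service name
        qdrant_collection=env.get("QDRANT_COLLECTION", "corpus_chunks"),
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),  # Celery broker (assumed)
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),  # example only
        chunk_size=int(env.get("RAG_CHUNK_SIZE", "500")),
        chunk_overlap=int(env.get("RAG_CHUNK_OVERLAP", "50")),
    )
```

Leaving out any LLM key field makes the "no keys on the server" guarantee visible in the config itself.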
Files Added / Modified
New files
- app/api/v1/endpoints/rag.py
- app/tasks/embedding.py
- app/services/qdrant_client.py
- app/models/chunk_status.py
Modified files
- docker-compose.yml
- pyproject.toml
- app/api/v1/api.py
- app/core/config.py
Workflow Overview
Ingestion
- User triggers POST /api/v1/rag/embed
- Task is queued in Redis
- Celery worker extracts text, chunks it, creates embeddings
- Vectors are stored in Qdrant
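The MR does not say how the worker splits text before embedding. A common approach, shown here as an assumption, is fixed-size character windows with some overlap, so a sentence cut at one boundary still appears whole in the neighbouring chunk. `chunk_text` and its defaults are hypothetical. The actual pipeline may split by tokens, sentences or pages instead.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

With `chunk_size=4` and `overlap=1`, each window starts 3 characters after the previous one. In the pipeline, each chunk would then be embedded and upserted into Qdrant with its document ID in the point payload.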
Retrieval
- User sends query to POST /api/v1/rag/retrieve
- Server embeds the query text
- Performs similarity search in Qdrant
- Returns relevant chunks
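The similarity search runs inside Qdrant, which uses an HNSW index rather than scanning every vector. The brute-force version below shows what it computes: score each stored chunk vector against the query vector and return the best matches. It assumes the collection uses cosine distance, one of the metrics Qdrant supports. It also assumes the query is embedded with the same model as the chunks. The toy 2-D vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def top_k(query_vec: Sequence[float],
          points: List[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[Tuple[float, str]]:
    """Return the k (score, chunk_text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in points]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


points = [
    ("cats purr", [1.0, 0.0]),
    ("dogs bark", [0.0, 1.0]),
    ("cats and dogs", [0.7, 0.7]),
]
for score, text in top_k([1.0, 0.1], points, k=2):
    print(f"{score:.3f}  {text}")
```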
Generation
Handled completely on the frontend using the user's own LLM API key.
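The frontend code is not part of this MR, and its language is not stated. To keep one language across these examples, the step is sketched in Python. After calling the retrieve endpoint, the client folds the returned chunks into a prompt and sends it to the user's chosen LLM provider with the user's own key, which never reaches the corpus server. `build_prompt`, the prompt wording and the character budget are all hypothetical.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt from retrieved chunks (template is illustrative).

    Chunks are added in retrieval order until their combined length would exceed
    max_chars, a rough budget that ignores the separators and template text.
    """
    parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        piece = f"[{i}] {chunk}"
        if used + len(piece) > max_chars:
            break
        parts.append(piece)
        used += len(piece)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


print(build_prompt("What is X?", ["X is a letter.", "Y follows X."]))
```

Numbering the chunks lets the model cite which passage it used. That makes answers easier to check against the source document.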
Acceptance Criteria
- Documents can be successfully embedded
- Vectors are correctly stored in Qdrant
- Retrieval endpoint returns relevant chunks
- No LLM API keys are stored or handled on the server
- All new API routes are documented
- End-to-end demo (upload → embed → retrieve → frontend chat) works
Testing
- Tested successfully with PDF and TXT documents
- Verified Celery embedding pipeline execution
- Validated Qdrant similarity search results
- Performed manual end-to-end testing via the frontend
Known Limitations
- Currently supports only text-based documents
- Reranking is implemented as optional and can be further improved
- No automatic re-embedding when source files are updated
Documentation
- Updated README with RAG setup instructions
- Added examples of required environment variables
- Included sample API usage for embed and retrieve endpoints
How to Test Locally
docker-compose up
- [ ] Upload a document (via the existing upload flow)
- [ ] Call POST /api/v1/rag/embed
- [ ] Poll status with GET /api/v1/rag/status/{record_id} until ready
- [ ] Query via POST /api/v1/rag/retrieve
- [ ] Ask questions in the frontend using the retrieved context
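The polling step can be scripted. Below is a minimal sketch of a poll-until-ready loop with a timeout. `fetch_status` is a hypothetical callable that would wrap GET /api/v1/rag/status/{record_id} and return the status string. The status values "ready" and "failed" are assumptions, since the MR does not list them. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[str], str],
    record_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll an embedding task until it reports "ready" (status names are assumed)."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(record_id)
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError(f"embedding failed for record {record_id}")
        if clock() >= deadline:
            raise TimeoutError(f"record {record_id} not ready after {timeout}s")
        sleep(interval)
```

A fixed interval keeps the sketch simple. For long documents, a growing back-off between polls would reduce load on the status endpoint.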
Review Notes
Please focus on:
- Correctness of the embedding pipeline
- Proper Qdrant integration (collections, upsert, search)
- API error handling and response consistency
- Security — confirm no leakage or handling of LLM API keys server-side
Thank you for reviewing!