Feat: Implement Celery and Redis Background Processing Wrapper
Overview
This Merge Request activates the Phase 3 roadmap by implementing an asynchronous background task queue using Celery and Redis. It transforms the pipeline from a blocking synchronous system into a distributed, scalable architecture.
Changes
Core Infrastructure
- Celery Integration: Added celery_config.py with support for multiple queues (default_queue, vlm_queue) and task routing rules.
- Async Tasks: Created tasks.py to define extraction jobs. Implemented Lazy Pipeline Loading to ensure CPU workers don't waste memory loading GPU-bound models.
- Resource Management: Added automatic file cleanup logic in background tasks to prevent disk leakage in the persistent upload directory.
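For reviewers, here is a minimal sketch of how lazy pipeline loading and guaranteed file cleanup can fit together inside a task body. It is not the code in tasks.py: the loader functions, get_pipeline, and run_extraction are hypothetical names, and the "pipelines" are stand-in lambdas rather than real models.

```python
import os
from functools import lru_cache
from typing import Any, Callable, Dict

Pipeline = Callable[[str], Dict[str, Any]]


# Hypothetical loaders standing in for the real pipeline constructors.
def _load_standard_pipeline() -> Pipeline:
    return lambda path: {"pipeline": "standard", "source": os.path.basename(path)}


def _load_vlm_pipeline() -> Pipeline:
    # In a real worker, GPU-bound model weights would be loaded here.
    return lambda path: {"pipeline": "vlm", "source": os.path.basename(path)}


_LOADERS: Dict[str, Callable[[], Pipeline]] = {
    "standard": _load_standard_pipeline,
    "vlm": _load_vlm_pipeline,
}


@lru_cache(maxsize=None)
def get_pipeline(kind: str) -> Pipeline:
    """Build a pipeline on first use and cache it for the life of the process."""
    return _LOADERS[kind]()


def run_extraction(file_path: str, kind: str = "standard") -> Dict[str, Any]:
    """Task body: run the extraction, then always remove the uploaded file."""
    try:
        return get_pipeline(kind)(file_path)
    finally:
        # Runs on success and on failure, so uploads never accumulate on disk.
        try:
            os.remove(file_path)
        except FileNotFoundError:
            pass  # Already removed; nothing left to clean up.
```

Because nothing is loaded at import time, a worker that only ever receives standard jobs never calls the VLM loader. The try/finally means the upload is removed even when the extraction raises.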
API & CLI
- Async Endpoints: Added POST /extract/async for non-blocking job submission and GET /jobs/{job_id} for polling results.
- Worker Command: Added a new worker command to the Typer CLI to facilitate easy launching of distributed workers with configurable concurrency and queue targeting.
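To illustrate the submit-then-poll flow from a client's point of view, here is a rough polling helper. The response shape is an assumption: the sketch expects the job endpoint to return a JSON object with a "status" field using Celery state names, which this MR does not confirm. poll_job and fetch_status are hypothetical names, and the HTTP call is injected so the sketch stays independent of any client library.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal values of the "status" field (Celery state names).
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def poll_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is expected to wrap GET /jobs/{job_id} and return the decoded
    JSON body. Raises TimeoutError if no terminal state is seen in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(job_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)
```

A caller would first submit the document with POST /extract/async, read the job id from the response, and then pass a small wrapper around GET /jobs/{job_id} as fetch_status.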
Docker & DevOps
- Orchestration: Updated docker-compose.yml with a production-ready setup:
  - redis: The central message broker.
  - worker-cpu: Scaled for 8 concurrent standard extraction jobs.
  - worker-gpu: Dedicated for 2 concurrent vision/LLM jobs with NVIDIA GPU access.
- Dependencies: Added celery, redis, and flower to pyproject.toml.
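As a rough picture of how the two queues relate to the two worker services, the sketch below uses Celery's lowercase setting names (task_default_queue, task_routes), which a Celery app can load from a module with config_from_object. The task name tasks.extract_vlm, the queue_for helper, and the assumption that worker-cpu consumes default_queue while worker-gpu consumes vlm_queue are illustrative and not taken from celery_config.py.

```python
from typing import Dict

# Tasks without an explicit route fall back to this queue.
task_default_queue = "default_queue"

# Exact task name -> routing options. The task name is hypothetical.
task_routes: Dict[str, Dict[str, str]] = {
    "tasks.extract_vlm": {"queue": "vlm_queue"},
}

# Assumed pairing of compose services to queues, with the concurrency
# figures listed in this MR. A Celery worker selects its queues with -Q
# and its process count with --concurrency.
WORKERS = {
    "worker-cpu": {"queue": "default_queue", "concurrency": 8},
    "worker-gpu": {"queue": "vlm_queue", "concurrency": 2},
}


def queue_for(task_name: str) -> str:
    """Mirror the routing table above: routed tasks go to their queue,
    everything else goes to the default queue."""
    return task_routes.get(task_name, {}).get("queue", task_default_queue)
```

Routing by queue is what keeps standard jobs away from the GPU worker: a worker only receives tasks from the queues it was started with.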
Quality & Testing
- Async Integration Tests: Added tests/test_async_integration.py covering all new modules, error paths, and lazy-loading logic.
- Code Coverage: Increased total project coverage to 97.37%.
- Linting: Fully compliant with Ruff (formatting/linting) and Mypy (strict type checking).
Impact
- Scalability: The system can now ingest thousands of documents without blocking the main API server.
- Efficiency: GPU resources are reserved exclusively for VLM tasks, while standard OCR runs on cheaper CPU workers.
- User Experience: Immediate job submission responses prevent browser/client timeouts.
Related Issues
Closes #12
Checklist
- Code follows project style guidelines (Ruff/Mypy).
- Tests passed locally with >95% coverage.
- Docker Compose setup verified.
- Automated file cleanup verified.
Deployment Note: Ensure a Redis instance is reachable by the API and Worker containers via the CELERY_BROKER_URL environment variable.
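One way to check the deployment note before starting the API or a worker is a small preflight script like the sketch below. It is not part of this MR: broker_address and broker_reachable are hypothetical helpers, the fallback URL is an assumed default, and the check only confirms that a TCP connection can be opened, not that Redis accepts commands.

```python
import os
import socket
from typing import Tuple
from urllib.parse import urlparse

DEFAULT_BROKER_URL = "redis://localhost:6379/0"  # assumed fallback for local runs


def broker_address(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a redis:// or rediss:// URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported broker scheme: {parsed.scheme!r}")
    return parsed.hostname or "localhost", parsed.port or 6379


def broker_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the broker can be opened."""
    host, port = broker_address(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_broker_from_env() -> bool:
    """Read CELERY_BROKER_URL and report whether the broker is reachable."""
    url = os.environ.get("CELERY_BROKER_URL", DEFAULT_BROKER_URL)
    return broker_reachable(url)
```

Running check_broker_from_env() inside both the API and worker containers would show whether each can reach the broker at the configured address.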
Edited by Praneeth Ashish