Feat: Implement Celery and Redis Background Processing Wrapper

Praneeth Ashish requested to merge feat/celery-redis-integration into develop

Overview

This Merge Request implements Phase 3 of the roadmap: an asynchronous background task queue built on Celery and Redis. It moves the pipeline from blocking, synchronous processing to a distributed architecture in which extraction jobs run on separate workers.

Changes

Core Infrastructure

  • Celery Integration: Added celery_config.py with support for multiple queues (default_queue, vlm_queue) and task routing rules.
  • Async Tasks: Created tasks.py to define extraction jobs. Pipelines are loaded lazily, on first use, so CPU workers never load GPU-bound models into memory.
  • Resource Management: Background tasks now delete their input files automatically, so uploads do not accumulate in the persistent upload directory.
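
The lazy loading and cleanup behaviour described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in tasks.py: _StubPipeline, get_pipeline, and run_extraction are hypothetical names, and in the real module the body of run_extraction would presumably sit inside a Celery task.

```python
import os
from functools import lru_cache


class _StubPipeline:
    """Hypothetical stand-in for a real extraction pipeline."""

    def __init__(self, kind: str) -> None:
        self.kind = kind

    def run(self, path: str) -> dict:
        # A real pipeline would run OCR or a VLM; this one only measures the file.
        with open(path, "rb") as fh:
            return {"pipeline": self.kind, "bytes": len(fh.read())}


@lru_cache(maxsize=None)
def get_pipeline(kind: str) -> _StubPipeline:
    # Built on the first call for each kind, then reused within the process.
    # A worker that never asks for the GPU-bound kind never constructs it.
    return _StubPipeline(kind)


def run_extraction(path: str, kind: str = "standard") -> dict:
    try:
        return get_pipeline(kind).run(path)
    finally:
        # Delete the uploaded file whether extraction succeeded or failed.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```

Putting the removal in a finally block means a failed extraction still cleans up its input, which is the case most likely to leave stray files behind.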

API & CLI

  • Async Endpoints: Added POST /extract/async for non-blocking job submission and GET /jobs/{job_id} for polling results.
  • Worker Command: Added a worker command to the Typer CLI for launching distributed workers with configurable concurrency and queue targeting.

Docker & DevOps

  • Orchestration: Updated docker-compose.yml with a production-ready setup:
    • redis: The central message broker.
    • worker-cpu: Scaled for 8 concurrent standard extraction jobs.
    • worker-gpu: Dedicated for 2 concurrent vision/LLM jobs with NVIDIA GPU access.
  • Dependencies: Added celery, redis, and flower to pyproject.toml.
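
The split between worker-cpu and worker-gpu only works if tasks are routed to the right queue. A rough sketch of that routing follows. The queue names come from this MR; the task names are hypothetical, and the assumption that worker-gpu consumes vlm_queue while worker-cpu consumes default_queue is inferred, not stated. The dictionary has the same shape as Celery's task_routes setting, and queue_for is a stdlib helper that imitates glob-based route matching for illustration.

```python
from fnmatch import fnmatch

# Hypothetical task names; only the queue names appear in the MR.
TASK_ROUTES = {
    "tasks.extract_vlm*": {"queue": "vlm_queue"},
}
DEFAULT_QUEUE = "default_queue"


def queue_for(task_name: str) -> str:
    """Return the queue a task name would be routed to under TASK_ROUTES."""
    for pattern, route in TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    # Anything without an explicit route falls through to the default queue.
    return DEFAULT_QUEUE
```

Under these assumptions, queue_for("tasks.extract_vlm_document") resolves to vlm_queue and every other task name resolves to default_queue.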

Quality & Testing

  • Async Integration Tests: Added tests/test_async_integration.py covering all new modules, error paths, and lazy-loading logic.
  • Code Coverage: Increased total project coverage to 97.37%.
  • Linting: Fully compliant with Ruff (formatting/linting) and Mypy (strict type checking).

Impact

  • Scalability: Large batches of documents can be queued without blocking the main API server; throughput now scales with the number of workers.
  • Efficiency: GPU resources are reserved exclusively for VLM tasks, while standard OCR runs on cheaper CPU workers.
  • User Experience: Immediate job submission responses prevent browser/client timeouts.

Related Issues

Closes #12 (closed)

Checklist

  • Code follows project style guidelines (Ruff/Mypy).
  • Tests passed locally with >95% coverage.
  • Docker Compose setup verified.
  • Automated file cleanup verified.

Deployment Note: Set the CELERY_BROKER_URL environment variable in both the API and Worker containers, and make sure the Redis instance it points to is reachable from each of them.
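
One way to catch a missing or malformed broker URL at startup is a small check like the following. It is a sketch, not part of this MR: broker_url_from_env is a hypothetical helper, and restricting the scheme to redis or rediss is an assumption based on Redis being the only broker mentioned here.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse


def broker_url_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Read CELERY_BROKER_URL and fail fast if it is missing or not a Redis URL."""
    env = os.environ if env is None else env
    url = env.get("CELERY_BROKER_URL", "")
    if not url:
        raise RuntimeError("CELERY_BROKER_URL is not set")
    parsed = urlparse(url)
    # Assumption: only redis:// and rediss:// (TLS) URLs with a host are valid here.
    if parsed.scheme not in ("redis", "rediss") or not parsed.hostname:
        raise RuntimeError(f"CELERY_BROKER_URL is not a Redis URL: {url!r}")
    return url
```

This validates the value only; it does not open a connection, so reachability still has to be confirmed against the running Redis container.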

Edited by Praneeth Ashish
