Feat: Implementation of Async Job Foundation

Vemuri priya requested to merge voiceapp_backend into develop

Overview

This MR introduces the foundational async job-based transcription system, replacing the existing synchronous /api/transcribe endpoint with a scalable, non-blocking architecture. The new design enables background processing of transcription tasks using a job queue model backed by SQLite and filesystem storage.


What’s Included in This MR

API Refactor

  • Replaced synchronous /api/transcribe with async workflow

  • Introduced:

    • POST /transcribe → creates a transcription job and returns job_id
    • GET /transcribe/{job_id} → fetches job status and result

Job Management System

  • Implemented SQLite-backed job manager

  • Tracks full job lifecycle:

    • pending
    • processing
    • completed
    • failed
  • Stores metadata:

    • job_id, status, language
    • created_at, retrieved_at, ttl_deadline
    • transcription result
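
As a rough illustration of how the metadata above might map onto a SQLite table, here is a minimal sketch. The table name `jobs`, the helper names `create_job` and `get_job`, and the choice to store timestamps as epoch seconds are assumptions made for this example, not details taken from the MR.

```python
import sqlite3
import time
import uuid

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as described in this MR

# Hypothetical schema: column names follow the metadata list above,
# the types and the CHECK constraint are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id       TEXT PRIMARY KEY,
    status       TEXT NOT NULL
                 CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
    language     TEXT,
    created_at   REAL NOT NULL,
    retrieved_at REAL,
    ttl_deadline REAL NOT NULL,
    result       TEXT
)
"""


def create_job(conn, language):
    """Insert a new job in the 'pending' state and return its job_id."""
    now = time.time()
    job_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO jobs (job_id, status, language, created_at, ttl_deadline) "
        "VALUES (?, 'pending', ?, ?, ?)",
        (job_id, language, now, now + TTL_SECONDS),
    )
    conn.commit()
    return job_id


def get_job(conn, job_id):
    """Return the job row as a dict, or None if the job does not exist."""
    row = conn.execute(
        "SELECT * FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return dict(row) if row is not None else None


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
conn.row_factory = sqlite3.Row
conn.execute(SCHEMA)
job_id = create_job(conn, "en")
job = get_job(conn, job_id)
```

The `CHECK` constraint keeps the status column limited to the four lifecycle states, so an invalid transition target fails at the database layer rather than silently persisting.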

Shared Models (Pydantic)

  • Added structured request/response schemas:

    • TranscribeRequest
    • JobStatus
    • JobResult
  • Ensures consistent API contracts across services
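
The schemas listed above could look roughly like the following sketch. Only the three model names come from this MR; the field names (`audio_path`, `text`, `error`) and the `JobState` enum are hypothetical and chosen just to make the example concrete.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class JobState(str, Enum):
    """Hypothetical enum for the four lifecycle states in this MR."""
    pending = "pending"
    processing = "processing"
    completed = "completed"
    failed = "failed"


class TranscribeRequest(BaseModel):
    # Assumed fields: the real request may carry an uploaded file instead.
    audio_path: str
    language: Optional[str] = None


class JobStatus(BaseModel):
    job_id: str
    status: JobState


class JobResult(BaseModel):
    job_id: str
    status: JobState
    text: Optional[str] = None   # transcription, present once completed
    error: Optional[str] = None  # failure reason, present once failed


status = JobStatus(job_id="abc123", status="pending")
```

Typing `status` as an enum means an unknown state string is rejected at validation time instead of leaking into API responses.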


Background Processing

  • Added FastAPI BackgroundTasks worker

  • Handles:

    • Status transitions (pending → processing → completed/failed)
    • Execution of transcription pipeline
    • Persistence of results
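
The lifecycle handled by the worker can be sketched as a single function. This is an illustration under stated assumptions: the in-memory `store` dict stands in for the SQLite job manager, and `run_job`, `audio_path`, and the `transcribe` callable are hypothetical names rather than the MR's actual code.

```python
def run_job(store, job_id, transcribe):
    """Drive one job through pending -> processing -> completed/failed."""
    job = store[job_id]
    job["status"] = "processing"
    try:
        # Run the transcription pipeline and persist its output.
        job["result"] = transcribe(job["audio_path"])
        job["status"] = "completed"
    except Exception as exc:
        # Any pipeline error marks the job as failed and records the reason.
        job["error"] = str(exc)
        job["status"] = "failed"


def fake_transcribe(path):
    """Stand-in pipeline used only for this example."""
    return f"transcript of {path}"


def broken_transcribe(path):
    raise RuntimeError("decoder crashed")


store = {
    "ok": {"status": "pending", "audio_path": "a.wav"},
    "bad": {"status": "pending", "audio_path": "b.wav"},
}
run_job(store, "ok", fake_transcribe)
run_job(store, "bad", broken_transcribe)
```

With FastAPI, a function like this would typically be scheduled from the POST handler via `background_tasks.add_task(run_job, store, job_id, transcribe)`, so the response carrying `job_id` returns before transcription starts.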

TTL-Based Cleanup

  • Implemented automatic cleanup system:

    • Deletes jobs once their 24-hour TTL expires
    • Or immediately after the result is retrieved, whichever comes first
  • Includes periodic cleanup worker for maintenance
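
One way the cleanup rule could be expressed is a single `DELETE` that covers both conditions. The sketch below assumes a `jobs` table with `retrieved_at` and `ttl_deadline` stored as epoch seconds; the function name `cleanup_jobs` is hypothetical, and removal of the audio and output files from disk is left out.

```python
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as described in this MR


def cleanup_jobs(conn, now=None):
    """Delete jobs that are past their TTL or already retrieved.

    Returns the number of rows removed.
    """
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM jobs WHERE ttl_deadline <= ? OR retrieved_at IS NOT NULL",
        (now,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
conn.execute(
    "CREATE TABLE jobs ("
    "job_id TEXT PRIMARY KEY, retrieved_at REAL, ttl_deadline REAL NOT NULL)"
)
now = time.time()
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("fresh", None, now + TTL_SECONDS),     # still within TTL, not retrieved
        ("expired", None, now - 1),             # TTL deadline has passed
        ("retrieved", now, now + TTL_SECONDS),  # result already fetched
    ],
)
removed = cleanup_jobs(conn, now)
remaining = [row[0] for row in conn.execute("SELECT job_id FROM jobs")]
```

A periodic worker would call `cleanup_jobs` on a fixed interval; passing `now` explicitly keeps the function easy to unit test.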


Testing Updates

  • Updated existing unit tests to support async job flow

  • Validates:

    • Job creation
    • Status transitions
    • Result retrieval

Breaking Changes

  • Removed legacy synchronous endpoint:

    • /api/transcribe (removed in this MR)
  • The v1 async API is a clean replacement, so callers of the old endpoint must switch to the job-based flow


Acceptance Criteria

  • POST /transcribe returns { job_id } immediately
  • GET /transcribe/{job_id} returns correct job status and result
  • Completed results are retrievable via status endpoint
  • Audio and output files are deleted after retrieval or TTL expiry (24h)
  • Background worker handles job lifecycle correctly
  • SQLite stores all required job metadata
  • Pydantic models define all API contracts
  • Unit tests updated for async behavior
  • Old sync /api/transcribe endpoint removed
