Feat: Implementation of Async Job Foundation
Overview
This MR introduces the foundational async job-based transcription system, replacing the existing synchronous /api/transcribe endpoint with a scalable, non-blocking architecture.
The new design enables background processing of transcription tasks using a job queue model backed by SQLite and filesystem storage.
What’s Included in This MR
API Refactor
- Replaced synchronous /api/transcribe with async workflow
- Introduced:
  - POST /transcribe → creates a transcription job and returns job_id
  - GET /transcribe/{job_id} → fetches job status and result
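From a caller's point of view, the async workflow means submitting once and then polling the status endpoint. The sketch below is illustrative only and is not part of this MR: `poll_until_done` and `fetch_status` are hypothetical names, and the `status` / `result` response keys are assumed rather than taken from the actual schemas. `fetch_status` stands in for whatever performs the GET /transcribe/{job_id} request.

```python
import time
from typing import Any, Callable, Dict

# Terminal states taken from the job lifecycle described in this MR.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(job_id) until the job is completed or failed.

    fetch_status is a hypothetical callable wrapping GET /transcribe/{job_id};
    it is assumed to return a dict with at least a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(job_id)
        if payload["status"] in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {payload['status']} after {timeout}s"
            )
        sleep(interval)
```

Because results are removed right after retrieval (see TTL-Based Cleanup), a client following this pattern should keep the returned payload rather than expect to fetch it a second time.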
Job Management System
- Implemented SQLite-backed job manager
- Tracks full job lifecycle: pending, processing, completed, failed
- Stores metadata:
  - job_id, status, language
  - created_at, retrieved_at, ttl_deadline
  - transcription result
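A minimal sketch of how a job table with these fields could look, using only the standard library. The column names follow the metadata listed above; the column types, the helper names (`open_store`, `create_job`, `get_job`), the UUID-based job_id, and the ISO-8601 timestamps are assumptions for illustration, not necessarily what the job manager in this MR does.

```python
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone

# Assumed schema: one row per job, status restricted to the four lifecycle states.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id       TEXT PRIMARY KEY,
    status       TEXT NOT NULL
                 CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
    language     TEXT,
    created_at   TEXT NOT NULL,
    retrieved_at TEXT,
    ttl_deadline TEXT NOT NULL,
    result       TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the job database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def create_job(conn, language=None, ttl=timedelta(hours=24)):
    """Insert a new job in the 'pending' state and return its job_id."""
    now = datetime.now(timezone.utc)
    job_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO jobs (job_id, status, language, created_at, ttl_deadline) "
        "VALUES (?, 'pending', ?, ?, ?)",
        (job_id, language, now.isoformat(), (now + ttl).isoformat()),
    )
    conn.commit()
    return job_id


def get_job(conn, job_id):
    """Return the job row as a dict, or None if the job does not exist."""
    row = conn.execute(
        "SELECT * FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return dict(row) if row else None
```

A short usage example under the same assumptions:

```python
conn = open_store()
job_id = create_job(conn, language="en")
print(get_job(conn, job_id)["status"])
```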
Shared Models (Pydantic)
- Added structured request/response schemas: TranscribeRequest, JobStatus, JobResult
- Ensures consistent API contracts across services
Background Processing
- Added FastAPI BackgroundTasks worker
- Handles:
  - Status transitions (pending → processing → completed/failed)
  - Execution of transcription pipeline
  - Persistence of results
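The lifecycle handled by the worker can be sketched as a small state machine. This is an illustration under stated assumptions, not the worker code from this MR: a plain dict stands in for the SQLite row, and `ALLOWED`, `transition`, `run_job`, the `audio_path` key, and the `error` key are hypothetical names.

```python
from typing import Any, Callable, Dict

# Legal status transitions, following pending → processing → completed/failed.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def transition(job: Dict[str, Any], new_status: str) -> None:
    """Move the job to new_status, rejecting transitions outside the lifecycle."""
    if new_status not in ALLOWED[job["status"]]:
        raise ValueError(f"illegal transition {job['status']} -> {new_status}")
    job["status"] = new_status


def run_job(job: Dict[str, Any], transcribe: Callable[[str], str]) -> Dict[str, Any]:
    """Run the transcription callable and record the outcome on the job."""
    transition(job, "processing")
    try:
        job["result"] = transcribe(job["audio_path"])
    except Exception as exc:
        # Any pipeline error marks the job as failed instead of crashing the worker.
        job["error"] = str(exc)
        transition(job, "failed")
    else:
        transition(job, "completed")
    return job
```

With FastAPI, a function of this shape would typically be scheduled from the POST handler via `background_tasks.add_task(run_job, job, transcribe)`, so the request can return the job_id before transcription starts.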
TTL-Based Cleanup
- Implemented automatic cleanup system:
  - Deletes jobs after 24 hours
  - OR immediately after result retrieval
- Includes periodic cleanup worker for maintenance
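The two deletion rules above map onto a single DELETE statement. The sketch below assumes timestamps are stored as epoch seconds and uses a reduced, hypothetical table (`make_table`) so that it runs on its own; the actual cleanup worker may store timestamps differently.

```python
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as stated above


def make_table(conn):
    """Create a reduced jobs table (assumed layout, epoch-second timestamps)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, "
        "created_at REAL NOT NULL, "
        "retrieved_at REAL, "
        "ttl_deadline REAL NOT NULL)"
    )


def cleanup_jobs(conn, now=None):
    """Delete jobs past their TTL or already retrieved; return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM jobs WHERE ttl_deadline <= ? OR retrieved_at IS NOT NULL",
        (now,),
    )
    conn.commit()
    return cur.rowcount
```

A periodic worker would call `cleanup_jobs(conn)` on a timer. Removing the associated audio and output files, which the acceptance criteria also require, would need to happen alongside the row deletion and is not shown here.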
Testing Updates
- Updated existing unit tests to support async job flow
- Validates:
  - Job creation
  - Status transitions
  - Result retrieval
Breaking Changes
- Removed legacy synchronous endpoint:
  - /api/transcribe (deprecated)
- This is a clean v1 async API replacement
Acceptance Criteria
- POST /transcribe returns { job_id } immediately
- GET /transcribe/{job_id} returns correct job status and result
- Completed results are retrievable via status endpoint
- Audio and output files are deleted after retrieval or TTL expiry (24h)
- Background worker handles job lifecycle correctly
- SQLite stores all required job metadata
- Pydantic models define all API contracts
- Unit tests updated for async behavior
- Old sync /api/transcribe endpoint removed