Add Faster Whisper Large-v3 model support and model selection for transcription
Background
The ASR backend currently relies on language-based routing to select a transcription model. While this works for standard transcription workflows, it does not allow users or developers to explicitly choose which ASR engine should be used for a transcription request. Additionally, the system lacks support for the Faster Whisper large-v3 model, limiting benchmarking and model comparison capabilities.
Problem Statement
Users cannot manually select an ASR model during transcription, making it difficult to:
- Compare transcription quality across different models.
- Benchmark model performance.
- Experiment with alternative ASR engines for specific use cases.
- Utilize Faster Whisper large-v3 within the existing transcription pipeline.
Proposed Solution
Extend the ASR backend to support multiple transcription engines and allow explicit model selection through the transcription API.
The solution should:
- Introduce a Faster Whisper large-v3 ASR engine.
- Add an optional model parameter to the /transcribe endpoint.
- Support model-specific routing while preserving existing language-based defaults.
- Integrate Faster Whisper into the current engine architecture.
- Propagate model selection through the entire transcription workflow.
- Provide configurable runtime settings for Faster Whisper device and compute type.
- Ensure compatibility with existing job processing and diarization pipelines.
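
The configurable device and compute type settings could look roughly like the sketch below. It is only an illustration under stated assumptions: the environment variable names (FASTER_WHISPER_DEVICE, FASTER_WHISPER_COMPUTE_TYPE) and the helpers `FasterWhisperSettings`, `load_settings`, and `build_model` are hypothetical and not part of the existing backend. The `WhisperModel` constructor and its `device` and `compute_type` arguments come from the faster-whisper library.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FasterWhisperSettings:
    """Runtime settings for the Faster Whisper engine (hypothetical container)."""
    model_size: str = "large-v3"
    device: str = "auto"            # "cpu", "cuda", or "auto"
    compute_type: str = "default"   # e.g. "float16" or "int8"


def load_settings(env=None):
    """Read settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return FasterWhisperSettings(
        device=env.get("FASTER_WHISPER_DEVICE", "auto"),
        compute_type=env.get("FASTER_WHISPER_COMPUTE_TYPE", "default"),
    )


def build_model(settings):
    """Create the faster-whisper model from the given settings."""
    # Imported lazily so the other engines keep working when
    # faster-whisper is not installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        settings.model_size,
        device=settings.device,
        compute_type=settings.compute_type,
    )
```

Keeping the settings in one small object would let the engine be reconfigured per deployment (for example, int8 on CPU for local testing and float16 on GPU in production) without code changes.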
Acceptance Criteria
- /transcribe accepts an optional model parameter.
- Supported models include:
  - swecha_gonthuka
  - whisper_small
  - faster_whisper_large_v3
- Existing transcription behavior remains unchanged when no model is specified.
- Faster Whisper large-v3 can be selected and used successfully for transcription.
- Model selection is propagated through all transcription services and job-processing layers.
- The router supports dynamic engine selection based on the requested model.
- Transcription requests complete successfully across all supported models.
- Known timestamp handling issues are resolved and no longer cause transcription failures.
- Logging remains consistent across all model implementations.
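
As a rough illustration of the routing criteria above, the selection logic might be sketched as follows. The function `select_engine`, the `LANGUAGE_DEFAULTS` mapping, and the fallback choice are hypothetical placeholders; the real language-based defaults live in the existing router and are not described in this issue. Only the three model names are taken from the list above.

```python
from typing import Optional

# Model names from the acceptance criteria.
SUPPORTED_MODELS = ("swecha_gonthuka", "whisper_small", "faster_whisper_large_v3")

# Placeholder defaults for illustration only; the actual
# language-to-model mapping is defined by the existing router.
LANGUAGE_DEFAULTS = {"te": "swecha_gonthuka"}
FALLBACK_MODEL = "whisper_small"


def select_engine(language: Optional[str], model: Optional[str] = None) -> str:
    """Return the engine name to use for a transcription request.

    An explicit model wins; otherwise the language-based default
    applies, so requests without a model behave as before.
    """
    if model is not None:
        if model not in SUPPORTED_MODELS:
            raise ValueError(
                f"Unsupported model {model!r}; expected one of {SUPPORTED_MODELS}"
            )
        return model
    return LANGUAGE_DEFAULTS.get(language, FALLBACK_MODEL)
```

Rejecting unknown model names early, before a job is queued, would keep invalid requests from reaching the job-processing and diarization layers.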
Expected Outcome
The backend supports multiple ASR engines with explicit model selection, enabling flexible transcription workflows, easier benchmarking, and integration of Faster Whisper large-v3 while maintaining backward compatibility with existing clients.