feat(asr): add Faster Whisper large-v3 engine and model selection support
Summary
This MR introduces support for the Faster Whisper large-v3 model and adds model selection capability to the transcription API. Users can now explicitly choose the ASR engine during transcription while maintaining backward compatibility with the existing language-based routing logic.
Changes Implemented
Model Selection Support
- Added an optional model parameter to the /transcribe endpoint.
- Supported model values:
  - swecha_gonthuka
  - whisper_small
  - faster_whisper_large_v3
- Preserved existing behavior when no model is specified:
  - Swecha Gonthuka for Telugu transcription.
  - Whisper Small for other languages.
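For reviewers, the selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the function name `resolve_model`, the `SUPPORTED_MODELS` tuple, and the use of `"te"` as the Telugu language code are assumptions.

```python
from typing import Optional

# Model identifiers accepted by the optional `model` parameter.
SUPPORTED_MODELS = ("swecha_gonthuka", "whisper_small", "faster_whisper_large_v3")


def resolve_model(language: str, model: Optional[str] = None) -> str:
    """Return the engine name for a request (hypothetical helper)."""
    if model is not None:
        # Explicit selection wins, but only for known model names.
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"Unsupported model: {model!r}")
        return model
    # No model given: fall back to the existing language-based routing.
    # Assumption: Telugu is identified by the language code "te".
    return "swecha_gonthuka" if language == "te" else "whisper_small"
```

Clients that never send `model` keep getting the same engine as before, which is what keeps the change backward compatible.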
Faster Whisper Engine Integration
- Added a new FasterWhisperEngine implementation conforming to the existing AsrEngine interface.
- Integrated Faster Whisper using the faster-whisper library.
- Added configurable runtime settings through environment variables:
  - FASTER_WHISPER_DEVICE
  - FASTER_WHISPER_COMPUTE_TYPE
- Added automatic package installation and engine initialization on first load, when required.
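A rough sketch of how the two environment variables could feed the faster-whisper model constructor is shown below. The `FasterWhisperSettings` dataclass, the `load_settings` and `build_model` helpers, and the fallback values `"auto"` and `"default"` are assumptions for illustration; the defaults actually used in this MR may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FasterWhisperSettings:
    model_size: str
    device: str
    compute_type: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> FasterWhisperSettings:
    """Read runtime settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    return FasterWhisperSettings(
        model_size="large-v3",
        # Assumed fallbacks; they mirror the faster-whisper library defaults.
        device=env.get("FASTER_WHISPER_DEVICE", "auto"),
        compute_type=env.get("FASTER_WHISPER_COMPUTE_TYPE", "default"),
    )


def build_model(settings: FasterWhisperSettings):
    """Create the model on first use; requires the faster-whisper package."""
    # Imported lazily so this module loads even when the package is absent.
    from faster_whisper import WhisperModel

    return WhisperModel(
        settings.model_size,
        device=settings.device,
        compute_type=settings.compute_type,
    )
```

For example, setting `FASTER_WHISPER_DEVICE=cuda` and `FASTER_WHISPER_COMPUTE_TYPE=float16` would request GPU inference in half precision, while leaving both unset lets the library choose.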
Router Enhancements
- Extended ModelRouter to manage all available ASR engines.
- Added model-based engine selection functionality.
- Implemented dedicated routing methods for explicit model selection.
- Added support for configurable Whisper Small model loading.
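The router changes above could look roughly like the sketch below. It is illustrative only: the `transcribe` signature on `AsrEngine`, the method names `engine_for_model`, `engine_for_language`, and `route`, the lazy factory registry, and the `"te"` language code are all assumptions, and `StubEngine` stands in for the real engines so the example runs without model weights.

```python
from typing import Callable, Dict, Optional, Protocol


class AsrEngine(Protocol):
    # Hypothetical interface; the real AsrEngine methods may differ.
    def transcribe(self, audio_path: str) -> str: ...


class StubEngine:
    """Stand-in engine so the sketch runs without any model weights."""

    def __init__(self, name: str) -> None:
        self.name = name

    def transcribe(self, audio_path: str) -> str:
        return f"[{self.name}] {audio_path}"


class ModelRouter:
    def __init__(self, factories: Dict[str, Callable[[], AsrEngine]]) -> None:
        self._factories = factories
        self._engines: Dict[str, AsrEngine] = {}

    def engine_for_model(self, model: str) -> AsrEngine:
        """Explicit selection: build the engine on first use, then reuse it."""
        if model not in self._factories:
            raise ValueError(f"Unsupported model: {model!r}")
        if model not in self._engines:
            self._engines[model] = self._factories[model]()
        return self._engines[model]

    def engine_for_language(self, language: str) -> AsrEngine:
        """Existing behavior: Telugu ("te", assumed code) vs. everything else."""
        model = "swecha_gonthuka" if language == "te" else "whisper_small"
        return self.engine_for_model(model)

    def route(self, language: str, model: Optional[str] = None) -> AsrEngine:
        if model is not None:
            return self.engine_for_model(model)
        return self.engine_for_language(language)


MODEL_NAMES = ("swecha_gonthuka", "whisper_small", "faster_whisper_large_v3")
router = ModelRouter({name: (lambda n=name: StubEngine(n)) for name in MODEL_NAMES})
print(router.route("en", model="faster_whisper_large_v3").transcribe("clip.wav"))
# prints "[faster_whisper_large_v3] clip.wav"
```

Building engines lazily through factories means a large model such as large-v3 is only loaded when a request actually asks for it.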
Pipeline Updates
Propagated the selected model through the complete transcription workflow:
- API endpoint
- Audio transcription service
- File transcription pipeline
- Job management layer
- Background processing workflow
- Diarization integration path
Bug Fixes
- Fixed timestamp normalization issues when Whisper returns None timestamps.
- Prevented transcription failures caused by timestamp processing errors.
- Updated logging configuration to avoid formatter conflicts introduced by third-party libraries.
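To illustrate the timestamp fix, here is one possible way to normalize segments whose timestamps are None. The segment shape (dicts with `start` and `end` keys), the function name `normalize_timestamps`, and the fallback policy (reuse the previous segment's end time) are assumptions; the actual fix in this MR may handle these cases differently.

```python
from typing import List


def normalize_timestamps(segments: List[dict]) -> List[dict]:
    """Replace missing start/end values so later stages never see None."""
    normalized = []
    previous_end = 0.0
    for segment in segments:
        start = segment.get("start")
        end = segment.get("end")
        if start is None:
            # Assumed policy: continue from where the previous segment ended.
            start = previous_end
        if end is None or end < start:
            # Assumed policy: collapse to a zero-length segment.
            end = start
        normalized.append({**segment, "start": float(start), "end": float(end)})
        previous_end = float(end)
    return normalized
```

With this policy, arithmetic on segment boundaries (for example when aligning with diarization output) no longer raises a TypeError on None values.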
Benefits
- Enables direct comparison of multiple ASR models through a single API.
- Provides access to Faster Whisper large-v3 for improved transcription quality and performance.
- Maintains backward compatibility with existing clients.
- Improves flexibility for benchmarking and model evaluation workflows.
Testing
Verified:
- Transcription using swecha_gonthuka.
- Transcription using whisper_small.
- Transcription using faster_whisper_large_v3.
- Default routing when model parameter is omitted.
- End-to-end transcription pipeline with explicit model selection.
- Timestamp normalization and logging fixes.
- API response consistency across all supported models.
Closes #33
Edited by vyshnavi