feat(asr): add Faster Whisper large-v3 engine and model selection support (!49)

vyshnavi requested to merge faster into develop Jun 02, 2026

Summary

This MR introduces support for the Faster Whisper large-v3 model and adds model selection capability to the transcription API. Users can now explicitly choose the ASR engine during transcription while maintaining backward compatibility with the existing language-based routing logic.

Changes Implemented

Model Selection Support

Added an optional model parameter to the /transcribe endpoint.
Supported model values:

swecha_gonthuka
whisper_small
faster_whisper_large_v3

Preserved existing behavior when no model is specified:

Swecha Gonthuka for Telugu transcription.
Whisper Small for other languages.
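
The fallback rule above can be sketched as a small pure function. This is only an illustration: `resolve_model` is a hypothetical helper, not a name taken from this MR, and the use of the language code "te" for Telugu is an assumption.

```python
from typing import Optional

# Model identifiers listed in this MR.
SUPPORTED_MODELS = ("swecha_gonthuka", "whisper_small", "faster_whisper_large_v3")


def resolve_model(model: Optional[str], language: str) -> str:
    """Return the engine name for a request (hypothetical helper)."""
    if model is not None:
        # Explicit selection: accept only the documented values.
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {model!r}")
        return model
    # No model given: keep the existing language-based routing.
    # Assumption: Telugu requests carry the language code "te".
    return "swecha_gonthuka" if language == "te" else "whisper_small"
```

Keeping the rule in one function means clients that never send the model parameter see exactly the previous behavior.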

Faster Whisper Engine Integration

Added a new FasterWhisperEngine implementation conforming to the existing AsrEngine interface.
Integrated Faster Whisper using the faster-whisper library.
Added configurable runtime settings through environment variables:

FASTER_WHISPER_DEVICE
FASTER_WHISPER_COMPUTE_TYPE

Installs the faster-whisper package and initializes the engine automatically on first load when required.
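
A minimal sketch of how the two environment variables might feed the faster-whisper model constructor. The helper names are hypothetical, and the fallback values ("auto" and "default") are assumptions that mirror the library's own defaults; the MR does not state which defaults the engine uses. The sketch does not cover automatic package installation.

```python
import os
from typing import Any, Dict, Mapping


def faster_whisper_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the runtime settings named in this MR (hypothetical helper)."""
    return {
        # Assumed fallbacks, matching the faster-whisper library defaults.
        "device": env.get("FASTER_WHISPER_DEVICE", "auto"),
        "compute_type": env.get("FASTER_WHISPER_COMPUTE_TYPE", "default"),
    }


def load_faster_whisper(env: Mapping[str, str] = os.environ) -> Any:
    """Build a large-v3 model using the settings above (hypothetical helper)."""
    # Imported lazily so the dependency is only needed when this engine is used.
    from faster_whisper import WhisperModel

    return WhisperModel("large-v3", **faster_whisper_settings(env))
```

For example, setting FASTER_WHISPER_DEVICE=cuda and FASTER_WHISPER_COMPUTE_TYPE=float16 would request half-precision GPU inference, while a CPU-only deployment might use cpu and int8.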

Router Enhancements

Extended ModelRouter to manage all available ASR engines.
Added model-based engine selection functionality.
Implemented dedicated routing methods for explicit model selection.
Added support for configurable Whisper Small model loading.
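
One way a router could manage several engines is a registry of factories that builds each engine on first use. The sketch below is an assumption about the design, not the MR's code: the `transcribe` signature on `AsrEngine`, the `engine_for_model` method name, and `StubEngine` are all hypothetical.

```python
from typing import Callable, Dict, Protocol


class AsrEngine(Protocol):
    # Assumed shape of the interface; the real AsrEngine may differ.
    def transcribe(self, audio_path: str) -> str: ...


class StubEngine:
    """Stand-in engine used only to demonstrate the router."""

    def __init__(self, name: str) -> None:
        self.name = name

    def transcribe(self, audio_path: str) -> str:
        return f"[{self.name}] {audio_path}"


class ModelRouter:
    """Maps model names to engines, constructing each engine lazily."""

    def __init__(self, factories: Dict[str, Callable[[], AsrEngine]]) -> None:
        self._factories = factories
        self._engines: Dict[str, AsrEngine] = {}

    def engine_for_model(self, model: str) -> AsrEngine:
        if model not in self._factories:
            raise ValueError(f"unknown model: {model!r}")
        if model not in self._engines:
            # Build on first request so unused models are never loaded.
            self._engines[model] = self._factories[model]()
        return self._engines[model]
```

Lazy construction matters here because a large model is expensive to load; with this layout, a deployment that only ever receives whisper_small requests never pays for the others.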

Pipeline Updates

Propagated the selected model through the complete transcription workflow:

API endpoint
Audio transcription service
File transcription pipeline
Job management layer
Background processing workflow
Diarization integration path

Bug Fixes

Fixed timestamp normalization issues when Whisper returns None timestamps.
Prevented transcription failures caused by timestamp processing errors.
Updated logging configuration to avoid formatter conflicts introduced by third-party libraries.
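
The MR does not show how the timestamp fix works. One plausible approach, sketched below under that caveat, replaces a missing start with the previous segment's end and a missing end with the segment's own start. The function name and the dictionary-based segment shape are assumptions for illustration.

```python
from typing import Any, Dict, List


def normalize_timestamps(segments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the segments with numeric start/end values."""
    normalized: List[Dict[str, Any]] = []
    previous_end = 0.0
    for seg in segments:
        start = seg.get("start")
        end = seg.get("end")
        if start is None:
            # Missing start: continue from where the last segment ended.
            start = previous_end
        if end is None or end < start:
            # Missing or inverted end: collapse to a zero-length segment.
            end = start
        normalized.append({**seg, "start": float(start), "end": float(end)})
        previous_end = float(end)
    return normalized
```

Downstream code can then sort, merge, or format segments without guarding against None at every step.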

Benefits

Enables direct comparison of multiple ASR models through a single API.
Provides access to Faster Whisper large-v3 for improved transcription quality and performance.
Maintains backward compatibility with existing clients.
Improves flexibility for benchmarking and model evaluation workflows.

Testing

Verified:

Transcription using swecha_gonthuka.
Transcription using whisper_small.
Transcription using faster_whisper_large_v3.
Default routing when the model parameter is omitted.
End-to-end transcription pipeline with explicit model selection.
Timestamp normalization and logging fixes.
API response consistency across all supported models.

Closes #33

Edited Jun 02, 2026 by vyshnavi
