feat(asr): add Faster Whisper large-v3 engine and model selection support

vyshnavi requested to merge faster into develop

Summary

This MR introduces support for the Faster Whisper large-v3 model and adds model selection capability to the transcription API. Users can now explicitly choose the ASR engine during transcription while maintaining backward compatibility with the existing language-based routing logic.

Changes Implemented

Model Selection Support

  • Added an optional model parameter to the /transcribe endpoint.
  • Supported model values:
      1. swecha_gonthuka
      2. whisper_small
      3. faster_whisper_large_v3
  • Preserved existing behavior when no model is specified:
      1. Swecha Gonthuka for Telugu transcription.
      2. Whisper Small for other languages.
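The selection rule described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this MR: the function name resolve_model and the use of the language code "te" for Telugu are assumptions.

```python
from typing import Optional

# Model names as listed in this MR.
SUPPORTED_MODELS = ("swecha_gonthuka", "whisper_small", "faster_whisper_large_v3")


def resolve_model(language: str, model: Optional[str] = None) -> str:
    """Return the engine name to use for one transcription request (hypothetical helper)."""
    if model is not None:
        # An explicit model wins, but only if it is one of the supported values.
        if model not in SUPPORTED_MODELS:
            raise ValueError(
                f"unsupported model {model!r}; expected one of {SUPPORTED_MODELS}"
            )
        return model
    # No explicit model: fall back to the existing language-based routing.
    # Assumption: Telugu is identified by the ISO 639-1 code "te".
    return "swecha_gonthuka" if language.lower() == "te" else "whisper_small"
```

Under these assumptions, resolve_model("te") keeps the old Telugu default, while resolve_model("te", "faster_whisper_large_v3") overrides it.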

Faster Whisper Engine Integration

  • Added a new FasterWhisperEngine implementation conforming to the existing AsrEngine interface.
  • Integrated Faster Whisper using the faster-whisper library.
  • Added configurable runtime settings through environment variables:
      1. FASTER_WHISPER_DEVICE
      2. FASTER_WHISPER_COMPUTE_TYPE
  • Added support for automatic package installation and initialization on first load, when required.
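As a rough illustration of how the two environment variables might feed the faster-whisper library, consider the sketch below. The helper names (FasterWhisperSettings, load_settings, build_model) are hypothetical, and the fallback values "auto" and "default" are assumptions that mirror the library's own defaults for WhisperModel; the MR's actual defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FasterWhisperSettings:
    device: str
    compute_type: str


def load_settings(env: Mapping[str, str] = os.environ) -> FasterWhisperSettings:
    """Read the runtime settings from the environment (hypothetical helper)."""
    # Fallback values are assumptions, chosen to match faster-whisper's defaults.
    return FasterWhisperSettings(
        device=env.get("FASTER_WHISPER_DEVICE", "auto"),
        compute_type=env.get("FASTER_WHISPER_COMPUTE_TYPE", "default"),
    )


def build_model(settings: FasterWhisperSettings):
    """Create the large-v3 model; the import is lazy so settings load without the package."""
    from faster_whisper import WhisperModel

    return WhisperModel(
        "large-v3",
        device=settings.device,
        compute_type=settings.compute_type,
    )
```

For example, setting FASTER_WHISPER_DEVICE=cpu and FASTER_WHISPER_COMPUTE_TYPE=int8 would request a quantized CPU model, which is a common choice on machines without a GPU.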

Router Enhancements

  • Extended ModelRouter to manage all available ASR engines.
  • Added model-based engine selection functionality.
  • Implemented dedicated routing methods for explicit model selection.
  • Added support for configurable Whisper Small model loading.
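One way a router could manage several engines behind a common interface is sketched below. It is not the ModelRouter from this MR: the transcribe signature, the get_engine method, the lazy instantiation, and the EchoEngine stand-in are all assumptions made for illustration.

```python
from typing import Callable, Dict, Protocol


class AsrEngine(Protocol):
    # Assumed interface; the real AsrEngine in the codebase may differ.
    def transcribe(self, audio_path: str, language: str) -> str: ...


class ModelRouter:
    """Maps model names to engines, creating each engine on first use."""

    def __init__(self, factories: Dict[str, Callable[[], AsrEngine]]) -> None:
        self._factories = factories
        self._engines: Dict[str, AsrEngine] = {}

    def get_engine(self, model: str) -> AsrEngine:
        if model not in self._factories:
            raise KeyError(f"unknown model: {model}")
        if model not in self._engines:
            # Lazy load: heavy models are only initialized when requested.
            self._engines[model] = self._factories[model]()
        return self._engines[model]


class EchoEngine:
    """Stand-in engine used only to demonstrate the router."""

    def __init__(self, name: str) -> None:
        self.name = name

    def transcribe(self, audio_path: str, language: str) -> str:
        return f"{self.name}:{audio_path}:{language}"


MODEL_NAMES = ("swecha_gonthuka", "whisper_small", "faster_whisper_large_v3")
# The default argument binds each name at lambda creation time.
router = ModelRouter({name: (lambda n=name: EchoEngine(n)) for name in MODEL_NAMES})
```

Caching the engine instance means repeated requests for the same model reuse one loaded engine instead of reloading weights each time.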

Pipeline Updates

Propagated the selected model through the complete transcription workflow:

  • API endpoint
  • Audio transcription service
  • File transcription pipeline
  • Job management layer
  • Background processing workflow
  • Diarization integration path

Bug Fixes

  • Fixed timestamp normalization issues when Whisper returns None timestamps.
  • Prevented transcription failures caused by timestamp processing errors.
  • Updated logging configuration to avoid formatter conflicts introduced by third-party libraries.
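The timestamp fix listed above could look something like the following sketch. The function name, the (start, end) tuple shape, and the fallback policy (a missing start inherits the previous end, a missing end is clamped to the audio duration) are assumptions; the MR does not state how None values are actually resolved.

```python
from typing import List, Optional, Tuple

Span = Tuple[Optional[float], Optional[float]]


def normalize_timestamps(
    spans: List[Span], audio_duration: float
) -> List[Tuple[float, float]]:
    """Replace None timestamps with usable values (hypothetical policy)."""
    normalized: List[Tuple[float, float]] = []
    previous_end = 0.0
    for start, end in spans:
        # A missing start is taken to begin where the previous segment ended.
        start = previous_end if start is None else start
        # A missing end is clamped to the total audio duration.
        end = audio_duration if end is None else end
        # Guard against inverted spans so downstream arithmetic stays valid.
        end = max(end, start)
        normalized.append((start, end))
        previous_end = end
    return normalized
```

With this policy, a segment list whose final end time is None still yields numeric spans, so later steps such as diarization alignment do not fail on None arithmetic.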

Benefits

  • Enables direct comparison of multiple ASR models through a single API.
  • Provides access to Faster Whisper large-v3 for improved transcription quality and performance.
  • Maintains backward compatibility with existing clients.
  • Improves flexibility for benchmarking and model evaluation workflows.

Testing

Verified:

  • Transcription using swecha_gonthuka.
  • Transcription using whisper_small.
  • Transcription using faster_whisper_large_v3.
  • Default routing when the model parameter is omitted.
  • End-to-end transcription pipeline with explicit model selection.
  • Timestamp normalization and logging fixes.
  • API response consistency across all supported models.

Closes #33

Edited by vyshnavi
