feat(asr): finalize Phase 4 LID with Whisper's built-in detection

ashritha kunjeti requested to merge saas/phase4 into develop

Description

This MR completes Phase 4 (Language Identification (LID) and ASR stability improvements). It upgrades the LID model, improves ASR fallback behavior, and makes multilingual workflows more reliable.

Key Changes

  • LID Engine Upgrade

    • Migrated to speechbrain/lang-id-voxlingua107-ecapa
    • Improves acoustic-level language detection accuracy, especially for Indic languages
  • ASR Fallback Improvement

    • Replaced the fallback ASR model with openai/whisper-base
    • Eliminates decoder hallucination loops on empty/noisy inputs (e.g., YouTube boilerplate audio)
  • Mixed Language Handling

    • Introduced explicit mixed language bypass
    • Enables better handling of code-switched and multilingual inputs in router
  • HuggingFace Fix

    • Resolved huggingface_hub 401 Unauthorized issue
    • Stabilized model fetch and timeout handling
  • Documentation Update

    • Rewrote docs/LID_SELECTION.md
    • Added standardized evaluation metrics based on VoxLingua107 benchmarks
  • Testing Enhancements

    • Strengthened integration tests
    • Added strict assertion thresholds
    • Achieved full workflow coverage (batch + PCM16 streaming)
  • Swagger Language Dropdown (Enum)

    • Introduced LanguageOption Enum with 'te', 'en', 'hi', and 'other'
    • Updated /transcribe endpoint to use Enum instead of raw string
    • Set default language to 'other' for better UX
    • Enabled automatic Swagger UI dropdown for language selection
    • Added internal mapping: 'other' → None to preserve LID auto-detection flow

    This improves API usability while maintaining existing language routing logic.
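
The Enum change above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the LanguageOption name and its four values come from the description, while the resolve_language helper is a hypothetical name used only to show the 'other' → None mapping.

```python
from enum import Enum
from typing import Optional


class LanguageOption(str, Enum):
    """Languages offered in the dropdown (values as listed in this MR)."""

    te = "te"
    en = "en"
    hi = "hi"
    other = "other"


def resolve_language(option: LanguageOption = LanguageOption.other) -> Optional[str]:
    """Hypothetical helper: turn the dropdown choice into a language hint.

    'other' maps to None so that LID auto-detection still runs downstream.
    """
    if option is LanguageOption.other:
        return None
    return option.value
```

If the service is built on FastAPI (an assumption, the description only mentions Swagger), typing the /transcribe language parameter as LanguageOption is what makes Swagger UI render a dropdown, and a raw string outside the four values is rejected at validation time.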

Impact

  • More accurate language detection across diverse audio inputs
  • Improved ASR reliability with reduced hallucinations
  • Better support for multilingual and code-switched scenarios
  • Increased system stability and test coverage
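
One possible shape for the mixed-language bypass listed under Key Changes is sketched below. Everything here is an assumption made for illustration: the "mixed" label, the confidence threshold, the route names, and the choose_route function are hypothetical and do not describe the router in this MR. The sketch only shows the idea that a mixed-language result skips per-language routing and goes straight to the multilingual fallback.

```python
from typing import Optional

# Assumed values for illustration; none of these come from the MR.
MIXED_LABEL = "mixed"
SUPPORTED = {"te", "en", "hi"}
MIN_CONFIDENCE = 0.6


def choose_route(requested: Optional[str], lid_label: str, lid_confidence: float) -> str:
    """Pick an ASR path for one utterance (hypothetical routing logic)."""
    # An explicit language from the caller skips LID entirely.
    if requested is not None:
        return f"asr:{requested}"
    # Explicit bypass: code-switched audio is not forced onto one language.
    if lid_label == MIXED_LABEL:
        return "asr:fallback"
    # Route to a language-specific model only for confident, supported labels.
    if lid_label in SUPPORTED and lid_confidence >= MIN_CONFIDENCE:
        return f"asr:{lid_label}"
    return "asr:fallback"
```

Checking the bypass before the confidence test means a "mixed" result takes the fallback path for its own named reason rather than falling through the per-language checks.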

Checklist

  • LID model upgraded and validated

  • ASR fallback tested on noisy/empty audio

  • Mixed language routing verified

  • Docs updated

  • Integration tests passing

  • closes #13 (closed)

Edited by ashritha kunjeti
