feat(asr): finalize Phase 4 LID with Whisper's built-in detection
Description
This MR completes Phase 4 (Language Identification (LID) and ASR stability improvements). It upgrades the LID model, replaces the ASR fallback model, and hardens model fetching and test coverage across multilingual workflows.
Key Changes
- LID Engine Upgrade
  - Migrated to speechbrain/lang-id-voxlingua107-ecapa
  - Improves acoustic-level language detection accuracy, especially for Indic languages
- ASR Fallback Improvement
  - Replaced fallback with openai/whisper-base
  - Eliminates decoder hallucination loops on empty/noisy inputs (e.g., YouTube boilerplate audio)
- Mixed Language Handling
  - Introduced explicit mixed language bypass
  - Enables better handling of code-switched and multilingual inputs in router
- HuggingFace Fix
  - Resolved huggingface_hub 401 Unauthorized issue
  - Stabilized model fetch and timeout handling
- Documentation Update
  - Rewrote docs/LID_SELECTION.md
  - Added standardized evaluation metrics based on VoxLingua107 benchmarks
- Testing Enhancements
  - Strengthened integration tests
  - Added strict assertion thresholds
  - Achieved full workflow coverage (batch + PCM16 streaming)
- 
Language Dropdown in Swagger via Enum
  - Introduced LanguageOption Enum with 'te', 'en', 'hi', and 'other'
  - Updated /transcribe endpoint to use the Enum instead of a raw string
  - Set default language to 'other' for better UX
  - Enabled automatic Swagger UI dropdown for language selection
  - Added internal mapping: 'other' → None to preserve the LID auto-detection flow
  - This improves API usability while maintaining the existing language routing logic
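
For reviewers, the Enum change can be pictured roughly as follows. This is a minimal sketch, not the code in this MR: the member values come from the description above, while the helper name `resolve_language` and the exact mapping function are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class LanguageOption(str, Enum):
    """Allowed values for the language parameter of /transcribe."""
    te = "te"
    en = "en"
    hi = "hi"
    other = "other"


def resolve_language(option: LanguageOption = LanguageOption.other) -> Optional[str]:
    """Hypothetical helper: map the Enum to the language hint used for routing.

    'other' maps to None, so the existing LID auto-detection path still runs.
    """
    if option is LanguageOption.other:
        return None
    return option.value
```

If the service is built on FastAPI (an assumption; the MR only mentions Swagger), declaring the endpoint parameter with this Enum type and a default of `LanguageOption.other` is what makes Swagger UI render a dropdown. Values outside the Enum are rejected before the handler runs.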
Impact
- More accurate language detection across diverse audio inputs
- Improved ASR reliability with reduced hallucinations
- Better support for multilingual and code-switched scenarios
- Increased system stability and test coverage
Checklist
- LID model upgraded and validated
- ASR fallback tested on noisy/empty audio
- Mixed language routing verified
- Docs updated
- Integration tests passing
- Closes #13
Edited by ashritha kunjeti