Add Phase 4: Finalize LID with Whisper's built-in detection

Description

The current language identification (LID) and ASR fallback mechanisms need better accuracy, stability, and handling of multilingual audio inputs.

This issue focuses on upgrading the LID model, fixing Whisper hallucination issues, and improving overall system robustness in real-world scenarios such as noisy, silent, and code-switched audio.

Problem Statement

The existing LID model is not accurate enough on Indic and multilingual inputs
The Whisper fallback produces hallucinated output on empty or noisy audio
Mixed (code-switched) language inputs are not handled explicitly
Occasional huggingface_hub 401 errors interrupt model loading
Documentation and tests do not reflect the latest evaluation standards

Proposed Solution

Migrate LID engine to speechbrain/lang-id-voxlingua107-ecapa
Upgrade fallback ASR to openai/whisper-base to reduce hallucinations
Introduce explicit mixed language bypass in model router
Fix huggingface_hub authentication and timeout handling
Update docs/LID_SELECTION.md with standardized evaluation metrics
Strengthen integration tests with strict assertions for full workflow coverage
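
For illustration, the mixed language bypass could look roughly like the sketch below. It assumes the router receives a LID result carrying a language label, a confidence score, and a mixed-language flag, and that mixed input goes straight to the multilingual fallback. LidResult, route_model, ASR_MODELS, and MIN_CONFIDENCE are hypothetical names and values, not identifiers from the current codebase.

```python
from dataclasses import dataclass

# All names below are illustrative assumptions, not project identifiers.


@dataclass(frozen=True)
class LidResult:
    language: str      # label from the LID model, e.g. "hi" or "en"
    confidence: float  # top-1 score in [0, 1]
    is_mixed: bool     # True when code-switching was flagged upstream


# Assumed mapping from language code to a language-specific ASR model key.
ASR_MODELS = {"hi": "asr-hindi", "ta": "asr-tamil", "en": "asr-english"}
FALLBACK_MODEL = "openai/whisper-base"
MIN_CONFIDENCE = 0.6  # assumed threshold; would need tuning on held-out data


def route_model(lid: LidResult) -> str:
    """Pick an ASR model key for one utterance."""
    # Explicit bypass: mixed-language audio skips per-language routing.
    if lid.is_mixed:
        return FALLBACK_MODEL
    # Low-confidence predictions are not trusted for routing.
    if lid.confidence < MIN_CONFIDENCE:
        return FALLBACK_MODEL
    # Unsupported languages also fall through to the fallback.
    return ASR_MODELS.get(lid.language, FALLBACK_MODEL)
```

Checking the mixed flag before the confidence score keeps the bypass explicit: a confident but code-switched prediction never reaches a single-language model.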

Acceptance Criteria

LID model upgraded and integrated successfully
Accuracy on Indic and multilingual audio improves over the current LID model, measured with the metrics in docs/LID_SELECTION.md
Whisper fallback does not hallucinate on empty/noisy inputs
Mixed language inputs handled correctly via router
No 401 errors from huggingface_hub during model fetch
Documentation updated with latest benchmarks
Integration tests pass with full coverage (batch + streaming)
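
As a sketch of what a strict assertion for the empty-audio criterion might look like, the example below gates the fallback on signal energy and checks that silent input yields an empty transcript without the ASR model being called. The energy gate is an assumption added for illustration, not something this issue specifies. transcribe_with_gate, SILENCE_RMS_THRESHOLD, and the fake ASR callable are hypothetical. A plain RMS gate only covers silence; noisy input would need a separate check.

```python
import math
from typing import Callable, Sequence

# Assumed threshold for samples normalized to [-1, 1]; illustrative only.
SILENCE_RMS_THRESHOLD = 0.01


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of the samples; 0.0 for empty input."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def transcribe_with_gate(
    samples: Sequence[float],
    asr: Callable[[Sequence[float]], str],
) -> str:
    """Return an empty transcript for near-silent audio, else call the ASR."""
    if rms(samples) < SILENCE_RMS_THRESHOLD:
        return ""
    return asr(samples)


def test_silence_yields_empty_transcript() -> None:
    calls = []

    def fake_asr(samples: Sequence[float]) -> str:
        # Stand-in for the Whisper fallback; records that it was invoked.
        calls.append(len(samples))
        return "hallucinated text"

    silent = [0.0] * 16000  # one second of silence at 16 kHz
    assert transcribe_with_gate(silent, fake_asr) == ""
    assert calls == []  # strict: the model must not run at all on silence
```

Asserting on both the transcript and the call list makes the test fail if the gate is removed, even when the model happens to return an empty string.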

Edited Apr 18, 2026 by ashritha kunjeti