Add Phase 4: Finalize LID with Whisper's built-in detection
Description
The current Language Identification (LID) and ASR fallback mechanisms need better accuracy, stability, and handling of multilingual audio inputs.
This issue covers upgrading the LID model, fixing Whisper hallucinations, and improving system robustness in real-world scenarios such as noisy, silent, and code-switched audio.
Problem Statement
- The existing LID model has weak acoustic accuracy on Indic and multilingual inputs
- Whisper fallback produces hallucinated outputs on empty or noisy audio
- Mixed/code-switched language inputs are not handled explicitly
- Occasional huggingface_hub 401 errors affecting model loading
- Documentation and tests do not reflect the latest evaluation standards
Proposed Solution
- Migrate LID engine to speechbrain/lang-id-voxlingua107-ecapa
- Upgrade fallback ASR to openai/whisper-base to reduce hallucinations
- Introduce an explicit mixed-language bypass in the model router
- Fix huggingface_hub authentication and timeout handling
- Update docs/LID_SELECTION.md with standardized evaluation metrics
- Strengthen integration tests with strict assertions for full workflow coverage
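To make the mixed-language bypass concrete, here is a minimal sketch of one way a router could decide between a monolingual path and a mixed path from LID scores. It is an illustration only: `Route`, `route_by_lid`, and the thresholds `min_top` and `min_margin` are hypothetical names and values, not taken from the existing codebase, and the score dictionary stands in for whatever the LID model actually returns.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical routing decision: which path to take and for which language."""
    target: str               # "monolingual" or "mixed"
    language: Optional[str]   # set only for the monolingual path


def route_by_lid(
    scores: Dict[str, float],
    min_top: float = 0.6,      # assumed threshold, would need tuning
    min_margin: float = 0.2,   # assumed threshold, would need tuning
) -> Route:
    """Send low-confidence or closely contested LID results to the mixed path."""
    if not scores:
        raise ValueError("scores must contain at least one language")

    # Rank languages from most to least probable.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_lang, top_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0

    # Bypass: weak winner, or the runner-up is too close to call.
    if top_p < min_top or (top_p - second_p) < min_margin:
        return Route(target="mixed", language=None)
    return Route(target="monolingual", language=top_lang)
```

For example, `route_by_lid({"hi": 0.9, "en": 0.05})` takes the monolingual path for `hi`, while `route_by_lid({"hi": 0.5, "en": 0.45})` is sent to the mixed path. Whether a margin rule like this suits code-switched audio in practice would have to be confirmed against the evaluation set.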
Acceptance Criteria
- LID model upgraded and integrated successfully
- Improved accuracy for Indic and multilingual audio
- Whisper fallback does not hallucinate on empty/noisy inputs
- Mixed language inputs handled correctly via router
- No 401 errors from huggingface_hub during model fetch
- Documentation updated with latest benchmarks
- Integration tests pass with full coverage (batch + streaming)
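The empty-input criterion could be enforced with a simple energy gate in front of the fallback ASR, sketched below under stated assumptions. `rms`, `transcribe_with_gate`, and the `min_rms` threshold are hypothetical, and `transcribe` is a placeholder for the real Whisper call. An energy gate only covers silent or near-silent audio; noisy inputs would need a separate check (for example a voice activity detector).

```python
import math
from typing import Callable, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a mono signal; 0.0 for an empty signal."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def transcribe_with_gate(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # placeholder for the ASR call
    min_rms: float = 0.01,                         # assumed threshold, needs tuning
) -> str:
    """Skip the ASR model entirely when the audio is effectively silent."""
    if rms(samples) < min_rms:
        return ""  # nothing to transcribe, so nothing to hallucinate
    return transcribe(samples)
```

A strict integration-style assertion could then check that silent input yields an empty transcript without ever invoking the model, for instance by passing a `transcribe` stub that would return non-empty text if called.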
Edited by ashritha kunjeti