Fix: Improve Telugu Live Transcription Reliability (WebSocket + Streaming)
Description
- Update the live Telugu transcription pipeline to improve reliability and accuracy without changing the ASR model.
- Currently, the system suffers from unstable WebSocket handling and inconsistent audio input, which lead to missing final transcripts and noisy partial outputs.
- This change fixes streaming, session lifecycle, and audio pipeline issues to ensure consistent UI behavior and better transcription quality.
Changes Required
- Ensure client sends `end` event and waits for final (is_final=true) before closing WebSocket
- Add timeout-based fallback only if final transcript is not received
- Enforce Telugu (`te`) language across streaming pipeline
- Apply mic constraints (echoCancellation, noiseSuppression, autoGainControl)
- Improve session handling (clean start/reset per recording)
- Refine chunk processing and silence thresholds
- Clean up partial vs. final transcription behavior via config
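
The stop flow described above (send the end event, wait for the final transcript, fall back to a timeout only if it never arrives) could be sketched roughly as follows. This is a minimal illustration, not the actual client code: the `TranscriptSocket` interface, the `stopAndAwaitFinal` name, the `{"type":"end"}` message shape, the `text` field, and the 3000 ms default are all assumptions made for the example. Only `is_final` comes from this description.

```typescript
// Hypothetical minimal socket shape, so the sketch runs against a test double.
// A thin wrapper around a browser WebSocket is assumed to satisfy it.
interface TranscriptSocket {
  send(data: string): void;
  close(): void;
  onmessage: ((ev: { data: unknown }) => void) | null;
}

interface StopResult {
  text: string | null; // final transcript, or null if the fallback fired
  timedOut: boolean;
}

// Sends the end event, then closes the socket only after a message with
// is_final === true arrives, or after timeoutMs as a fallback.
function stopAndAwaitFinal(
  ws: TranscriptSocket,
  timeoutMs: number = 3000, // assumed default, tune per deployment
): Promise<StopResult> {
  return new Promise<StopResult>((resolve) => {
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    function finish(result: StopResult): void {
      if (settled) return; // guard against final + timeout racing
      settled = true;
      clearTimeout(timer);
      ws.onmessage = null;
      ws.close();
      resolve(result);
    }

    ws.onmessage = (ev) => {
      if (typeof ev.data !== "string") return;
      let msg: { text?: string; is_final?: boolean };
      try {
        msg = JSON.parse(ev.data);
      } catch {
        return; // ignore frames that are not JSON
      }
      // Partials are ignored here; only the final message ends the session.
      if (msg.is_final === true) {
        finish({ text: msg.text ?? "", timedOut: false });
      }
    };

    // Fallback: used only if no final transcript is received in time.
    timer = setTimeout(() => finish({ text: null, timedOut: true }), timeoutMs);

    // Assumed wire format for the end event.
    ws.send(JSON.stringify({ type: "end" }));
  });
}
```

The `settled` flag keeps the socket from being closed twice when a final message and the timeout fire close together, and the caller can use `timedOut` to decide whether to show the last partial instead.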
Motivation
- Eliminates the missing-final-transcript issue in the UI
- Improves stability of real-time transcription
- Enhances accuracy without changing model
- Reduces sensitivity to audio and environment inconsistencies
- Provides consistent streaming experience across sessions
Acceptance Criteria
- Final transcript always appears in UI after stop
- WebSocket does not close before the final message arrives (or the fallback timeout elapses)
- Telugu-only transcription is enforced
- Partial and final outputs behave consistently
- Stable results for short Telugu speech
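
One possible reading of "partial and final outputs behave consistently" is that each partial replaces the previous partial, while each final is appended to the committed transcript and clears the pending partial. The sketch below illustrates that interpretation only. The `TranscriptState` and `TranscriptMessage` types and the `applyMessage` and `displayText` helpers are hypothetical names, and the `text` field is an assumed message field.

```typescript
// Hypothetical UI-side state: text already finalized plus the current partial.
interface TranscriptState {
  committed: string;
  pending: string;
}

// Assumed server message shape; only is_final is named in this description.
interface TranscriptMessage {
  text: string;
  is_final: boolean;
}

// Pure reducer: finals append to committed text, partials overwrite pending.
function applyMessage(
  state: TranscriptState,
  msg: TranscriptMessage,
): TranscriptState {
  if (msg.is_final) {
    const committed = [state.committed, msg.text.trim()]
      .filter((part) => part.length > 0)
      .join(" ");
    return { committed, pending: "" };
  }
  return { committed: state.committed, pending: msg.text };
}

// Text to render: committed transcript followed by the live partial, if any.
function displayText(state: TranscriptState): string {
  return [state.committed, state.pending.trim()]
    .filter((part) => part.length > 0)
    .join(" ");
}
```

Keeping this logic in a pure function makes the acceptance criteria straightforward to check in a unit test without a live microphone or server.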