fix: improve Telugu accuracy and final transcript reliability

Sahasra Reddy requested to merge fix/websocket into develop

Summary: Improves the Telugu live transcription pipeline so that WebSocket streaming is more reliable, stable, and accurate, without changing the ASR model.

Changes:

  • Improved WebSocket session handling:
    • clean new session initialization
    • reset support for new recordings
    • safer finalization before socket close
    • better logging for chunk processing and responses
  • Improved transcription pipeline:
    • partials use a fast PCM-based path for low latency
    • final output uses the full transcribe_audio(...) path for better accuracy and punctuation
  • Added streaming configuration:
    • partials toggle (STREAMING_ENABLE_PARTIALS)
    • increased partial and silence thresholds for more stable output
  • Improved speech detection and buffering for cleaner final results
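
For reviewers, the split between partial and final output can be sketched roughly as below. This is a minimal illustration, not the code in this MR: the `StreamingSession` class, the `fast_partial` callable, the `env_flag` helper, and the `partial_min_bytes` threshold and its default are all hypothetical names and values. Only `STREAMING_ENABLE_PARTIALS` and `transcribe_audio` come from the description above, and `transcribe_audio` is assumed here to accept raw audio bytes.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Optional


def env_flag(name: str, default: bool = True) -> bool:
    """Parse a boolean environment variable such as STREAMING_ENABLE_PARTIALS."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class StreamingSession:
    # Hypothetical low-latency PCM path used only for partials.
    fast_partial: Callable[[bytes], str]
    # Stand-in for the full transcribe_audio(...) path used for the final result.
    transcribe_audio: Callable[[bytes], str]
    enable_partials: bool = True
    # Hypothetical threshold: about 1 s of 16 kHz, 16-bit mono PCM.
    partial_min_bytes: int = 32000
    _buffer: bytearray = field(default_factory=bytearray)
    _since_partial: int = 0

    def reset(self) -> None:
        """Drop all buffered audio so a new recording starts clean."""
        self._buffer.clear()
        self._since_partial = 0

    def add_chunk(self, chunk: bytes) -> Optional[str]:
        """Buffer a chunk; return a partial only once enough new audio has arrived."""
        self._buffer.extend(chunk)
        self._since_partial += len(chunk)
        if not self.enable_partials or self._since_partial < self.partial_min_bytes:
            return None
        self._since_partial = 0
        return self.fast_partial(bytes(self._buffer))

    def finalize(self) -> str:
        """Run the full path over everything buffered, then reset the session."""
        text = self.transcribe_audio(bytes(self._buffer)) if self._buffer else ""
        self.reset()
        return text
```

A session under these assumptions would be built with something like `StreamingSession(fast_partial=..., transcribe_audio=..., enable_partials=env_flag("STREAMING_ENABLE_PARTIALS"))`. Raising `partial_min_bytes` trades latency for fewer, more stable partials, which is the direction the threshold change above describes.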

Impact:

  • Fixes missing final transcript in UI after stop
  • Reduces noisy/unstable partial outputs
  • Improves streaming session reliability
  • Enhances transcription accuracy without model change
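
The missing-final-transcript fix depends on ordering: the final message must be sent before the socket is closed. A minimal sketch of that ordering follows, assuming a WebSocket object that exposes async `send_json` and `close` methods. The `stop_and_close` function and the `{"type": "final", ...}` message shape are hypothetical and may not match the actual handler in this MR.

```python
import asyncio
from typing import Any, Callable


async def stop_and_close(
    ws: Any, audio: bytes, transcribe: Callable[[bytes], str]
) -> None:
    """Send the final transcript first, then close the socket.

    The socket is closed even if transcription or sending raises.
    """
    try:
        # Run the (possibly slow) full transcription off the event loop.
        text = await asyncio.to_thread(transcribe, audio)
        # Hypothetical message shape; the real payload may differ.
        await ws.send_json({"type": "final", "text": text})
    finally:
        await ws.close()
```

Closing in `finally` keeps the session from leaking if transcription fails, while the `await` on `send_json` ensures the final message is handed to the socket before `close` runs.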

Testing:

  • Verified WebSocket flow and final transcript delivery
  • Confirmed no premature socket closure
  • Backend runs without errors after changes

Closes #15

Edited by Sahasra Reddy