fix transcription fallback

Shanmukha varma Lanke requested to merge fix/transcription-fix into develop

Overview: This merge request introduces a WebSocket-based streaming architecture for real-time speech-to-text processing. It replaces the polling-based approach with a stateful, persistent connection and refactors the ASR service into a more modular and maintainable structure.

Motivation: The previous polling-based transcription mechanism added latency and offered limited support for real-time use cases. This MR addresses both limitations by streaming audio continuously, which reduces response times and improves transcription accuracy.

Key Changes:

  1. WebSocket Streaming Endpoint
  • Added /api/transcribe/live for real-time transcription
  • Supports concurrent client sessions
  • Enables bidirectional, low-latency communication
  2. Session Management
  • Introduced LiveTranscriptionSession abstraction
  • Handles:
    • Audio buffering
    • Voice Activity Detection (VAD)
    • Incremental transcription updates
  3. Audio Decoding Optimization
  • Implemented decode_audio_chunk in asr_service.py
  • Dual-path decoding strategy:
    • Primary: ffmpeg (performance optimized)
    • Fallback: librosa (robustness)
  4. ASR Service Refactor
  • Transitioned to transcribe_audio_samples pipeline
  • Standardized timestamp normalization across ASR outputs
  • Improved synthetic chunk handling for better subtitle alignment
  • Enhanced modularity and maintainability of ASR components
  5. Bug Fixes & Stability
  • Resolved merge conflicts in asr_service.py
  • Ensured compatibility with saas/phase4 architecture
  • Improved error handling in streaming workflows
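
For reviewers, the dual-path decoding strategy from item 3 can be pictured roughly as follows. This is a minimal sketch and not the code in asr_service.py: the signature of decode_audio_chunk, the helper names decode_with_ffmpeg and decode_with_librosa, the 16 kHz mono target and the set of caught exceptions are all assumptions made for illustration.

```python
import io
import logging
import subprocess
import sys
from array import array
from typing import Callable, Sequence

logger = logging.getLogger(__name__)

TARGET_SAMPLE_RATE = 16_000  # assumption: the MR does not state the sample rate

Decoder = Callable[[bytes], Sequence[float]]


def decode_with_ffmpeg(data: bytes) -> Sequence[float]:
    """Primary path: pipe the chunk through the ffmpeg CLI as mono float32 PCM."""
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-loglevel", "error",
         "-i", "pipe:0",
         "-f", "f32le", "-ac", "1", "-ar", str(TARGET_SAMPLE_RATE),
         "pipe:1"],
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # non-zero exit raises CalledProcessError
    )
    if not proc.stdout:
        raise ValueError("ffmpeg produced no audio samples")
    samples = array("f")
    samples.frombytes(proc.stdout)
    if sys.byteorder == "big":
        samples.byteswap()  # ffmpeg wrote little-endian floats
    return samples


def decode_with_librosa(data: bytes) -> Sequence[float]:
    """Fallback path: decode in-process with librosa."""
    import librosa  # imported lazily so the primary path works without it

    samples, _sample_rate = librosa.load(
        io.BytesIO(data), sr=TARGET_SAMPLE_RATE, mono=True
    )
    return samples


def decode_audio_chunk(
    data: bytes,
    primary: Decoder = decode_with_ffmpeg,
    fallback: Decoder = decode_with_librosa,
) -> Sequence[float]:
    """Try the primary decoder first; use the fallback only if it fails."""
    try:
        return primary(data)
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        # OSError also covers a missing ffmpeg binary (FileNotFoundError).
        logger.warning("primary decoder failed (%s); using fallback", exc)
        return fallback(data)
```

Passing the decoders as parameters keeps the fallback path testable without ffmpeg or librosa installed, for example by injecting a primary decoder that raises FileNotFoundError.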

Backward Compatibility:

  • Existing HTTP endpoint /api/transcribe remains unchanged
  • No breaking changes introduced for current consumers

Impact:

  • Enables real-time subtitle generation
  • Reduces transcription latency compared with the polling-based flow
  • Improves developer experience for frontend integration

Testing & Validation:

Unit Testing:

  • All 56 tests in tests/test_asr_service.py are passing

Integration Testing:

  • Verified WebSocket route registration in FastAPI application
  • Validated streaming transcription workflow end-to-end
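
As an illustration of what an end-to-end streaming check exercises (buffering, VAD, incremental updates), here is a small stand-in for the session logic. It is a sketch under stated assumptions, not LiveTranscriptionSession itself: the class name SessionSketch, the RMS-energy VAD, the threshold value and the {"type", "text"} update shape are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Transcriber = Callable[[Sequence[float]], str]


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy, used here as a very crude VAD signal."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SessionSketch:
    """Hypothetical stand-in: buffers speech, emits interim and final updates."""

    def __init__(self, transcribe: Transcriber, vad_threshold: float = 0.01):
        self._transcribe = transcribe
        self._threshold = vad_threshold  # assumed value, not from the MR
        self._buffer: List[float] = []

    def feed(self, chunk: Sequence[float]) -> List[Dict[str, str]]:
        if rms(chunk) >= self._threshold:
            # Speech: grow the buffer and report an interim hypothesis.
            self._buffer.extend(chunk)
            return [{"type": "interim", "text": self._transcribe(self._buffer)}]
        if self._buffer:
            # Silence after speech: finalize the utterance and reset the buffer.
            text = self._transcribe(self._buffer)
            self._buffer = []
            return [{"type": "final", "text": text}]
        return []  # silence with nothing buffered: no update


# Usage with a fake transcriber, so no ASR model is needed.
session = SessionSketch(transcribe=lambda s: f"{len(s)} samples")
speech, silence = [0.5] * 4, [0.0] * 4
updates: List[Dict[str, str]] = []
for chunk in (silence, speech, speech, silence):
    updates += session.feed(chunk)
# updates now holds two interim entries followed by one final entry
```

A real check would drive the WebSocket route with recorded audio; the fake transcriber here only makes the update ordering visible.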

Next Steps:

  • Update frontend implementation:
    • Replace polling (setInterval) with WebSocket connection
    • Consume interim and final transcription updates
  • Perform load testing for concurrent streaming sessions
  • Monitor performance in production environment
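
To make "consume interim and final transcription updates" concrete, the client-side bookkeeping could look roughly like the sketch below. It is written in Python to match the backend; the frontend would port the same logic. The JSON message shape ({"type": "interim" | "final", "text": ...}) and the TranscriptState name are assumptions, since this MR description does not specify the wire format.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptState:
    """Hypothetical client-side state for rendering live subtitles."""

    finalized: List[str] = field(default_factory=list)
    interim: str = ""

    def apply(self, raw_message: str) -> None:
        message = json.loads(raw_message)
        kind = message.get("type")
        text = message.get("text", "")
        if kind == "interim":
            # Interim text replaces the previous interim guess.
            self.interim = text
        elif kind == "final":
            # Final text is committed and the interim slot is cleared.
            self.finalized.append(text)
            self.interim = ""
        # Unknown message types are ignored in this sketch.

    def render(self) -> str:
        parts = self.finalized + ([self.interim] if self.interim else [])
        return " ".join(parts)


# Usage: feed messages in the order a WebSocket handler would receive them.
state = TranscriptState()
for raw in (
    '{"type": "interim", "text": "hel"}',
    '{"type": "interim", "text": "hello"}',
    '{"type": "final", "text": "hello world"}',
    '{"type": "interim", "text": "how"}',
):
    state.apply(raw)
```

Keeping interim text separate from finalized text avoids flicker: only the trailing segment is rewritten as new hypotheses arrive.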

Checklist:

  • WebSocket endpoint implemented
  • ASR service refactored
  • Backward compatibility ensured
  • Unit tests passing
  • Integration verified

closes #16
