fix transcription fallback
Overview: This Merge Request introduces a high-performance, WebSocket-based streaming architecture for real-time speech-to-text processing. It replaces the polling-based approach with a stateful communication model and refactors the ASR service into a more modular, maintainable structure.
Motivation: The previous polling-based transcription mechanism resulted in increased latency and limited support for real-time use cases. This MR addresses these limitations by enabling continuous streaming, reducing response times, and improving transcription accuracy.
Key Changes:
- WebSocket Streaming Endpoint
  - Added `/api/transcribe/live` for real-time transcription
  - Supports concurrent client sessions
  - Enables bidirectional, low-latency communication
- Session Management
  - Introduced `LiveTranscriptionSession` abstraction
  - Handles:
    - Audio buffering
    - Voice Activity Detection (VAD)
    - Incremental transcription updates
- Audio Decoding Optimization
  - Implemented `decode_audio_chunk` in `asr_service.py`
  - Dual-path decoding strategy:
    - Primary: ffmpeg (performance optimized)
    - Fallback: librosa (robustness)
- ASR Service Refactor
  - Transitioned to `transcribe_audio_samples` pipeline
  - Standardized timestamp normalization across ASR outputs
  - Improved synthetic chunk handling for better subtitle alignment
  - Enhanced modularity and maintainability of ASR components
- Bug Fixes & Stability
  - fix(asr): resolved merge conflicts in `asr_service.py`
  - Ensured compatibility with `saas/phase4` architecture
  - Improved error handling in streaming workflows
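
The dual-path decoding strategy listed above (try ffmpeg first, fall back to a second decoder when it fails) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `asr_service.py`: the names `decode_with_ffmpeg`, `decode_with_wave`, and `decode_chunk_with_fallback` are hypothetical, and the fallback uses the standard-library `wave` module as a stand-in for librosa so that the sketch runs without third-party packages.

```python
import io
import shutil
import struct
import subprocess
import wave
from typing import Callable, List, Sequence, Tuple

Decoder = Callable[[bytes], List[float]]


def decode_with_ffmpeg(data: bytes, sample_rate: int = 16000) -> List[float]:
    """Primary path: pipe the chunk through the ffmpeg CLI as mono float32 PCM."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-loglevel", "error", "-i", "pipe:0",
         "-f", "f32le", "-ac", "1", "-ar", str(sample_rate), "pipe:1"],
        input=data, capture_output=True, check=True,
    )
    count = len(proc.stdout) // 4  # 4 bytes per float32 sample
    return list(struct.unpack(f"<{count}f", proc.stdout[: count * 4]))


def decode_with_wave(data: bytes) -> List[float]:
    """Stand-in fallback (the MR uses librosa): handles 16-bit PCM WAV only."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only supports 16-bit PCM")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    # Keep the first channel only and scale to [-1.0, 1.0).
    return [s / 32768.0 for s in samples[::channels]]


def decode_chunk_with_fallback(
    data: bytes, decoders: Sequence[Tuple[str, Decoder]]
) -> Tuple[str, List[float]]:
    """Try each decoder in order; return the name of the one that succeeded."""
    errors = []
    for name, decoder in decoders:
        try:
            return name, decoder(data)
        except Exception as exc:  # any failure moves on to the next decoder
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all decoders failed: " + "; ".join(errors))
```

A caller would pass the decoders in priority order, for example `decode_chunk_with_fallback(chunk, [("ffmpeg", decode_with_ffmpeg), ("wave", decode_with_wave)])`. Returning the decoder name alongside the samples makes it easy to log how often the fallback path is taken.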
Backward Compatibility:
- Existing HTTP endpoint `/api/transcribe` remains unchanged
- No breaking changes introduced for current consumers
Impact:
- Enables real-time subtitle generation
- Reduces transcription latency significantly
- Improves developer experience for frontend integration
Testing & Validation:
Unit Testing:
- All 56 tests in `tests/test_asr_service.py` are passing
Integration Testing:
- Verified WebSocket route registration in FastAPI application
- Validated streaming transcription workflow end-to-end
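
For reviewers who want a mental model of the streaming workflow validated above, the session behavior (buffer audio, apply voice activity detection, emit incremental updates) might look roughly like the sketch below. It is not the `LiveTranscriptionSession` implementation: the class name `SketchSession`, the RMS energy threshold used as a VAD, the `transcribe` callback, and the `{"type": ..., "text": ...}` message shape are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class SketchSession:
    # Hypothetical ASR callback: takes buffered samples, returns text.
    transcribe: Callable[[Sequence[float]], str]
    energy_threshold: float = 0.01  # RMS at or above this counts as speech
    max_silent_chunks: int = 2      # silent chunks that close an utterance
    _buffer: List[float] = field(default_factory=list, init=False)
    _silent_run: int = field(default=0, init=False)

    @staticmethod
    def _rms(chunk: Sequence[float]) -> float:
        """Root-mean-square energy of a chunk (0.0 for an empty chunk)."""
        if not chunk:
            return 0.0
        return math.sqrt(sum(s * s for s in chunk) / len(chunk))

    def feed(self, chunk: Sequence[float]) -> Optional[dict]:
        """Consume one decoded chunk; return an update message or None."""
        if self._rms(chunk) >= self.energy_threshold:
            # Speech: grow the buffer and report an interim hypothesis.
            self._buffer.extend(chunk)
            self._silent_run = 0
            return {"type": "interim", "text": self.transcribe(self._buffer)}
        if not self._buffer:
            return None  # silence before any speech: nothing to report
        self._silent_run += 1
        if self._silent_run < self.max_silent_chunks:
            return None  # short pause: keep the utterance open
        # Enough trailing silence: finalize the utterance and reset.
        text = self.transcribe(self._buffer)
        self._buffer.clear()
        self._silent_run = 0
        return {"type": "final", "text": text}
```

In this sketch, each WebSocket connection would own one session object, which keeps per-client state isolated when several clients stream concurrently.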
Next Steps:
- Update frontend implementation:
  - Replace polling (`setInterval`) with a WebSocket connection
  - Consume interim and final transcription updates
- Perform load testing for concurrent streaming sessions
- Monitor performance in the production environment
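
To make the "consume interim and final transcription updates" step concrete, here is a minimal sketch of the client-side state handling, written in Python to match the service code even though the frontend would implement it in its own language. The message schema (`{"type": "interim" | "final", "text": ...}`) and the `TranscriptView` name are assumptions; the actual payload format is not specified in this MR description.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptView:
    finals: List[str] = field(default_factory=list)  # committed utterances
    interim: str = ""                                # current live hypothesis

    def apply(self, raw_message: str) -> None:
        """Update state from one JSON message received over the socket."""
        message = json.loads(raw_message)
        kind = message.get("type")
        text = message.get("text", "")
        if kind == "interim":
            # Interim results replace each other rather than accumulate.
            self.interim = text
        elif kind == "final":
            # A final result commits the utterance and clears the live line.
            self.finals.append(text)
            self.interim = ""
        # Unknown message types are ignored in this sketch.

    def render(self) -> str:
        """Committed text followed by the current interim hypothesis, if any."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

The key design point is that interim text is overwritten on every update while final text is appended, so subtitles do not duplicate words as a hypothesis is refined.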
Checklist:
- WebSocket endpoint implemented
- ASR service refactored
- Backward compatibility ensured
- Unit tests passing
- Integration verified
closes #16