fix transcription fallback (!23) · Merge requests · VISWAM / apps / Speech / Voice App Backend

Shanmukha varma Lanke requested to merge fix/transcription-fix into develop Apr 19, 2026

Overview: This Merge Request introduces a WebSocket-based streaming architecture for real-time speech-to-text processing. It addresses the limitations of the polling-based approach with a stateful communication model and refactors the ASR service into a more modular and maintainable structure.

Motivation: The previous polling-based transcription mechanism added latency and offered limited support for real-time use cases. This MR addresses these limitations by enabling continuous streaming, which reduces response times and improves transcription accuracy.

Key Changes:

WebSocket Streaming Endpoint

Added /api/transcribe/live for real-time transcription
Supports concurrent client sessions
Enables bidirectional, low-latency communication
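
A minimal, framework-agnostic sketch of what the per-connection loop behind /api/transcribe/live might look like: binary audio chunks come in, JSON updates go out. The function name run_live_connection, the message fields ("type", "text"), and the transcribe callable are assumptions for illustration and are not taken from this MR. In a FastAPI route, chunks and send_text would likely map to WebSocket.iter_bytes() and WebSocket.send_text().

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable


async def run_live_connection(
    chunks: AsyncIterator[bytes],
    send_text: Callable[[str], Awaitable[None]],
    transcribe: Callable[[bytes], str],
) -> None:
    """Hypothetical per-connection loop: audio chunks in, JSON updates out.

    All state lives in local variables, so each connection that runs this
    coroutine gets its own buffer and concurrent sessions do not interfere.
    """
    received = bytearray()
    async for chunk in chunks:
        received.extend(chunk)
        # Interim update: the text may still change as more audio arrives.
        interim = {"type": "interim", "text": transcribe(bytes(received))}
        await send_text(json.dumps(interim))
    # The client stopped sending audio: emit one final, stable result.
    final = {"type": "final", "text": transcribe(bytes(received))}
    await send_text(json.dumps(final))
```

Passing the transport in as an async iterator and a send callable keeps the loop testable without a running server; the real endpoint may be structured differently.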

Session Management

Introduced LiveTranscriptionSession abstraction
Handles:
- Audio buffering
- Voice Activity Detection (VAD)
- Incremental transcription updates
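
The three responsibilities listed above can be sketched as follows. This is an illustrative stand-in, not the LiveTranscriptionSession from this MR: the feed method, the RMS energy threshold used as a VAD, the silent-frame rule for ending an utterance, and the 16-bit little-endian PCM frame format are all assumptions.

```python
import math
import struct
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


def rms(pcm16: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM."""
    n = len(pcm16) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm16[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


@dataclass
class LiveTranscriptionSession:
    """Hypothetical session: buffers speech, applies a simple energy VAD."""

    transcribe: Callable[[bytes], str]  # stand-in for the real ASR call
    speech_threshold: float = 500.0     # assumed RMS level that counts as speech
    max_silent_frames: int = 3          # assumed end-of-utterance rule
    _buffer: bytearray = field(default_factory=bytearray, init=False)
    _silent: int = field(default=0, init=False)

    def feed(self, frame: bytes) -> Optional[Tuple[str, str]]:
        """Add one frame; return ("interim" | "final", text) or None."""
        if rms(frame) >= self.speech_threshold:
            # Speech: buffer it and report an incremental transcription.
            self._buffer.extend(frame)
            self._silent = 0
            return ("interim", self.transcribe(bytes(self._buffer)))
        if not self._buffer:
            return None  # silence before any speech: nothing to report
        self._silent += 1
        if self._silent < self.max_silent_frames:
            return None  # short pause: keep waiting for more speech
        # Enough trailing silence: finalize the utterance and reset.
        text = self.transcribe(bytes(self._buffer))
        self._buffer.clear()
        self._silent = 0
        return ("final", text)
```

A production VAD would usually be more robust than a fixed energy threshold; the point here is only how buffering, VAD, and incremental updates fit into one per-client object.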

Audio Decoding Optimization

Implemented decode_audio_chunk in asr_service.py
Dual-path decoding strategy:
- Primary: ffmpeg (performance optimized)
- Fallback: librosa (robustness)
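
A hedged sketch of the dual-path strategy described above. The MR does not show the real signature of decode_audio_chunk, so the parameters here (including the injectable primary and fallback decoders), the 16 kHz mono target, and the helper names decode_with_ffmpeg and decode_with_librosa are assumptions. The ffmpeg path requires the ffmpeg binary on PATH; librosa is imported lazily so the module loads without it.

```python
import io
import struct
import subprocess
from typing import Callable, List

SAMPLE_RATE = 16_000  # assumed target sample rate for the ASR model

Decoder = Callable[[bytes], List[float]]


def decode_with_ffmpeg(data: bytes) -> List[float]:
    """Pipe the chunk through the ffmpeg CLI and read mono float32 PCM back."""
    proc = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", "pipe:0",
         "-f", "f32le", "-ac", "1", "-ar", str(SAMPLE_RATE), "pipe:1"],
        input=data, capture_output=True, check=True,
    )
    n = len(proc.stdout) // 4  # 4 bytes per float32 sample
    return list(struct.unpack(f"<{n}f", proc.stdout[: n * 4]))


def decode_with_librosa(data: bytes) -> List[float]:
    """Slower fallback; librosa.load accepts a file-like object."""
    import librosa  # lazy import: only needed when the fallback runs

    samples, _ = librosa.load(io.BytesIO(data), sr=SAMPLE_RATE, mono=True)
    return samples.tolist()


def decode_audio_chunk(
    data: bytes,
    primary: Decoder = decode_with_ffmpeg,
    fallback: Decoder = decode_with_librosa,
) -> List[float]:
    """Try the fast path first; fall back if it fails or yields no samples."""
    try:
        samples = primary(data)
        if samples:
            return samples
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing ffmpeg binary; CalledProcessError covers
        # a non-zero exit status (for example, an undecodable chunk).
        pass
    return fallback(data)
```

Making the two decoders parameters is a design choice for this sketch: it lets the fallback logic be exercised in tests without ffmpeg or librosa installed.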

ASR Service Refactor

Transitioned to transcribe_audio_samples pipeline
Standardized timestamp normalization across ASR outputs
Improved synthetic chunk handling for better subtitle alignment
Enhanced modularity and maintainability of ASR components
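
The MR does not spell out what timestamp normalization involves. One plausible reading, sketched below under stated assumptions, is converting chunk-relative segment times into stream-absolute times, filling in a missing end time, and keeping segments monotonic so subtitles do not overlap. The function name normalize_timestamps and the segment shape (dicts with "text", "start", "end") are hypothetical.

```python
from typing import Dict, List, Optional, Union

Segment = Dict[str, Union[str, Optional[float]]]


def normalize_timestamps(
    segments: List[Segment], chunk_offset: float, chunk_duration: float
) -> List[Dict[str, Union[str, float]]]:
    """Shift chunk-relative times to stream time and remove overlaps."""
    chunk_end = chunk_offset + chunk_duration
    out = []
    cursor = chunk_offset  # end of the previous normalized segment
    for i, seg in enumerate(segments):
        start = chunk_offset + (seg.get("start") or 0.0)
        if seg.get("end") is not None:
            end = chunk_offset + seg["end"]
        elif i + 1 < len(segments) and segments[i + 1].get("start") is not None:
            # Missing end: borrow the start of the next segment.
            end = chunk_offset + segments[i + 1]["start"]
        else:
            # Missing end on the last segment: assume it runs to the chunk end.
            end = chunk_end
        # Clamp so segments never overlap or leave the chunk window.
        start = min(max(start, cursor), chunk_end)
        end = min(max(end, start), chunk_end)
        out.append({"text": seg["text"], "start": round(start, 3), "end": round(end, 3)})
        cursor = end
    return out
```

For example, with a chunk that starts 10 s into the stream and lasts 2 s, a segment ending at 1.2 s followed by one starting at 1.0 s with no end time would come out as 10.0 to 11.2 and 11.2 to 12.0.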

Bug Fixes & Stability

Resolved merge conflicts in asr_service.py
Ensured compatibility with saas/phase4 architecture
Improved error handling in streaming workflows

Backward Compatibility:

Existing HTTP endpoint /api/transcribe remains unchanged
No breaking changes introduced for current consumers

Impact:

Enables real-time subtitle generation
Reduces transcription latency relative to the polling-based flow
Improves developer experience for frontend integration

Testing & Validation:

Unit Testing:

All 56 tests in tests/test_asr_service.py are passing

Integration Testing:

Verified WebSocket route registration in FastAPI application
Validated streaming transcription workflow end-to-end

Next Steps:

Update frontend implementation:
- Replace polling (setInterval) with WebSocket connection
- Consume interim and final transcription updates
Perform load testing for concurrent streaming sessions
Monitor performance in production environment
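
For the frontend step above, the core of consuming interim and final updates is a small piece of state: final results are appended, while each interim result replaces the previous one. The sketch below shows that logic in Python for consistency with the backend; the actual frontend would implement the same idea in its own language. The class name TranscriptView and the message fields ("type", "text") are assumptions, since the MR does not document the wire format.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptView:
    """Hypothetical client-side state for a live transcript."""

    finals: List[str] = field(default_factory=list)  # committed utterances
    interim: str = ""                                # latest unstable text

    def apply(self, raw: str) -> None:
        """Apply one JSON message received over the WebSocket."""
        msg = json.loads(raw)
        if msg.get("type") == "final":
            # A final result is stable: commit it and clear the interim text.
            self.finals.append(msg["text"])
            self.interim = ""
        elif msg.get("type") == "interim":
            # An interim result supersedes the previous interim result.
            self.interim = msg["text"]

    def render(self) -> str:
        """Text to display: committed utterances plus the current interim."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

Because updates are pushed by the server, this replaces the setInterval polling loop with a single message handler that calls apply and re-renders.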

Checklist:

closes #16
