feature(backend): Phase 2 WebSocket streaming for real-time ASR (non-blocking, low-latency)
🚀 Phase 2: Real-Time Streaming via WebSocket
Summary
Adds a WebSocket endpoint (/api/stream) to enable low-latency, real-time transcription. Audio is streamed in chunks, processed incrementally, and partial/final transcripts are returned to the client.
Key Features
- WebSocket streaming endpoint: /api/stream
- Chunked buffering (~1s audio) with in-place operations
- Non-blocking ASR execution via asyncio.to_thread
- Incremental responses: { text, is_final }
- Robust handling of disconnects and malformed frames
- Memory-safe buffer cap
- Structured [STREAM] logging
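
The buffering and non-blocking execution described above could look roughly like the sketch below. It is not the code from this PR: the audio format (16 kHz, 16-bit mono PCM), the 10-second cap, the drop-oldest overflow policy, and the names `StreamBuffer`, `handle_frame`, and `fake_transcribe` are all assumptions made for illustration. Only `asyncio.to_thread` and the `{ text, is_final }` message shape come from the description.

```python
import asyncio
from typing import List, Optional

# Assumed audio format: 16 kHz, 16-bit mono PCM (not stated in the PR).
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE  # roughly 1 s of audio
MAX_BUFFER_BYTES = 10 * CHUNK_BYTES           # assumed cap, hypothetical value


def fake_transcribe(pcm: bytes) -> str:
    """Stand-in for the blocking ASR call (hypothetical)."""
    return f"<{len(pcm)} bytes>"


class StreamBuffer:
    """Per-connection byte buffer that is mutated in place."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, frame: bytes) -> None:
        self._buf += frame  # in-place extend, no new bytearray
        overflow = len(self._buf) - MAX_BUFFER_BYTES
        if overflow > 0:
            # Assumed policy: drop the oldest bytes once the cap is hit.
            del self._buf[:overflow]

    def pop_chunk(self) -> Optional[bytes]:
        """Return one full chunk and remove it, or None if not enough data."""
        if len(self._buf) < CHUNK_BYTES:
            return None
        chunk = bytes(self._buf[:CHUNK_BYTES])
        del self._buf[:CHUNK_BYTES]
        return chunk

    def drain(self) -> bytes:
        """Return whatever is left, e.g. when the client disconnects."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest


async def handle_frame(buf: StreamBuffer, frame: object) -> List[dict]:
    """Buffer one incoming frame and transcribe every complete chunk."""
    if not isinstance(frame, (bytes, bytearray)):
        return []  # malformed (non-binary) frame: ignore it
    buf.feed(bytes(frame))
    messages = []
    while (chunk := buf.pop_chunk()) is not None:
        # Run the blocking ASR call in a worker thread so the event loop
        # stays free to serve other connections.
        text = await asyncio.to_thread(fake_transcribe, chunk)
        messages.append({"text": text, "is_final": False})
    return messages
```

In a real handler, each returned message would be sent back over the socket, and the bytes from `drain()` would be transcribed once more with `is_final` set to true when the stream ends.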
Design Notes
- No changes to Phase 1 (async job system)
- No additional dependencies introduced
- Temp file lifecycle handled within ASR layer
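
As a rough illustration of keeping the temp file lifecycle inside the ASR layer, the sketch below writes a chunk to a temporary WAV file, hands the path to the model, and always deletes the file afterwards. The function name `transcribe_pcm`, the `run_model` callback, and the WAV parameters are hypothetical; the PR does not show how its ASR layer does this.

```python
import os
import tempfile
import wave
from typing import Callable


def transcribe_pcm(
    pcm: bytes,
    run_model: Callable[[str], str],  # hypothetical: takes a file path, returns text
    sample_rate: int = 16_000,        # assumed sample rate
) -> str:
    """Write PCM to a temp WAV file, transcribe it, and clean up."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # the wave module reopens the file by path
    try:
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)   # assumed mono
            wav.setsampwidth(2)   # assumed 16-bit samples
            wav.setframerate(sample_rate)
            wav.writeframes(pcm)
        return run_model(path)
    finally:
        # Runs on success and on error, so callers never see the file.
        os.remove(path)
```

Because cleanup sits in a `finally` block, the streaming handler can pass raw bytes in and get text back without tracking any file paths itself.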