Add client-side speaker diarization support without backend dependency
Background
The frontend transcription workflow currently relies on backend services for advanced audio processing. As a result, speaker diarization is unavailable when the backend API is not running, so speakers cannot be identified or separated during transcription.
Problem Statement
Users cannot run speaker diarization independently of the backend. This makes the feature dependent on backend availability and adds latency to speaker attribution during transcription.
Proposed Solution
Implement speaker diarization directly in the frontend using a browser-based inference pipeline. The solution should:
- Load and run the pyannote speaker segmentation model locally in the browser.
- Run diarization inference in a Web Worker so that it does not block the UI thread.
- Generate speaker segments from uploaded or selected audio files.
- Associate transcription chunks with detected speakers based on segment overlap.
- Display speaker-attributed transcripts and speaker segments in the UI.
- Fall back gracefully, continuing with transcription without speaker labels, when the diarization model cannot be loaded or inference fails.
- Operate without requiring the backend API to be running.
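The overlap-based association step could look roughly like the sketch below. It is illustrative only: the `SpeakerSegment` and `TranscriptChunk` shapes, the `assignSpeakers` name, and the "largest total overlap wins" rule are assumptions, not the project's existing types or a decided design. Times are assumed to be in seconds.

```typescript
// Hypothetical shapes; the real worker output and transcript types may differ.
interface SpeakerSegment { speaker: string; start: number; end: number; }
interface TranscriptChunk { text: string; start: number; end: number; }
interface AttributedChunk extends TranscriptChunk { speaker: string | null; }

// Length in seconds of the intersection of two intervals (0 if they are disjoint).
function overlapSeconds(aStart: number, aEnd: number, bStart: number, bEnd: number): number {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

// Assumed rule: label each chunk with the speaker whose segments overlap it
// the most in total. Chunks with no overlapping segment get speaker = null.
function assignSpeakers(chunks: TranscriptChunk[], segments: SpeakerSegment[]): AttributedChunk[] {
  return chunks.map((chunk) => {
    // Accumulate overlap per speaker, since one speaker may have several segments.
    const totals: Record<string, number> = {};
    for (const seg of segments) {
      const overlap = overlapSeconds(chunk.start, chunk.end, seg.start, seg.end);
      if (overlap > 0) {
        totals[seg.speaker] = (totals[seg.speaker] ?? 0) + overlap;
      }
    }
    // Pick the speaker with the largest total; on a tie the first one seen is kept.
    let best: string | null = null;
    let bestOverlap = 0;
    for (const speaker of Object.keys(totals)) {
      if (totals[speaker] > bestOverlap) {
        best = speaker;
        bestOverlap = totals[speaker];
      }
    }
    return { ...chunk, speaker: best };
  });
}
```

Keeping this step as a pure function means it can run on the main thread after the worker returns its segments, and it can be unit-tested without loading the model.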
Acceptance Criteria
- Users can enable speaker diarization without starting the backend service.
- Speaker segmentation runs entirely on the client side.
- Speaker labels are assigned to transcription chunks.
- Speaker information is displayed in the transcript and speaker segment panel.
- Existing transcription functionality remains unaffected when diarization is disabled.
- The application continues to function if diarization model loading or inference fails.
- The UI remains responsive during diarization because inference runs in a Web Worker.
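One way to meet the failure-tolerance criterion is to wrap the diarization call so that any error yields an empty segment list instead of breaking transcription. The sketch below is a minimal illustration under assumptions: `diarizeWithFallback`, `DiarizationResult`, and the injected `diarize` function are hypothetical names, and the actual model-loading and worker calls are deliberately left out.

```typescript
// Hypothetical shape of a diarization segment (times in seconds).
interface SpeakerSegment { speaker: string; start: number; end: number; }

// Hypothetical signature for whatever actually runs inference,
// for example a function that posts audio to a Web Worker and awaits its reply.
type Diarize = (audio: Float32Array) => Promise<SpeakerSegment[]>;

interface DiarizationResult {
  segments: SpeakerSegment[]; // empty when diarization was unavailable
  available: boolean;         // false if model loading or inference failed
  error?: string;             // failure reason, for logging or a UI notice
}

// Runs diarization and converts any failure into an "unavailable" result
// so the caller can continue with a plain, unattributed transcript.
async function diarizeWithFallback(audio: Float32Array, diarize: Diarize): Promise<DiarizationResult> {
  try {
    const segments = await diarize(audio);
    return { segments, available: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { segments: [], available: false, error: message };
  }
}
```

With this shape, the UI can check `available` to decide whether to show the speaker segment panel or a short notice, while the transcript itself renders either way.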
Expected Outcome
The frontend supports end-to-end speaker diarization independently of the backend, reducing backend dependency and enabling speaker-aware transcription even when backend services are unavailable.