feat: add diarization in the front-end independent of back-end (!43)

vyshnavi requested to merge frontend_diarization into feat/develop-pro Jun 02, 2026

Summary

This MR introduces speaker diarization in the frontend, enabling speaker identification and segmentation directly in the browser without requiring the backend ASR API to be running.

What was implemented

Added client-side speaker diarization using the onnx-community/pyannote-segmentation-3.0 model.
Implemented a dedicated Web Worker to perform diarization inference without blocking the UI thread.
Added lazy model initialization to load the diarization model only when required.
Integrated diarization with the existing transcription workflow.
Mapped diarization segments to transcription chunks to assign speaker labels (SPEAKER_00, SPEAKER_01, etc.).
Added speaker-segment rendering in the UI.
Preserved existing transcription functionality when diarization is disabled.
Implemented fallback heuristics when diarization results are unavailable.
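Lazy model initialization means the model loads at most once, and only on first use. Below is a minimal sketch, assuming the worker wraps its model loader in a helper like the hypothetical `lazy` function. The stand-in loader replaces the real pyannote ONNX load, which the MR does not show.

```typescript
// Hypothetical helper (not from the MR): wrap an async loader so it runs at
// most once, on first use. Concurrent callers share the same in-flight
// promise; a failed load is forgotten so a later call can retry.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = load().catch((err: unknown) => {
        pending = null; // allow a retry on the next call
        throw err;
      });
    }
    return pending;
  };
}

// Usage sketch: a stand-in loader that counts how often it actually runs.
// In the worker this would be the call that loads the diarization model.
let loadCount = 0;
const getModel = lazy(async () => {
  loadCount += 1;
  return { name: "pyannote-segmentation-3.0 (stand-in)" };
});
```

Because callers share one in-flight promise, two diarization requests that arrive before loading finishes trigger only one model load. Clearing the cached promise on failure lets a later request retry, which fits the fallback behavior the MR describes.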

Diarization Flow

User uploads or selects an audio file.
Frontend performs diarization inference locally using the Pyannote ONNX model.
Speaker segments are generated and stored for reuse across transcription chunks.
During chunk-wise transcription, the speaker for each chunk is identified from its time overlap with the stored speaker segments.
Transcribed text is tagged with speaker labels and displayed in the editor and speaker panel.
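The overlap step in this flow can be sketched as follows. This is a minimal sketch, not the MR's code. The `SpeakerSegment` and `TranscriptChunk` shapes, times in seconds, and the rule of picking the speaker with the largest total overlap are all assumptions.

```typescript
// Hypothetical shapes: the MR does not specify these types.
interface SpeakerSegment {
  id: number; // speaker index produced by diarization
  start: number; // seconds
  end: number; // seconds
}

interface TranscriptChunk {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Length of the intersection of two time intervals, or 0 if they are disjoint.
function overlap(aStart: number, aEnd: number, bStart: number, bEnd: number): number {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

// Format a speaker index as SPEAKER_00, SPEAKER_01, ...
function speakerLabel(id: number): string {
  return "SPEAKER_" + (id < 10 ? "0" : "") + id;
}

// Return the label of the speaker whose segments overlap the chunk for the
// longest total time, or null when nothing overlaps (caller falls back).
function assignSpeaker(chunk: TranscriptChunk, segments: SpeakerSegment[]): string | null {
  const totals: { [id: number]: number } = {};
  let bestId = -1;
  let bestTime = 0;
  for (const seg of segments) {
    const o = overlap(chunk.start, chunk.end, seg.start, seg.end);
    if (o <= 0) continue;
    const total = (totals[seg.id] || 0) + o;
    totals[seg.id] = total;
    if (total > bestTime) {
      bestId = seg.id;
      bestTime = total;
    }
  }
  return bestId < 0 ? null : speakerLabel(bestId);
}

// Usage: segments are computed once, then reused for every chunk.
const segments: SpeakerSegment[] = [
  { id: 0, start: 0, end: 5 },
  { id: 1, start: 5, end: 9 },
  { id: 0, start: 9, end: 12 },
];
const chunks: TranscriptChunk[] = [
  { text: "Hello there.", start: 0, end: 4 },
  { text: "Hi, how are you?", start: 4, end: 8 },
  { text: "Fine, thanks.", start: 8, end: 12 },
  { text: "(silence)", start: 20, end: 25 },
];
const labels = chunks.map((c) => assignSpeaker(c, segments));
```

Summing overlap per speaker is only one reasonable rule. The MR does not say whether it uses total overlap, the single largest overlap, or the chunk midpoint. Returning `null` for chunks with no overlapping segment leaves room for the heuristic fallback.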

Benefits

No backend dependency for speaker diarization.
Faster user feedback by performing inference locally.
Reduced backend resource utilization.
Works in offline or backend-unavailable scenarios.
Improves transcription readability through speaker-attributed transcripts.

Fallback Behavior

If the diarization model fails to load or inference fails, the application falls back to the existing heuristic-based speaker assignment. Standard transcription remains unaffected when diarization is disabled.
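The fallback amounts to an error boundary around diarization. Below is a minimal sketch, assuming diarization is exposed as a promise-returning function (the hypothetical `Diarizer` type) and that a `null` result tells the caller to run the existing heuristic, which is not reproduced here. Treating an empty segment list as "unavailable" is also an assumption.

```typescript
// Hypothetical types: the MR does not define these.
interface SpeakerSegment {
  id: number; // speaker index
  start: number; // seconds
  end: number; // seconds
}

type Diarizer = () => Promise<SpeakerSegment[]>;

// Run diarization; on a load or inference error, or an empty result,
// return null so the caller uses the existing heuristic speaker assignment.
async function diarizeOrFallback(diarize: Diarizer): Promise<SpeakerSegment[] | null> {
  try {
    const segments = await diarize();
    return segments.length > 0 ? segments : null;
  } catch (err) {
    console.warn("Diarization unavailable, using heuristic speakers:", err);
    return null;
  }
}
```

Since the error is caught here, a failed model load never reaches the transcription path, so transcription continues with heuristic labels.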

Testing

Verified:

Diarization model loading and initialization.
Speaker segment generation for uploaded audio files.
Speaker label assignment during transcription.
UI rendering of diarized speaker segments.
Fallback behavior when diarization is unavailable.
Transcription flow without backend API connectivity.

Closes #51

Edited Jun 02, 2026 by vyshnavi
