feat: flag speaker diarization in transcription endpoint (!32)

ashritha kunjeti requested to merge diarization into develop May 07, 2026

Overview

This MR integrates speaker diarization directly into the existing /transcribe pipeline using a configurable enable_diarization flag.

Previously, diarization and transcription were handled through separate services and endpoints, leading to duplicated logic, inconsistent processing flows, and higher maintenance overhead. This implementation consolidates both functionalities into a single unified pipeline.
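A minimal sketch of what the flag-gated flow could look like, assuming the ASR and diarization backends are plain callables. `TranscribeOptions`, `run_transcribe`, the `speaker_turns` key, and the default values are hypothetical names chosen for illustration; only `enable_diarization` and `num_speakers` come from this MR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (start_seconds, end_seconds, speaker_label); an assumed shape, not the MR's schema.
Turn = Tuple[float, float, str]


@dataclass
class TranscribeOptions:
    # Field names mirror the flags named in this MR; the defaults are assumptions.
    enable_diarization: bool = False
    num_speakers: Optional[int] = None


def run_transcribe(
    audio_path: str,
    options: TranscribeOptions,
    asr: Callable[[str], str],
    diarizer: Callable[[str, Optional[int]], List[Turn]],
) -> Dict[str, object]:
    """Always transcribe; run diarization only when the flag is set."""
    response: Dict[str, object] = {"text": asr(audio_path)}
    if options.enable_diarization:
        # num_speakers is passed through as an optional hint to the diarizer.
        response["speaker_turns"] = diarizer(audio_path, options.num_speakers)
    return response
```

With the flag left at its default, the diarizer is never called, so callers that only need transcription would not pay for the extra step.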

Summary of Changes

Unified API: Diarization is now integrated into the main /transcribe endpoint via the enable_diarization flag and the num_speakers parameter.
Diarization Fallback: Introduced a fallback mechanism that transcribes individual speaker segments when coarse ASR timestamps prevent accurate speaker alignment.
LID Improvements: Enhanced Whisper-based Language Identification to better handle non-language tokens.
Bug Fixes: Resolved several Python syntax errors in exception handling and improved resource cleanup (temporary file management).
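
The diarization fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this MR: the function names, the tuple shapes, the 0.8 overlap threshold, and the `transcribe_span` callback are all hypothetical.

```python
from typing import Callable, List, Tuple

Turn = Tuple[float, float, str]        # (start, end, speaker) from the diarizer
AsrSegment = Tuple[float, float, str]  # (start, end, text) from the ASR model


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def is_too_coarse(segments: List[AsrSegment], turns: List[Turn],
                  min_fraction: float = 0.8) -> bool:
    """True if some ASR segment is not mostly covered by a single speaker turn."""
    for start, end, _ in segments:
        duration = end - start
        if duration <= 0:
            continue
        best = max((overlap(start, end, ts, te) for ts, te, _ in turns), default=0.0)
        if best / duration < min_fraction:
            return True
    return False


def label_speakers(segments: List[AsrSegment], turns: List[Turn],
                   transcribe_span: Callable[[float, float], str]) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs, re-transcribing per turn when alignment is unreliable."""
    if not turns:
        # Nothing to align against; keep the ASR text with a placeholder label.
        return [("unknown", text) for _, _, text in segments]
    if is_too_coarse(segments, turns):
        # Fallback: transcribe each speaker turn on its own.
        return [(speaker, transcribe_span(ts, te)) for ts, te, speaker in turns]
    labelled = []
    for start, end, text in segments:
        # Normal path: assign the speaker whose turn overlaps the segment most.
        speaker = max(turns, key=lambda t: overlap(start, end, t[0], t[1]))[2]
        labelled.append((speaker, text))
    return labelled
```

The trade-off in this sketch is cost: the fallback runs ASR once per speaker turn, so it is reserved for cases where overlap-based alignment would mix speakers inside one segment.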

Closes #21 (closed)

Edited May 07, 2026 by ashritha kunjeti
