feat: flag speaker diarization in transcription endpoint

ashritha kunjeti requested to merge diarization into develop

Overview

This MR integrates speaker diarization directly into the existing /transcribe pipeline using a configurable enable_diarization flag.

Previously, diarization and transcription were handled through separate services and endpoints, leading to duplicated logic, inconsistent processing flows, and higher maintenance overhead. This implementation consolidates both functionalities into a single unified pipeline.
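As a rough illustration of how a client might opt in to diarization, the sketch below builds the form fields for a /transcribe request. Only the endpoint path and the enable_diarization and num_speakers flag names come from this MR. The helper name, the string encoding of the boolean, and the rule that num_speakers is dropped when diarization is off are assumptions, not the service's confirmed behavior.

```python
from typing import Dict, Optional


def build_transcribe_fields(
    enable_diarization: bool = False,
    num_speakers: Optional[int] = None,
) -> Dict[str, str]:
    """Build form fields for a POST to /transcribe (hypothetical helper)."""
    fields = {"enable_diarization": "true" if enable_diarization else "false"}
    # Assumption: num_speakers only matters when diarization is enabled.
    if enable_diarization and num_speakers is not None:
        if num_speakers < 1:
            raise ValueError("num_speakers must be a positive integer")
        fields["num_speakers"] = str(num_speakers)
    return fields
```

With the defaults, the request carries only enable_diarization set to "false", so existing transcription-only callers would not need to change under these assumptions.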

Summary of Changes

  • Unified API: Diarization is now integrated into the main /transcribe endpoint via the enable_diarization and num_speakers flags.
  • Diarization Fallback: Introduced a fallback mechanism that transcribes individual speaker segments when coarse ASR timestamps prevent accurate speaker alignment.
  • LID Improvements: Enhanced Whisper-based Language Identification to better handle non-language tokens.
  • Bug Fixes: Resolved several Python syntax errors in exception handling and improved resource cleanup (temporary file management).
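The diarization fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the function names, the (start, end) tuple layout, and the 0.8 overlap threshold used to decide that ASR timestamps are "too coarse" are all assumptions. The transcribe_span callback stands in for whatever ASR call the pipeline makes on a single speaker segment.

```python
from typing import Callable, List, Tuple

Span = Tuple[float, float]            # (start_sec, end_sec)
AsrSegment = Tuple[Span, str]         # (time span, recognized text)
SpeakerTurn = Tuple[Span, str]        # (time span, speaker label)


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the intersection of two spans."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def needs_fallback(asr: List[AsrSegment], turns: List[SpeakerTurn],
                   min_purity: float = 0.8) -> bool:
    """True if any ASR segment is not mostly covered by one speaker turn."""
    for span, _text in asr:
        duration = span[1] - span[0]
        if duration <= 0:
            continue
        best = max((overlap(span, t) for t, _spk in turns), default=0.0)
        if best / duration < min_purity:   # hypothetical threshold
            return True
    return False


def assign_speakers(asr: List[AsrSegment], turns: List[SpeakerTurn],
                    transcribe_span: Callable[[Span], str],
                    min_purity: float = 0.8) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs, re-transcribing per turn if needed."""
    if not turns:
        # No diarization output: keep the text, mark the speaker as unknown.
        return [("unknown", text) for _span, text in asr]
    if needs_fallback(asr, turns, min_purity):
        # Fallback path: transcribe each speaker segment individually.
        return [(spk, transcribe_span(span)) for span, spk in turns]
    # Normal path: label each ASR segment with the best-overlapping turn.
    labelled = []
    for span, text in asr:
        _turn, spk = max(turns, key=lambda t: overlap(span, t[0]))
        labelled.append((spk, text))
    return labelled
```

The trade-off in this sketch is cost versus accuracy: the fallback path runs one ASR call per speaker turn, so it is only taken when a single ASR segment straddles several speakers.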

Closes #21 (closed)

Edited by ashritha kunjeti
