feat: add flag-controlled speaker diarization to the transcription endpoint
Overview
This MR integrates speaker diarization directly into the existing /transcribe pipeline, controlled by a configurable enable_diarization flag.
Previously, diarization and transcription were handled by separate services and endpoints, which led to duplicated logic, inconsistent processing flows, and higher maintenance overhead. This implementation consolidates both into a single pipeline.
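As an illustration of how a single endpoint can branch on these flags, here is a minimal sketch in Python. The request model, the stage names, and the rule that num_speakers is only valid together with enable_diarization are all assumptions made for this example, not the MR's actual code; only the two flag names come from this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscribeRequest:
    # Hypothetical request model; only the two flag names come from the MR.
    audio_path: str
    enable_diarization: bool = False
    num_speakers: Optional[int] = None  # None: let the diarizer estimate the count

    def __post_init__(self) -> None:
        # Assumed validation rule: a speaker count only makes sense with diarization on.
        if self.num_speakers is not None:
            if not self.enable_diarization:
                raise ValueError("num_speakers requires enable_diarization=True")
            if self.num_speakers < 1:
                raise ValueError("num_speakers must be a positive integer")


def plan_pipeline(req: TranscribeRequest) -> list[str]:
    """Return the ordered stages for one request (hypothetical stage names)."""
    stages = ["load_audio", "language_id", "asr"]
    if req.enable_diarization:
        # Diarization is an optional extension of the same pipeline,
        # not a separate service or endpoint.
        stages += ["diarize", "align_speakers"]
    return stages
```

With the flag off, a request follows the plain transcription path; turning it on only appends stages, so both modes share the loading, LID, and ASR steps.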
Summary of Changes
- Unified API: Diarization is now integrated into the main /transcribe endpoint via the enable_diarization and num_speakers flags.
- Diarization Fallback: Introduced a fallback mechanism that transcribes individual speaker segments when coarse ASR timestamps prevent accurate speaker alignment.
- LID Improvements: Enhanced Whisper-based Language Identification to better handle non-language tokens.
- Bug Fixes: Resolved several Python syntax errors in exception handling and improved resource cleanup (temporary file management).
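The diarization fallback can be pictured as follows. This is a sketch under stated assumptions: each ASR segment is assigned to the speaker turn that covers most of it, and when no single speaker covers at least min_ratio of the segment (the hypothetical threshold used here to stand in for "coarse timestamps"), the overlapping portion of each speaker turn is re-transcribed on its own through transcribe_fn, a hypothetical callable. The MR's actual alignment criterion may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Span:
    start: float
    end: float
    label: str  # transcript text for ASR segments, speaker id for diarization turns


def overlap(a: Span, b: Span) -> float:
    """Length of the time interval shared by two spans (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def align_with_fallback(
    asr: list[Span],
    turns: list[Span],
    transcribe_fn: Callable[[float, float], str],  # hypothetical: ASR on [start, end]
    min_ratio: float = 0.8,  # assumed threshold for a "clean" speaker assignment
) -> list[tuple[Optional[str], str]]:
    out: list[tuple[Optional[str], str]] = []
    for seg in asr:
        hits = [(overlap(seg, t), t) for t in turns]
        hits = [(o, t) for o, t in hits if o > 0]
        if not hits:
            out.append((None, seg.label))  # no speaker turn covers this segment
            continue
        best_o, best_t = max(hits, key=lambda h: h[0])
        dur = seg.end - seg.start
        if dur > 0 and best_o / dur >= min_ratio:
            out.append((best_t.label, seg.label))  # one speaker dominates
        else:
            # Fallback: the segment spans several speakers, so re-transcribe
            # each speaker's portion of it separately, in time order.
            for _, t in sorted(hits, key=lambda h: h[1].start):
                s, e = max(seg.start, t.start), min(seg.end, t.end)
                out.append((t.label, transcribe_fn(s, e)))
    return out
```

Clipping each re-transcribed region to the ASR segment bounds is a design choice in this sketch: it keeps a long speaker turn from being transcribed again for every segment it touches.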
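One common way to keep non-language tokens from influencing language identification is to restrict the decision to language tokens only and renormalize over them. The sketch below assumes scores arrive as a mapping from Whisper-style token strings such as "<|en|>" to logits, mixed with special tokens such as "<|nospeech|>"; this input format and the function name are assumptions, and this is not a call into the Whisper library.

```python
import math


def pick_language(token_logits: dict[str, float], valid_codes: set[str]) -> tuple[str, float]:
    """Pick the most likely language, ignoring special/non-language tokens.

    Hypothetical helper: token_logits maps token strings like "<|en|>" to logits.
    """
    lang_logits: dict[str, float] = {}
    for tok, logit in token_logits.items():
        if tok.startswith("<|") and tok.endswith("|>"):
            code = tok[2:-2]
            if code in valid_codes:  # drop <|nospeech|>, <|transcribe|>, etc.
                lang_logits[code] = logit
    if not lang_logits:
        raise ValueError("no language tokens among the scores")
    # Softmax over language tokens only (max-subtracted for numerical stability).
    m = max(lang_logits.values())
    exps = {c: math.exp(v - m) for c, v in lang_logits.items()}
    z = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / z
```

Because the softmax runs only over language codes, a high-scoring special token can neither win the argmax nor dilute the reported confidence.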
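For the temporary-file cleanup, a typical pattern is to pair file creation with a try/finally so the file is removed whether processing succeeds or raises. The helper name with_temp_audio and its signature are hypothetical; the sketch only uses the standard tempfile and os modules.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_audio(data: bytes, process: Callable[[str], T], suffix: str = ".wav") -> T:
    """Write audio bytes to a temp file, run process(path), always delete the file.

    Hypothetical helper illustrating the cleanup pattern, not the MR's code.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:  # closes the descriptor returned by mkstemp
            f.write(data)
        return process(path)
    finally:
        # Runs on normal return and when process() raises.
        if os.path.exists(path):
            os.remove(path)
```

Usage: `with_temp_audio(upload_bytes, run_asr)` hands `run_asr` (a hypothetical callable) a real file path and guarantees the file is gone afterwards.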
Closes #21
Edited by ashritha kunjeti