add diarization + ASR backend pipeline integration (!28) · VISWAM / apps / Speech / Voice App Backend

ashritha kunjeti requested to merge speaker-diarization-3 into speaker-diarization Apr 26, 2026

Description

This Merge Request introduces a new integrated API endpoint that combines speaker diarization and automatic speech recognition (ASR) into a single, cohesive pipeline. The implementation enables processing of uploaded audio files to produce speaker-labeled transcriptions in a structured JSON format.

Key Changes

New API Endpoint

- Added POST /diarize-transcribe in app/main.py.
- Accepts an uploaded audio file and an optional language selection parameter.
- Falls back to automatic language handling when “other” is selected.
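The language rule can be sketched as a small normalization step that runs on the optional `language` value before the pipeline starts. This is a minimal sketch, not the handler in app/main.py: the function name `resolve_language` is hypothetical, and treating a *missing* value the same as “other” is an assumption (the MR only describes the “other” case). The `"auto"` value mirrors the `"language"` field in the response example.

```python
from typing import Optional

# Value reported in the response when language handling is automatic.
AUTO = "auto"


def resolve_language(language: Optional[str]) -> str:
    """Normalize the optional language parameter (hypothetical helper).

    "other" selects automatic handling, as described in the MR. Treating a
    missing or blank value the same way is an assumption of this sketch.
    """
    if language is None:
        return AUTO
    code = language.strip().lower()
    if code in ("", "other"):
        return AUTO
    # Any explicit selection is passed through as a lowercased code.
    return code
```

In a FastAPI handler this would run on the form field before diarization starts, so every downstream step sees either a concrete code or `"auto"`.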

Pipeline Integration

- Orchestrates the complete workflow:
  - Performs speaker diarization to segment the audio by speaker.
  - Applies ASR transcription to each segment.
  - Merges the results into a unified response.
- Connects the diarization and transcription modules within a single request.
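The orchestration steps above can be sketched as a function that takes the diarizer and the transcriber as plain callables. The MR does not show the real module interfaces, so `SpeakerTurn`, the callable signatures, and `diarize_and_transcribe` are all hypothetical. Passing the modules in as arguments lets the sketch run with stubs and keeps the merge logic independent of any particular diarization or ASR library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SpeakerTurn:
    """One diarization turn (hypothetical type)."""
    speaker: str  # e.g. "SPEAKER_0"
    start: float  # seconds
    end: float    # seconds


# Hypothetical interfaces standing in for the real diarization/ASR modules.
Diarizer = Callable[[str], List[SpeakerTurn]]
Transcriber = Callable[[str, float, float, str], str]


def diarize_and_transcribe(
    audio_path: str,
    language: str,
    diarize: Diarizer,
    transcribe: Transcriber,
) -> Dict[str, Any]:
    """Diarize, transcribe each speaker turn, and merge into one response."""
    # Step 1: segment the audio by speaker; order turns chronologically.
    turns = sorted(diarize(audio_path), key=lambda t: t.start)

    segments: List[Dict[str, Any]] = []
    for turn in turns:
        # Step 2: run ASR on just this turn's time range.
        text = transcribe(audio_path, turn.start, turn.end, language).strip()
        # Step 3: attach the speaker label and timestamps to the text.
        segments.append(
            {"speaker": turn.speaker, "start": turn.start, "end": turn.end, "text": text}
        )

    return {"language": language, "segments": segments}
```

Sorting by start time is a choice made here so the response reads in order even if a diarizer returns turns grouped by speaker.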

Data Models

Introduced new response models in app/models/shared_models.py:
- DiarizedSegment: an individual segment with its speaker label, start/end timestamps, and transcription.
- DiarizationResponse: the overall structured response, containing the language and the list of segments.
- Maintains consistency and type safety across the API.
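The MR does not show the model definitions. In a FastAPI app they would typically be Pydantic `BaseModel` subclasses used as the endpoint's `response_model`. The sketch below uses stdlib dataclasses as a dependency-free stand-in, with field names taken from the response format example. The `to_json` helper is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DiarizedSegment:
    speaker: str  # speaker label, e.g. "SPEAKER_0"
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str     # ASR transcription of this segment


@dataclass
class DiarizationResponse:
    language: str  # requested language code, or "auto"
    segments: List[DiarizedSegment] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the list of DiarizedSegment instances.
        return json.dumps(asdict(self))
```

With Pydantic, the same shape would be declared by subclassing `BaseModel` instead, and FastAPI would handle serialization and validation.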

File Handling & Validation

- Validates uploaded audio files against the allowed extensions.
- Uses the job management system for temporary file storage.
- Adds error handling and logging.
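The extension check and temporary storage can be sketched as follows. The allow-list, `validate_audio_filename`, and `save_upload` are hypothetical: the MR names neither the accepted formats nor the job management system's API. A per-job directory plus `tempfile.mkstemp` stands in for that system here.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical allow-list; the MR does not list the accepted extensions.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def validate_audio_filename(filename: str) -> str:
    """Return the lowercased extension, or raise ValueError if not allowed."""
    ext = Path(filename or "").suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    return ext


def save_upload(data: bytes, filename: str, job_dir: str) -> str:
    """Validate, then write the upload to a uniquely named file in job_dir.

    Stand-in for the job management system's temporary storage.
    """
    ext = validate_audio_filename(filename)
    # mkstemp creates the file atomically with a unique name, keeping the
    # original extension so downstream decoders can recognize the format.
    fd, path = tempfile.mkstemp(suffix=ext, dir=job_dir)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path
```

Validating before writing means a rejected upload never touches disk. In the endpoint, the `ValueError` would be mapped to a 4xx response.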

Testing

- Added the integration test test_diarization_integration.py.
- Uses FastAPI TestClient to validate endpoint behavior.
- Dynamically generates a test audio file using FFmpeg.
- Verifies the response structure, including speaker labels, timestamps, and transcription fields.

Response Format

```json
{
  "language": "auto",
  "segments": [
    {
      "speaker": "SPEAKER_0",
      "start": 0.0,
      "end": 2.0,
      "text": "..."
    }
  ]
}
```

Summary

This update establishes a unified diarization + transcription pipeline behind a single endpoint and standardizes the speaker-labeled JSON response format.
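For reference, the two test steps described under Testing (generating audio with FFmpeg and checking the response shape) might look like the helpers below. Both function names are hypothetical, and the actual test file may differ. The FFmpeg arguments use the standard `lavfi` `sine` source. A pure tone contains no speech, so the checker accepts an empty `segments` list.

```python
from typing import Any, Dict, List


def ffmpeg_test_tone_cmd(out_path: str, seconds: float = 4.0) -> List[str]:
    """Build an FFmpeg command that synthesizes a mono 16 kHz sine tone.

    Run it with subprocess.run(cmd, check=True) before calling the endpoint.
    """
    return [
        "ffmpeg", "-y",        # overwrite out_path if it exists
        "-f", "lavfi",         # input comes from a libavfilter source
        "-i", f"sine=frequency=440:duration={seconds}",
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # mono
        out_path,
    ]


def check_diarization_payload(payload: Dict[str, Any]) -> None:
    """Assert that a response body matches the documented format."""
    assert isinstance(payload.get("language"), str)
    segments = payload.get("segments")
    assert isinstance(segments, list)
    for seg in segments:
        # Every segment carries a speaker label, timestamps, and text.
        assert {"speaker", "start", "end", "text"} <= set(seg)
        assert isinstance(seg["speaker"], str) and seg["speaker"]
        assert isinstance(seg["text"], str)
        assert 0.0 <= seg["start"] <= seg["end"]
```

With TestClient, the body from `response.json()` would be passed straight to `check_diarization_payload`.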

