
add diarization + ASR backend pipeline integration

ashritha kunjeti requested to merge speaker-diarization-3 into speaker-diarization

Description

This merge request adds an API endpoint that combines speaker diarization and automatic speech recognition (ASR) in a single pipeline. It processes an uploaded audio file and returns a speaker-labeled transcription as structured JSON.

Key Changes

  1. New API Endpoint

Added POST /diarize-transcribe in app/main.py. Accepts audio file uploads with an optional language selection parameter. Supports automatic language handling when “other” is selected.
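The language parameter handling could look roughly like the sketch below. It assumes that an omitted parameter and the value "other" both fall back to automatic detection, reported as "auto" (as in the example response further down); `resolve_language` and `SUPPORTED_LANGUAGES` are hypothetical names, not taken from app/main.py.

```python
from typing import Optional

# Hypothetical set of explicitly selectable language codes.
SUPPORTED_LANGUAGES = {"en", "de", "es", "fr"}
AUTO = "auto"


def resolve_language(language: Optional[str]) -> str:
    """Map the optional form parameter to the language code passed to ASR."""
    if language is None:
        return AUTO  # assumption: no selection means auto-detect
    code = language.strip().lower()
    if code in ("", "other", AUTO):
        return AUTO  # "other" switches to automatic language handling
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return code
```

Normalizing the value once at the endpoint boundary means the diarization and ASR steps only ever see either a known code or "auto".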

  2. Pipeline Integration

Orchestrates the complete workflow:
- Performs speaker diarization to segment the audio by speaker.
- Runs ASR transcription on each segment.
- Merges the results into a unified response.
- Handles the hand-off between the diarization and transcription modules.
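The workflow above can be sketched as a small orchestration function. The diarizer and transcriber are passed in as plain callables so the flow runs without loading any models; their signatures, the `Turn` type, and `diarize_and_transcribe` are all hypothetical stand-ins for the MR's actual modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    """One speaker turn as returned by a diarization step (hypothetical shape)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


# Hypothetical interfaces standing in for the diarization and ASR modules.
Diarizer = Callable[[str], List[Turn]]
Transcriber = Callable[[str, float, float, str], str]


def diarize_and_transcribe(
    audio_path: str,
    diarize: Diarizer,
    transcribe: Transcriber,
    language: str = "auto",
) -> Dict[str, object]:
    """Diarize the file, transcribe each speaker turn, and merge the results."""
    segments = []
    # Sort by start time so the merged output reads in chronological order.
    for turn in sorted(diarize(audio_path), key=lambda t: t.start):
        if turn.end <= turn.start:
            continue  # skip empty or malformed turns instead of sending them to ASR
        text = transcribe(audio_path, turn.start, turn.end, language)
        segments.append(
            {"speaker": turn.speaker, "start": turn.start, "end": turn.end, "text": text.strip()}
        )
    return {"language": language, "segments": segments}
```

Injecting the two steps keeps the merge logic testable with fakes; the real transcriber may instead operate on extracted audio slices rather than time offsets.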

  3. Data Models

Introduced new response models in app/models/shared_models.py:
- DiarizedSegment: an individual segment with its speaker label, start/end timestamps, and transcription.
- DiarizationResponse: the overall structured response, containing the language and the list of segments.

These models keep the response consistent and type-safe across the API.
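The two models map directly onto the response format shown later in this description. The MR's models are presumably Pydantic classes (the usual choice for FastAPI response models); this sketch uses standard-library dataclasses instead so it runs without dependencies, and the bounds check in `__post_init__` is an illustrative assumption rather than something the MR states.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DiarizedSegment:
    speaker: str   # e.g. "SPEAKER_0"
    start: float   # seconds
    end: float     # seconds
    text: str

    def __post_init__(self) -> None:
        # Assumed invariant: timestamps are non-negative and ordered.
        if self.start < 0 or self.end < self.start:
            raise ValueError(f"invalid segment bounds: {self.start}-{self.end}")


@dataclass
class DiarizationResponse:
    language: str
    segments: List[DiarizedSegment] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested list of segments.
        return json.dumps(asdict(self))
```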

  4. File Handling & Validation

- Validates uploaded audio files against a list of allowed extensions.
- Uses the job management system for temporary file storage.
- Adds error handling and logging around upload and processing failures.
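A minimal sketch of the upload validation and storage step follows. The MR does not list its allowed extensions or show the job management API, so `ALLOWED_EXTENSIONS`, `validate_audio_filename`, `save_upload`, and the `job_dir` parameter are hypothetical; `tempfile` stands in for the job system's storage.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical allow-list; the MR's actual set is not given.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def validate_audio_filename(filename: str) -> str:
    """Return the normalized extension, or raise if it is not allowed."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        logger.warning("Rejected upload %r: extension %r not allowed", filename, suffix)
        raise ValueError(f"unsupported audio format {suffix or '(none)'!r}")
    return suffix


def save_upload(data: bytes, filename: str, job_dir: Path) -> Path:
    """Validate, then write the upload into a per-job directory."""
    suffix = validate_audio_filename(filename)
    job_dir.mkdir(parents=True, exist_ok=True)
    # Use a generated file name rather than the client-supplied one.
    with tempfile.NamedTemporaryFile(dir=job_dir, suffix=suffix, delete=False) as tmp:
        tmp.write(data)
    return Path(tmp.name)
```

In a FastAPI endpoint the `ValueError` would typically be translated into an HTTP 4xx response.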

  5. Testing

Added the integration test test_diarization_integration.py, which:
- uses FastAPI's TestClient to validate endpoint behavior;
- dynamically generates a test audio file with FFmpeg;
- verifies the response structure, including speaker labels, timestamps, and transcription fields.

Response Format

    {
      "language": "auto",
      "segments": [
        { "speaker": "SPEAKER_0", "start": 0.0, "end": 2.0, "text": "..." }
      ]
    }

Summary

This update establishes a unified diarization + transcription pipeline behind a single endpoint and standardizes the response format.
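The integration test described under Testing could be organized around two helpers: one that synthesizes a short tone with FFmpeg's `lavfi` `sine` source, and one that checks the response shape. Both function names are hypothetical, and the 16 kHz mono output is an assumption about what the ASR model expects.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_tone_command(out_path: Path, seconds: float = 2.0, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that synthesizes a mono sine tone."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", f"sine=frequency=440:duration={seconds}",
        "-ar", str(sample_rate), "-ac", "1",
        str(out_path),
    ]


def make_test_audio(out_path: Path, seconds: float = 2.0) -> Path:
    """Generate the fixture file; requires ffmpeg on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_tone_command(out_path, seconds), check=True, capture_output=True)
    return out_path


def check_diarization_payload(payload: dict) -> None:
    """Assert the documented response structure (pytest-style asserts)."""
    assert isinstance(payload.get("language"), str)
    segments = payload.get("segments")
    assert isinstance(segments, list)
    for seg in segments:
        assert {"speaker", "start", "end", "text"} <= set(seg)
        assert isinstance(seg["speaker"], str)
        assert isinstance(seg["text"], str)
        assert 0 <= seg["start"] <= seg["end"]
```

Because a pure tone contains no speech, the test should assert on structure rather than on transcribed text. The payload would come from TestClient's `post` call to /diarize-transcribe; the upload field names depend on the endpoint signature, which the MR does not show.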
