Refactor Speaker Diarization into Transcription Endpoint with Flag (Remove Redundant Endpoints & Code Duplication)

Description

Currently, speaker diarization is handled separately from the transcription pipeline, resulting in duplicated logic across multiple endpoints and services. This leads to increased maintenance overhead and inconsistencies in how transcription and diarization are combined.

This issue proposes consolidating speaker diarization into the existing /transcribe endpoint using a configurable flag (enable_diarization). The goal is to produce a unified, speaker-aware transcription output with precise timestamps, while eliminating redundant code paths and legacy endpoints.

Problem

- Duplicate logic for diarization and transcription across different modules
- Separate endpoints for diarization create fragmentation
- Inconsistent handling of timestamps and speaker alignment
- Harder to maintain and extend the pipeline
- Increased technical debt due to legacy wrappers and unused code

Proposed Solution

Introduce a flag-based approach:
```json
{
  "enable_diarization": true
}
```
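
A minimal sketch of how the flag could be parsed on the server side. `TranscribeOptions` and `parse_options` are hypothetical names, not existing code; the assumption is that the flag defaults to false so that current callers of /transcribe see no change.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TranscribeOptions:
    # Defaults to False so requests that omit the flag behave as they do today.
    enable_diarization: bool = False


def parse_options(body: Mapping[str, Any]) -> TranscribeOptions:
    """Read the flag from a decoded JSON request body (hypothetical helper)."""
    value = body.get("enable_diarization", False)
    # Reject strings such as "true" rather than silently coercing them.
    if not isinstance(value, bool):
        raise ValueError("enable_diarization must be a JSON boolean")
    return TranscribeOptions(enable_diarization=value)
```

Rejecting non-boolean values up front is one possible choice; silently coercing them would also work but tends to hide client bugs.
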
Integrate diarization into the /transcribe pipeline:
- Run diarization to get speaker segments (speaker label plus start/end time)
- Split the audio at the segment boundaries
- Run ASR on each segment
- Align and merge the per-segment results into a conversation-style output
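
The steps above could be wired together roughly as follows. This is a sketch under assumptions, not the current implementation: `SpeakerSegment`, `Utterance`, and this `transcribe` signature are illustrative, and the ASR and diarization backends are passed in as plain callables so that no particular library is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class SpeakerSegment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Utterance:
    speaker: Optional[str]  # None when diarization is disabled
    start: float
    end: float
    text: str


def transcribe(
    samples: Sequence[float],
    sample_rate: int,
    asr: Callable[[Sequence[float]], str],
    diarize: Callable[[Sequence[float], int], List[SpeakerSegment]],
    enable_diarization: bool = False,
) -> List[Utterance]:
    duration = len(samples) / sample_rate
    if not enable_diarization:
        # Non-diarization path: a single ASR pass over the whole file.
        return [Utterance(None, 0.0, duration, asr(samples))]

    utterances: List[Utterance] = []
    # 1. Diarize, then walk the speaker segments in time order.
    for seg in sorted(diarize(samples, sample_rate), key=lambda s: s.start):
        # 2. Cut the audio at the segment boundaries.
        lo, hi = int(seg.start * sample_rate), int(seg.end * sample_rate)
        # 3. Run ASR on that slice only.
        text = asr(samples[lo:hi]).strip()
        if not text:
            continue
        # 4. Merge consecutive segments from the same speaker into one turn.
        if utterances and utterances[-1].speaker == seg.speaker:
            utterances[-1].end = seg.end
            utterances[-1].text = f"{utterances[-1].text} {text}"
        else:
            utterances.append(Utterance(seg.speaker, seg.start, seg.end, text))
    return utterances
```

Because both paths live in one function, the non-diarization behaviour stays a single early return, which should make it easier to verify that existing transcription is unaffected.
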
Return structured response:
- Speaker-wise grouped transcription
- Accurate start/end timestamps
- Optional segment-level detail (for future extensibility)
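
One possible shape for the structured response, sketched with standard-library dataclasses. All class and field names here (`TranscriptionResponse`, `SpeakerTurn`, `Segment`, `turns`, `segments`) are assumptions for discussion; the actual shared models may look different.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class SpeakerTurn:
    speaker: str
    start: float
    end: float
    text: str
    # Optional segment-level detail; left as None when not requested.
    segments: Optional[List[Segment]] = None


@dataclass
class TranscriptionResponse:
    text: str  # full transcript, as returned without diarization
    turns: List[SpeakerTurn] = field(default_factory=list)


def to_json(response: TranscriptionResponse, include_segments: bool = False) -> str:
    """Serialize the response, dropping segment detail unless asked for."""
    payload = asdict(response)
    if not include_segments:
        for turn in payload["turns"]:
            turn.pop("segments", None)
    return json.dumps(payload, indent=2)
```

Keeping the plain `text` field at the top level means clients that ignore `turns` can keep reading the response as before.
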

Refactoring Tasks

- Remove or deprecate standalone diarization endpoints
- Consolidate logic into the transcribe pipeline
- Ensure a single source of truth for:
  - audio segmentation
  - ASR invocation
  - diarization alignment
- Eliminate duplicate helper functions across modules
- Clean up legacy wrappers if no longer required
- Standardize the response format using shared models
- Add proper logging for diarization-enabled flows
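
If the standalone endpoints are deprecated rather than removed outright, a thin shim could forward to the unified pipeline so that no diarization logic remains in the old handler. This is only a sketch: `make_legacy_diarize_handler` and the logger name are hypothetical, and `transcribe` stands in for whatever the consolidated entry point ends up being.

```python
import logging
import warnings
from typing import Any, Callable, Dict

logger = logging.getLogger("transcribe")  # assumed logger name


def make_legacy_diarize_handler(
    transcribe: Callable[..., Dict[str, Any]],
) -> Callable[[bytes], Dict[str, Any]]:
    """Build a deprecated handler that delegates to the unified pipeline."""

    def legacy_diarize(audio: bytes) -> Dict[str, Any]:
        warnings.warn(
            "the standalone diarization endpoint is deprecated; "
            "call /transcribe with enable_diarization=true instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Log the forwarded call so remaining legacy traffic is visible.
        logger.info("legacy diarization request forwarded to transcribe pipeline")
        return transcribe(audio, enable_diarization=True)

    return legacy_diarize
```

The log line also gives a simple way to measure how much traffic still hits the old route before it is removed.
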

Expected Outcome

- Cleaner and more maintainable codebase
- No duplication of diarization/transcription logic
- Single unified pipeline for all transcription use cases
- Easier extensibility (e.g., subtitles, analytics, summarization)
- Improved consistency in API responses

Acceptance Criteria

- /transcribe supports enable_diarization flag
- Output includes speaker-wise transcription with timestamps
- No duplicate diarization logic exists elsewhere
- Legacy diarization endpoints are removed or deprecated
- Existing functionality (non-diarization transcription) remains unaffected

Notes

This change aligns with a pipeline-based architecture and improves long-term scalability of the ASR + diarization system.