Implement Speaker Segment Ordering and Convert Speaker Labels to 1-Based Indexing
Description
This merge request fixes issues related to speaker diarization output ordering and speaker label formatting. The implementation ensures that speaker segments are displayed in chronological order and that speaker labels follow a clean and consistent 1-based indexing format.
Speaker Segment Ordering
- Added sorting logic to arrange all diarization segments by start_time in ascending order
- Ensured the conversation flow is displayed correctly and in sequence
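A minimal sketch of the ordering step. It assumes each segment is a dict; only start_time is named in this MR, so the end_time and speaker keys and the function name sort_segments are hypothetical:

```python
from typing import Any


def sort_segments(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return diarization segments in ascending start_time order.

    sorted() is stable, so segments that share a start_time keep
    their original relative order. The input list is not modified.
    """
    return sorted(segments, key=lambda seg: seg["start_time"])


# Hypothetical segment layout: start_time is from the MR, the other keys are assumed.
segments = [
    {"start_time": 4.2, "end_time": 6.0, "speaker": "SPEAKER_01"},
    {"start_time": 0.0, "end_time": 4.1, "speaker": "SPEAKER_00"},
    {"start_time": 6.1, "end_time": 9.3, "speaker": "SPEAKER_00"},
]
for seg in sort_segments(segments):
    print(seg["start_time"], seg["speaker"])
# 0.0 SPEAKER_00
# 4.2 SPEAKER_01
# 6.1 SPEAKER_00
```

Because sorted() returns a new list, the raw diarization output stays available if it is needed for debugging.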
Speaker Label Remapping
- Converted speaker labels from zero-based indexing to one-based indexing:
  - SPEAKER_00 → SPEAKER_01
  - SPEAKER_01 → SPEAKER_02
- Implemented sequential speaker numbering without gaps or inconsistencies
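One way to build a gap-free, one-based mapping is sketched below. It assumes raw labels have the form SPEAKER_ followed by an integer, and it orders unique labels by that numeric suffix; the MR does not state which ordering it uses, so this choice and the names speaker_index and build_speaker_map are hypothetical:

```python
import re

# Assumed raw label format: "SPEAKER_" plus a zero-based integer suffix.
LABEL_PATTERN = re.compile(r"^SPEAKER_(\d+)$")


def speaker_index(label: str) -> int:
    """Extract the numeric suffix from a raw label such as 'SPEAKER_03'."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unexpected speaker label: {label!r}")
    return int(match.group(1))


def build_speaker_map(raw_labels: list[str]) -> dict[str, str]:
    """Map raw zero-based labels to gap-free one-based labels.

    Unique labels are ordered by numeric suffix and renumbered from 1,
    so SPEAKER_00 -> SPEAKER_01 and SPEAKER_01 -> SPEAKER_02. A gapped
    input such as {SPEAKER_00, SPEAKER_03} still yields SPEAKER_01, SPEAKER_02.
    """
    unique_labels = sorted(set(raw_labels), key=speaker_index)
    return {raw: f"SPEAKER_{i:02d}" for i, raw in enumerate(unique_labels, start=1)}


print(build_speaker_map(["SPEAKER_01", "SPEAKER_00", "SPEAKER_01"]))
# {'SPEAKER_00': 'SPEAKER_01', 'SPEAKER_01': 'SPEAKER_02'}
```

Ordering by numeric suffix, rather than by first appearance in the timeline, keeps the SPEAKER_00 → SPEAKER_01 shift true for every input; numbering by first appearance would be an equally valid design if the UI should show the first speaker as SPEAKER_01.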
Consistent Speaker Mapping
- Added mapping logic to maintain consistency between original speaker IDs and remapped labels
- Updated speaker labels across:
  - Segment outputs
  - The speaker_durations dictionary
  - Response formatting and processing logic
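Consistency comes from applying one shared mapping to every output. The sketch below assumes segments carry a hypothetical speaker key and that speaker_durations maps raw labels to seconds; the function name apply_speaker_map is also hypothetical:

```python
from typing import Any


def apply_speaker_map(
    segments: list[dict[str, Any]],
    speaker_durations: dict[str, float],
    speaker_map: dict[str, str],
) -> tuple[list[dict[str, Any]], dict[str, float]]:
    """Relabel segments and durations with a single shared mapping.

    Using the same speaker_map for both outputs guarantees that a speaker
    has the same label in the segment list and in speaker_durations.
    A missing label raises KeyError instead of silently leaving a
    zero-based label in the response. Inputs are not modified.
    """
    new_segments = [{**seg, "speaker": speaker_map[seg["speaker"]]} for seg in segments]
    new_durations = {speaker_map[raw]: dur for raw, dur in speaker_durations.items()}
    return new_segments, new_durations


speaker_map = {"SPEAKER_00": "SPEAKER_01", "SPEAKER_01": "SPEAKER_02"}
segments = [
    {"start_time": 0.0, "speaker": "SPEAKER_00"},
    {"start_time": 3.5, "speaker": "SPEAKER_01"},
]
durations = {"SPEAKER_00": 3.5, "SPEAKER_01": 2.0}
new_segments, new_durations = apply_speaker_map(segments, durations, speaker_map)
print(new_durations)
# {'SPEAKER_01': 3.5, 'SPEAKER_02': 2.0}
```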
Formatting Improvements
- Standardized speaker label formatting throughout the response pipeline
- Improved readability and structure of diarization results
Validation Performed
- Verified all segments are ordered chronologically
- Confirmed that SPEAKER_00 no longer occurs in the output
- Tested multiple diarization inputs with varying speaker counts
- Ensured speaker labels remain consistent across segments and duration mappings
- Validated overall response readability and logical conversation flow
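The checks listed above can be expressed as a small validation helper, sketched here under the assumption that segments are dicts with start_time and a hypothetical speaker key; validate_diarization is an illustrative name, not code from this MR:

```python
from typing import Any


def validate_diarization(
    segments: list[dict[str, Any]], speaker_durations: dict[str, float]
) -> list[str]:
    """Return a list of problems found in a diarization response (empty if none)."""
    problems: list[str] = []

    # Segments must be in ascending start_time order.
    starts = [seg["start_time"] for seg in segments]
    if starts != sorted(starts):
        problems.append("segments are not in chronological order")

    # No zero-based label may survive the remapping.
    labels = {seg["speaker"] for seg in segments}
    if "SPEAKER_00" in labels or "SPEAKER_00" in speaker_durations:
        problems.append("zero-based label SPEAKER_00 found in output")

    # Labels must be exactly SPEAKER_01..SPEAKER_N with no gaps.
    expected = {f"SPEAKER_{i:02d}" for i in range(1, len(labels) + 1)}
    if labels != expected:
        problems.append("speaker labels are not sequential from SPEAKER_01")

    # Segment labels and speaker_durations keys must agree.
    if labels != set(speaker_durations):
        problems.append("segment labels and speaker_durations keys differ")

    return problems


good = [
    {"start_time": 0.0, "speaker": "SPEAKER_01"},
    {"start_time": 1.0, "speaker": "SPEAKER_02"},
]
print(validate_diarization(good, {"SPEAKER_01": 1.0, "SPEAKER_02": 0.5}))
# []
```

Returning a list of messages rather than raising on the first failure makes it easy to run the same checks across many test inputs and report every problem at once.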
Outcome
This update improves the clarity, consistency, and usability of speaker diarization outputs, making transcripts easier to read and more suitable for downstream processing and UI integration.
Closes #20