feat: realtime metrics for the ASR application
Overview
This merge request adds a realtime display of ASR analytics and performance metrics for every recording session in the ASR application. It improves visibility into transcription quality, processing efficiency, language detection, and speaker diarization results by presenting the key metrics directly in the UI.
Changes Implemented
Added Realtime Metrics Panel
Implemented a dedicated metrics section in the ASR interface to display realtime processing and transcription statistics for each recording session.
Metrics Included
- Processing Time
- RTFx (Real-Time Factor)
- CER (Character Error Rate)
- WER (Word Error Rate)
- Detected Language(s)
- Number of Speakers Detected
RTFx Calculation
RTFx is calculated using:
$$\text{RTFx} = \frac{\text{Processing Time}}{\text{Audio Duration}}$$
Where:
- Processing Time = Total ASR inference time
- Audio Duration = Length of recorded audio
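This metric indicates whether the ASR pipeline is processing faster or slower than realtime: a value below 1 means the audio was processed faster than realtime, and a value above 1 means slower.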
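A minimal sketch of the ratio, following the formula exactly as written in this merge request. The function names `compute_rtfx` and `describe_speed` are hypothetical and not taken from the codebase. Note that some ASR benchmarks use the label RTFx for the inverse ratio (audio duration over processing time); the sketch does not assume that convention.

```python
from typing import Optional


def compute_rtfx(processing_time_s: float, audio_duration_s: float) -> Optional[float]:
    """Return processing time divided by audio duration (both in seconds).

    Returns None when the ratio is undefined, e.g. for an empty recording.
    """
    if audio_duration_s <= 0 or processing_time_s < 0:
        return None
    return processing_time_s / audio_duration_s


def describe_speed(rtfx: Optional[float]) -> str:
    """Interpret the ratio under the definition used in this merge request."""
    if rtfx is None:
        return "unavailable"
    if rtfx < 1.0:
        return "faster than realtime"
    if rtfx > 1.0:
        return "slower than realtime"
    return "realtime"
```

For example, 2 seconds of inference on 10 seconds of audio gives a ratio of 0.2, which this sketch reports as faster than realtime.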
Functional Enhancements
Processing Metrics
- Added inference start/end time tracking
- Implemented automatic processing time calculation
- Added realtime RTFx computation
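One way the start/end tracking could look is a small context manager around the inference call. This is a sketch under assumptions: the name `track_inference` and the dictionary keys are hypothetical, and the actual implementation may record wall-clock timestamps instead of a monotonic counter.

```python
import time
from contextlib import contextmanager


@contextmanager
def track_inference(metrics: dict):
    """Record start, end, and elapsed inference time into `metrics`.

    time.perf_counter() is monotonic, so the elapsed value is not affected
    by system clock adjustments during inference.
    """
    start = time.perf_counter()
    metrics["inference_start"] = start
    try:
        yield metrics
    finally:
        # Runs even if inference raises, so a failed run still gets a duration.
        end = time.perf_counter()
        metrics["inference_end"] = end
        metrics["processing_time_s"] = end - start
```

The ASR call would go inside a `with track_inference(metrics):` block, after which `metrics["processing_time_s"]` holds the elapsed time in seconds.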
Transcription Quality Metrics
- Added CER calculation support
- Added WER calculation support
- CER and WER are rendered only when a reference transcript is available
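CER and WER are conventionally defined as the Levenshtein edit distance between reference and hypothesis, divided by the reference length in characters or words. The sketch below illustrates that standard definition; it is not the code from this merge request, the function names are hypothetical, and it applies no text normalization (casing, punctuation), which a real pipeline would likely need. Returning `None` when no reference exists mirrors the conditional rendering described above.

```python
from typing import Optional, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit cost for each edit operation."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: Optional[str], hypothesis: str) -> Optional[float]:
    """Word error rate, or None when no usable reference is available."""
    ref_words = reference.split() if reference else []
    if not ref_words:
        return None
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: Optional[str], hypothesis: str) -> Optional[float]:
    """Character error rate, or None when no usable reference is available."""
    if not reference:
        return None
    return edit_distance(reference, hypothesis) / len(reference)
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, so the UI should not assume a 0 to 100% range.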
Language Detection
- Added support for displaying detected language(s) from ASR output
Speaker Diarization
- Integrated speaker count display from diarization pipeline
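The speaker count can be derived by counting distinct speaker labels in the diarization output. The segment shape used here (a list of dictionaries with a `speaker` key) is an assumption for illustration only; the actual diarization pipeline may expose a different structure.

```python
from typing import Dict, List, Optional


def count_speakers(segments: Optional[List[Dict]]) -> Optional[int]:
    """Count distinct speaker labels across diarization segments.

    Returns None when diarization did not run, so the UI can distinguish
    "unavailable" from "zero speakers detected".
    """
    if segments is None:
        return None
    # Segments without a label are skipped rather than counted as a speaker.
    return len({seg["speaker"] for seg in segments if seg.get("speaker")})
```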
UI Improvements
- Added responsive metrics card/component
- Enabled dynamic metric updates without page refresh
- Improved visibility and organization of ASR statistics
- Added graceful handling for unavailable metrics
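Graceful handling of unavailable metrics could be done by mapping missing values to a placeholder before rendering. The sketch below assumes a hypothetical `raw` dictionary with the keys shown and a hypothetical `"N/A"` placeholder; whether the panel shows a placeholder or hides the row entirely (as for CER/WER without a reference transcript) is a design choice this sketch does not settle.

```python
from typing import Any, Dict

PLACEHOLDER = "N/A"  # hypothetical display value for a missing metric


def _fmt(value: Any, pattern: str = "{}") -> str:
    """Format a value, or fall back to the placeholder when it is missing."""
    return PLACEHOLDER if value is None else pattern.format(value)


def format_metrics(raw: Dict[str, Any]) -> Dict[str, str]:
    """Turn raw metric values into display strings for the metrics panel."""
    languages = raw.get("languages")
    return {
        "Processing Time": _fmt(raw.get("processing_time_s"), "{:.2f} s"),
        "RTFx": _fmt(raw.get("rtfx"), "{:.2f}"),
        "CER": _fmt(raw.get("cer"), "{:.1%}"),
        "WER": _fmt(raw.get("wer"), "{:.1%}"),
        "Detected Language(s)": ", ".join(languages) if languages else PLACEHOLDER,
        "Speakers Detected": _fmt(raw.get("speaker_count")),
    }
```

Keeping the formatting in one function means every metric follows the same fallback rule, and the UI component only has to render strings.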
Expected Behavior
After each realtime recording:
- Audio is processed by the ASR pipeline
- Metrics are computed automatically
- Results are displayed in the metrics panel
- UI updates dynamically in realtime
Acceptance Criteria
- Metrics are displayed for every realtime recording session
- RTFx values equal Processing Time divided by Audio Duration for each session
- CER/WER are shown when reference transcripts are available
- Detected language(s) are displayed accurately
- Speaker count matches diarization output
- Metrics update dynamically without manual refresh
- UI remains responsive and visually consistent
Testing Performed
- Verified processing time calculation
- Verified RTFx computation accuracy
- Tested dynamic UI updates
- Verified language detection rendering
- Verified speaker count display
- Tested CER/WER conditional rendering behavior
- Validated metrics panel responsiveness across recording sessions
Closes #22
Edited by srilatha bandari