Display Realtime ASR Performance Metrics for Every Recording
Description
The ASR application currently provides realtime transcription output but does not expose processing and quality-related metrics to the user. Add a metrics panel that automatically calculates and displays key ASR performance indicators for each realtime recording.
Metrics to Display
- Processing Time
- RTFx (Inverse Real-Time Factor)
- CER (Character Error Rate)
- WER (Word Error Rate)
- Detected Language(s)
- Number of Speakers Detected
RTFx should be calculated as:
RTFx = \frac{\text{Audio Duration}}{\text{Processing Time}}
A value above 1 means the audio was processed faster than realtime.
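A minimal sketch of the calculation, assuming the common convention in which the real-time factor (RTF) is processing time divided by audio duration and RTFx is its inverse. The function name `compute_rtf` and its parameters are hypothetical and not part of the existing application.

```python
from typing import Tuple


def compute_rtf(processing_seconds: float, audio_seconds: float) -> Tuple[float, float]:
    """Return (rtf, rtfx) for one recording.

    rtf  = processing time / audio duration (lower is faster)
    rtfx = audio duration / processing time (higher is faster)
    """
    if processing_seconds <= 0 or audio_seconds <= 0:
        raise ValueError("durations must be positive")
    rtf = processing_seconds / audio_seconds
    rtfx = audio_seconds / processing_seconds
    return rtf, rtfx


# Example: 10 s of audio transcribed in 2 s of inference time.
rtf, rtfx = compute_rtf(processing_seconds=2.0, audio_seconds=10.0)
print(f"RTF={rtf:.2f}, RTFx={rtfx:.2f}")  # prints "RTF=0.20, RTFx=5.00"
```

Returning both values lets the panel label whichever one it shows unambiguously.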
Expected Behavior
- Metrics should update automatically after or during realtime transcription.
- Processing time should reflect total ASR inference duration.
- RTFx should accurately represent realtime processing efficiency.
- CER and WER should be displayed when reference transcripts are available.
- Detected languages should be listed clearly.
- Speaker count should be obtained from diarization results.
- Metrics should be displayed in a dedicated UI section/card for each recording session.
- UI updates should occur dynamically without requiring a page refresh.
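The behavior above could be sketched roughly as follows. This is an illustration under stated assumptions, not the application's actual code: `SessionMetrics`, `build_metrics`, and the `(speaker_label, start, end)` shape of the diarization segments are all hypothetical. WER and CER are computed with a plain Levenshtein distance and no text normalization, and they stay unset when no reference transcript is supplied.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)


@dataclass
class SessionMetrics:
    """Hypothetical per-recording record backing one metrics card."""
    processing_seconds: float
    languages: List[str]
    speaker_count: int
    wer: Optional[float] = None  # stays None without a reference transcript
    cer: Optional[float] = None


def build_metrics(
    processing_seconds: float,
    hypothesis: str,
    languages: List[str],
    diarization_segments: List[Tuple[str, float, float]],  # assumed shape
    reference: Optional[str] = None,
) -> SessionMetrics:
    # Speaker count = number of distinct labels in the diarization output.
    speakers = {label for label, _start, _end in diarization_segments}
    metrics = SessionMetrics(processing_seconds, sorted(set(languages)), len(speakers))
    if reference:
        metrics.wer = word_error_rate(reference, hypothesis)
        metrics.cer = char_error_rate(reference, hypothesis)
    return metrics


segments = [("spk0", 0.0, 1.0), ("spk1", 1.0, 2.0), ("spk0", 2.0, 3.0)]
m = build_metrics(1.5, "the cat sit", ["en"], segments, reference="the cat sat")
print(m.speaker_count, round(m.wer, 3), round(m.cer, 3))  # prints "2 0.333 0.091"
```

Keeping `wer` and `cer` optional lets the UI hide or grey out those fields when no ground-truth transcript exists, rather than showing a misleading zero.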
Acceptance Criteria
- All required metrics are visible for every realtime recording.
- RTFx values are calculated correctly.
- Language detection and speaker count display accurate results.
- CER/WER handling works correctly when ground-truth transcripts exist.
- Metrics UI is responsive and integrated with the existing ASR interface.