feat: real-time metrics for the asr

srilatha bandari requested to merge feat/load into develop

Overview

This merge request adds a real-time analytics and performance metrics display for every recording session in the ASR application. The metrics panel shows transcription quality, processing efficiency, language detection, and speaker diarization results directly in the UI.


Changes Implemented

Added Realtime Metrics Panel

Implemented a dedicated metrics section in the ASR interface to display realtime processing and transcription statistics for each recording session.

Metrics Included

  • Processing Time
  • RTFx (Real-Time Factor)
  • CER (Character Error Rate)
  • WER (Word Error Rate)
  • Detected Language(s)
  • Number of Speakers Detected

RTFx Calculation

RTFx is calculated using:

RTFx = \frac{\text{Processing Time}}{\text{Audio Duration}}

Where:

  • Processing Time = Total ASR inference time
  • Audio Duration = Length of recorded audio

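This metric indicates whether the ASR pipeline is processing faster or slower than real time. With the formula above, a value below 1 means the audio was processed faster than real time, and a value above 1 means slower. Note that some ASR benchmarks use the name "RTFx" for the inverse ratio (audio duration divided by processing time), where higher is better; this merge request uses the definition given above.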
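As a rough illustration of the ratio defined above, the following is a minimal sketch rather than the code in this merge request. The function name `rtfx` and its parameter names are assumptions made for the example.

```python
def rtfx(processing_time_s: float, audio_duration_s: float) -> float:
    """Return processing time divided by audio duration (both in seconds).

    Hypothetical helper: the name and signature are illustrative only.
    """
    if audio_duration_s <= 0:
        raise ValueError("audio duration must be positive")
    if processing_time_s < 0:
        raise ValueError("processing time must be non-negative")
    return processing_time_s / audio_duration_s


# 2.5 s of inference on a 10 s recording: below 1, so faster than real time.
print(rtfx(2.5, 10.0))  # 0.25
```

Rejecting a zero or negative audio duration up front avoids a division error for empty recordings; the caller can then show the metric as unavailable.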


Functional Enhancements

Processing Metrics

  • Added inference start/end time tracking
  • Implemented automatic processing time calculation
  • Added realtime RTFx computation

Transcription Quality Metrics

  • Added CER calculation support
  • Added WER calculation support
  • Conditional rendering when reference transcript is available
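For reference, WER and CER are commonly computed as the Levenshtein edit distance between reference and hypothesis, divided by the reference length (in words or characters). The sketch below follows that standard definition. It is not taken from this merge request, all function names are illustrative, and it applies no text normalization (case, punctuation), which a real pipeline would likely need.

```python
from typing import Dict, Optional, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)


def quality_metrics(reference: Optional[str], hypothesis: str) -> Dict[str, float]:
    """Return WER/CER only when a reference exists, mirroring the conditional rendering."""
    if reference is None or not reference.strip():
        return {}
    return {"wer": wer(reference, hypothesis), "cer": cer(reference, hypothesis)}


print(quality_metrics("the cat sat", "the cat sit"))
print(quality_metrics(None, "the cat sit"))  # {}
```

Returning an empty dictionary when no reference transcript is available lets the UI decide whether to hide the CER/WER fields or show a placeholder.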

Language Detection

  • Added support for displaying detected language(s) from ASR output

Speaker Diarization

  • Integrated speaker count display from diarization pipeline

UI Improvements

  • Added responsive metrics card/component
  • Enabled dynamic metric updates without page refresh
  • Improved visibility and organization of ASR statistics
  • Added graceful handling for unavailable metrics
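The graceful handling of unavailable metrics could be approached along the lines below. This is a sketch under assumptions: the function name `format_metrics`, the `"N/A"` placeholder, and the number formats are all invented for the example, and the actual panel may hide missing fields instead of showing a placeholder.

```python
from typing import Dict, Optional, Sequence

PLACEHOLDER = "N/A"  # assumed display text for a metric that could not be computed


def format_metrics(
    processing_time_s: Optional[float] = None,
    rtfx: Optional[float] = None,
    cer: Optional[float] = None,
    wer: Optional[float] = None,
    languages: Optional[Sequence[str]] = None,
    speaker_count: Optional[int] = None,
) -> Dict[str, str]:
    """Turn raw metric values into display strings, tolerating missing values."""

    def seconds(value: Optional[float]) -> str:
        return PLACEHOLDER if value is None else f"{value:.2f} s"

    def ratio(value: Optional[float]) -> str:
        return PLACEHOLDER if value is None else f"{value:.2f}"

    def percent(value: Optional[float]) -> str:
        return PLACEHOLDER if value is None else f"{value * 100:.1f}%"

    return {
        "Processing Time": seconds(processing_time_s),
        "RTFx": ratio(rtfx),
        "CER": percent(cer),
        "WER": percent(wer),
        "Detected Language(s)": ", ".join(languages) if languages else PLACEHOLDER,
        "Number of Speakers Detected": (
            PLACEHOLDER if speaker_count is None else str(speaker_count)
        ),
    }


# No reference transcript was supplied, so CER and WER fall back to the placeholder.
panel = format_metrics(processing_time_s=2.5, rtfx=0.25,
                       languages=["en", "hi"], speaker_count=2)
for label, value in panel.items():
    print(f"{label}: {value}")
```

Keeping the formatting in one function means every metric degrades the same way when a pipeline stage (for example diarization) returns nothing.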

Expected Behavior

After each realtime recording:

  1. Audio is processed by the ASR pipeline
  2. Metrics are computed automatically
  3. Results are displayed in the metrics panel
  4. UI updates dynamically in realtime

Acceptance Criteria

  • Metrics are displayed for every realtime recording session
  • RTFx values are calculated correctly
  • CER/WER are shown when reference transcripts are available
  • Detected language(s) are displayed accurately
  • Speaker count matches diarization output
  • Metrics update dynamically without manual refresh
  • UI remains responsive and visually consistent

Testing Performed

  • Verified processing time calculation
  • Verified RTFx computation accuracy
  • Tested dynamic UI updates
  • Verified language detection rendering
  • Verified speaker count display
  • Tested CER/WER conditional rendering behavior
  • Validated metrics panel responsiveness across recording sessions

Closes #22

Edited by srilatha bandari
