feat: Benchmarking for transcription (!33)
Project: VISWAM / apps / Speech / Voice App Backend

vyshnavi requested to merge asr-benchmarking into develop May 07, 2026

Implementation

Implemented a benchmarking engine that processes audio files and computes evaluation metrics.
Integrated with the model-router to run inference on each audio sample.
Implemented an asynchronous job workflow:
  POST /benchmark → starts a benchmark job and returns a job_id
  GET /benchmark/{job_id} → retrieves the benchmark results
Added support for benchmarking multiple models independently.
Added generation of structured JSON reports containing the evaluation metrics.

Metrics Computed

Accuracy Metrics

Average WER (word error rate)
Average CER (character error rate)
SER (sentence error rate)
Error breakdown: substitutions (S), insertions (I), deletions (D)
Per-sample evaluation, including:
  Ground truth vs. prediction
  File-level WER/CER
  S/I/D breakdown
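The MR does not show how WER, CER and the S/I/D counts are computed. A common approach is a Levenshtein alignment between the reference and hypothesis token sequences, with WER = (S + D + I) / N over words and CER the same over characters. This sketch is illustrative only: the function names are hypothetical, it does no text normalization (casing, punctuation), and when several minimum-cost alignments exist it picks the one with the fewest substitutions, so its S/I/D split may differ from other tools even though the total error count matches.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int
    insertions: int
    deletions: int
    ref_len: int  # N: number of reference tokens (words or characters)

    @property
    def error_rate(self) -> float:
        # WER/CER = (S + D + I) / N
        errors = self.substitutions + self.deletions + self.insertions
        if self.ref_len == 0:
            # Convention for an empty reference (an assumption): 0 if nothing
            # was predicted, otherwise 1.
            return float(errors > 0)
        return errors / self.ref_len


def align(ref: list[str], hyp: list[str]) -> ErrorCounts:
    """Levenshtein alignment that tracks S/I/D along one minimum-cost path."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, S, I, D) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, 0, i)  # reference tokens with no hypothesis: deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, j, 0)  # hypothesis tokens with no reference: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, ins, d = dp[i - 1][j - 1]
            same = ref[i - 1] == hyp[j - 1]
            diag = (c, s, ins, d) if same else (c + 1, s + 1, ins, d)
            c, s, ins, d = dp[i][j - 1]
            insertion = (c + 1, s, ins + 1, d)
            c, s, ins, d = dp[i - 1][j]
            deletion = (c + 1, s, ins, d + 1)
            # Tuple comparison: lowest cost first, then fewest substitutions.
            dp[i][j] = min(diag, insertion, deletion)
    _, s, ins, d = dp[n][m]
    return ErrorCounts(substitutions=s, insertions=ins, deletions=d, ref_len=n)


def word_errors(reference: str, hypothesis: str) -> ErrorCounts:
    return align(reference.split(), hypothesis.split())


def char_errors(reference: str, hypothesis: str) -> ErrorCounts:
    # Spaces count as characters here; some toolkits strip them first.
    return align(list(reference), list(hypothesis))


def sentence_error_rate(pairs: list[tuple[str, str]]) -> float:
    # SER: fraction of utterances whose hypothesis differs from the reference.
    wrong = sum(1 for ref, hyp in pairs if ref.split() != hyp.split())
    return wrong / len(pairs) if pairs else 0.0
```

For example, `word_errors("the cat sat on the mat", "the cat sit on mat")` reports one substitution and one deletion against six reference words, a WER of 2/6. Whether "Average WER" means the mean of per-file WERs or a corpus-level ratio of summed errors is not stated in the MR; both can be built from these counts.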

Latency Metrics

Average latency
Standard deviation of latency
Min/max latency
p50 (median) latency
Average RTF (real-time factor)
Min/max RTF
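The latency metrics can be derived from per-request timings. This sketch assumes each sample records its wall-clock latency and its audio duration (the `Sample` type, its fields and `latency_report` are hypothetical) and takes RTF as processing time divided by audio duration. The MR does not say whether the reported standard deviation is the sample or population value; the sketch uses the sample standard deviation from Python's `statistics` module.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Sample:
    latency_s: float         # wall-clock time for one inference request
    audio_duration_s: float  # length of the input audio


def latency_report(samples: list[Sample]) -> dict:
    """Aggregate per-request latencies and real-time factors (RTF)."""
    latencies = [s.latency_s for s in samples]
    # RTF = processing time / audio duration; RTF < 1 is faster than real time.
    rtfs = [s.latency_s / s.audio_duration_s for s in samples]
    return {
        "avg_latency_s": statistics.mean(latencies),
        # Sample standard deviation; it needs at least two data points.
        "std_latency_s": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
        "min_latency_s": min(latencies),
        "max_latency_s": max(latencies),
        "p50_latency_s": statistics.median(latencies),
        "avg_rtf": statistics.mean(rtfs),
        "min_rtf": min(rtfs),
        "max_rtf": max(rtfs),
    }
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population variant if that is what the reports are meant to show.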

Throughput Metrics

Average inference time (seconds)
Throughput (files per second)
Audio processed per second
Average RTF
Estimated cost per minute
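A sketch of the throughput figures, assuming they are derived from run-level totals. `RunSummary`, `throughput_report` and `price_per_hour` are hypothetical; in particular, the MR does not define "estimated cost per minute", so here it is assumed to mean the compute cost of transcribing one minute of audio at a given hourly price. The RTF computed here is duration-weighted (total inference time over total audio), which can differ from the mean of per-file RTFs.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    num_files: int
    total_audio_s: float      # summed duration of all input audio
    total_inference_s: float  # summed per-file inference time
    wall_clock_s: float       # elapsed time for the whole run


def throughput_report(run: RunSummary, price_per_hour: float) -> dict:
    """price_per_hour is an assumed compute price, e.g. one GPU instance-hour."""
    audio_per_s = run.total_audio_s / run.wall_clock_s
    return {
        "avg_inference_s": run.total_inference_s / run.num_files,
        "throughput_files_per_s": run.num_files / run.wall_clock_s,
        "audio_s_processed_per_s": audio_per_s,
        # Duration-weighted RTF, not the mean of per-file RTFs.
        "avg_rtf": run.total_inference_s / run.total_audio_s,
        # Compute price for the wall-clock time needed to process 60 s of audio.
        "est_cost_per_audio_minute": price_per_hour / 3600 * (60 / audio_per_s),
    }
```

With 600 s of audio processed in 30 s of wall-clock time, 20 s of audio are handled per second, so one minute of audio costs 3 s of compute time at the assumed hourly price.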

Scalability Metrics

Throughput and latency measured under increasing numbers of concurrent users: 10, 50, and 100
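The MR does not describe the load generator. One simple way to measure throughput and latency at 10, 50 and 100 concurrent users is a closed-loop test in which each worker thread acts as one user with a single request in flight. In this sketch, `fake_transcribe` and `load_test` are hypothetical; `fake_transcribe` only sleeps in place of a model-router call, so the numbers it produces say nothing about the real service.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transcribe(path: str) -> float:
    """Hypothetical stand-in for a model-router request; returns its latency."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulate an I/O-bound inference call
    return time.perf_counter() - start


def load_test(files: list[str], concurrency: int) -> dict:
    """Process all files with up to `concurrency` requests in flight at once."""
    start = time.perf_counter()
    # Each worker thread behaves like one simulated user issuing requests back to back.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(fake_transcribe, files))
    elapsed = time.perf_counter() - start
    return {
        "concurrency": concurrency,
        "throughput_files_per_s": len(files) / elapsed,
        "avg_latency_s": statistics.mean(latencies),
        "p50_latency_s": statistics.median(latencies),
    }


if __name__ == "__main__":
    files = [f"sample_{i}.wav" for i in range(200)]
    for users in (10, 50, 100):
        print(load_test(files, users))
```

Threads suit this sketch because the simulated work is I/O-bound; a load test against a real HTTP endpoint might use an async client instead, and should time each request on the client side so queueing inside the service shows up in the latency.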

Testing

Verified the correctness of the metric calculations
Tested system behavior under sequential execution and under concurrent load

Closes #18