feat: Merge Request for Phase 6: Benchmarking (!25)

vyshnavi requested to merge benchmarking-asr into develop Apr 22, 2026

Implementation:

Implemented benchmarking engine to process audio files and compute evaluation metrics.
Integrated with model-router to run inference on each audio sample.
Implemented asynchronous job workflow:
    POST /benchmark → initiates a benchmark job and returns job_id
    GET /benchmark/{job_id} → retrieves benchmark results
Added support for benchmarking multiple models independently.
Generated structured JSON reports containing evaluation metrics.

Metrics Computed

Accuracy Metrics:

Average WER
Average CER
SER
Error Breakdowns: Substitutions, Insertions, Deletions
Includes per-sample evaluation:
    Ground truth vs prediction
    File-level WER/CER
    S/I/D breakdown
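
For reference, per-sample WER/CER with an S/I/D breakdown can be computed with a standard edit-distance alignment. The sketch below is an assumption about how such metrics are typically derived, not the engine's actual code. The function names are hypothetical, no text normalization is applied, CER counts spaces as characters, and the sentence-level flag assumes SER means sentence error rate.

```python
def edit_ops(ref, hyp):
    """Return (substitutions, insertions, deletions) of a minimum-cost alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dp[i][j] = (cost, S, I, D) for aligning ref[:i] with hyp[:j]
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, 0, i)  # only deletions
    for j in range(1, cols):
        dp[0][j] = (j, 0, j, 0)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            c, s, n, d = dp[i - 1][j - 1]
            match = ref[i - 1] == hyp[j - 1]
            diag = (c, s, n, d) if match else (c + 1, s + 1, n, d)
            c, s, n, d = dp[i][j - 1]
            ins = (c + 1, s, n + 1, d)
            c, s, n, d = dp[i - 1][j]
            dele = (c + 1, s, n, d + 1)
            dp[i][j] = min(diag, ins, dele, key=lambda t: t[0])
    _, subs, inserts, deletes = dp[-1][-1]
    return subs, inserts, deletes


def error_rate(ref_tokens, hyp_tokens):
    """(S + I + D) / N, where N is the number of reference tokens."""
    s, i, d = edit_ops(ref_tokens, hyp_tokens)
    n = len(ref_tokens)
    rate = (s + i + d) / n if n else float(bool(hyp_tokens))
    return {"rate": rate, "S": s, "I": i, "D": d}


def evaluate_sample(reference, prediction):
    """Per-sample report: word-level and character-level error rates."""
    return {
        "wer": error_rate(reference.split(), prediction.split()),
        "cer": error_rate(list(reference), list(prediction)),
        "sentence_error": reference != prediction,
    }
```

Average WER and CER would then be the mean of the per-sample rates, and the sentence-level rate the fraction of samples with `sentence_error` set.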

Latency Metrics

Average Latency
Latency standard deviation
Min/Max Latency
p50 latency
Average RTF
Min/Max RTF
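
As an illustration, the latency figures listed above could be summarized from per-file measurements as in the sketch below. It assumes the usual definition of real-time factor (processing time divided by audio duration) and a population standard deviation; the function and key names are hypothetical and may not match the report's actual fields.

```python
import statistics


def latency_summary(latencies_sec, audio_durations_sec):
    """Summarize per-file latencies and real-time factors (RTF)."""
    # RTF = time spent on inference / duration of the audio that was processed
    rtfs = [lat / dur for lat, dur in zip(latencies_sec, audio_durations_sec)]
    return {
        "avg_latency": statistics.mean(latencies_sec),
        "std_latency": statistics.pstdev(latencies_sec),  # population std (assumption)
        "min_latency": min(latencies_sec),
        "max_latency": max(latencies_sec),
        "p50_latency": statistics.median(latencies_sec),
        "avg_rtf": statistics.mean(rtfs),
        "min_rtf": min(rtfs),
        "max_rtf": max(rtfs),
    }
```

An RTF below 1.0 means the model transcribes faster than real time.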

Throughput Metrics

Average inference time (seconds)
Throughput (files per second)
Audio processed per second
Average RTF
Estimated cost per minute
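
The throughput figures can be derived from run totals along these lines. This is a sketch under stated assumptions: the description does not give the cost formula, so the estimate here simply spreads a hypothetical hourly compute price over the minutes of audio processed. All names and parameters are illustrative.

```python
def throughput_summary(num_files, total_audio_sec, total_inference_sec,
                       wall_clock_sec, hourly_compute_cost=None):
    """Derive throughput metrics from totals collected over one benchmark run."""
    summary = {
        "avg_inference_sec": total_inference_sec / num_files,
        "throughput_files_per_sec": num_files / wall_clock_sec,
        # seconds of audio transcribed per second of wall-clock time
        "audio_sec_per_sec": total_audio_sec / wall_clock_sec,
    }
    if hourly_compute_cost is not None:
        # Assumed cost model: compute price for the run / minutes of audio processed
        run_cost = hourly_compute_cost / 3600.0 * wall_clock_sec
        summary["est_cost_per_audio_min"] = run_cost / (total_audio_sec / 60.0)
    return summary
```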

Scalability Metrics

Throughput and latency measured under increasing numbers of concurrent users: 10, 50, 100

Concurrency Metrics

Requests Tested
Average latency, Min/Max latency
Throughput
Inference Time
Audio Duration
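
A concurrency run of this kind can be approximated with a thread pool, one worker per simulated user. The sketch below is an assumption about the test harness, not the MR's code: `send_request` stands in for whatever call submits one audio file for inference, and `run_load_test` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, concurrent_users):
    """Fire one request per simulated user in parallel and time each one."""

    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(timed_call, range(concurrent_users)))
    wall_clock = time.perf_counter() - wall_start

    return {
        "requests_tested": len(latencies),
        "avg_latency": sum(latencies) / len(latencies),
        "min_latency": min(latencies),
        "max_latency": max(latencies),
        "throughput": len(latencies) / wall_clock,  # requests per second
    }
```

Calling it in a loop, for example `for users in (10, 50, 100): run_load_test(send_request, users)`, gives one row per concurrency level for comparing throughput and latency.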

Testing

Added integration tests using a small fixture dataset.
Verified the correctness of the metric calculations.
Tested system behavior under sequential execution and under concurrent load.

Closes #18 (closed)

Edited Apr 28, 2026 by vyshnavi