
feat: Merge Request for Phase 6: Benchmarking

vyshnavi requested to merge benchmarking-asr into develop

Implementation:

  • Implemented benchmarking engine to process audio files and compute evaluation metrics.
  • Integrated with model-router to run inference on each audio sample.
  • Implemented asynchronous job workflow:
      • POST /benchmark → initiates a benchmark job and returns a job_id
      • GET /benchmark/{job_id} → retrieves the benchmark results
  • Added support for benchmarking multiple models independently.
  • Generated structured JSON reports containing evaluation metrics.

Metrics Computed

  1. Accuracy Metrics
  • Average WER
  • Average CER
  • SER
  • Error breakdown: substitutions, insertions, deletions (S/I/D)
  • Per-sample evaluation:
      • Ground truth vs. prediction
      • File-level WER/CER
      • S/I/D breakdown
  2. Latency Metrics
  • Average latency
  • Latency standard deviation
  • Min/max latency
  • p50 latency
  • Average RTF
  • Min/max RTF
  3. Throughput Metrics
  • Average inference time (seconds)
  • Throughput (files per second)
  • Audio processed per second
  • Average RTF
  • Estimated cost per minute
  4. Scalability Metrics
  • Throughput and latency measured under increasing concurrent users: 10, 50, 100
  5. Concurrency Metrics
  • Requests tested
  • Average latency, min/max latency
  • Throughput
  • Inference time
  • Audio duration
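
As a reference for the accuracy metrics, here is one common way to derive WER/CER and the S/I/D breakdown from a minimum-edit (Levenshtein) alignment. It is a sketch under assumptions, not necessarily how this MR computes them: the function names are hypothetical, no text normalization (casing, punctuation) is applied, and CER here counts spaces as characters.

```python
from typing import Dict, List


def edit_counts(reference: List[str], hypothesis: List[str]) -> Dict[str, int]:
    """Count substitutions, insertions, deletions along a minimum-edit alignment."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = minimum edits to turn reference[:i] into hypothesis[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # delete all i reference tokens
    for j in range(cols):
        cost[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, rows):
        for j in range(1, cols):
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + diff,  # match or substitution
                cost[i - 1][j] + 1,         # deletion
                cost[i][j - 1] + 1,         # insertion
            )
    # Backtrack from the bottom-right corner to label each edit.
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    i, j = len(reference), len(hypothesis)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                counts["substitutions"] += diff
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts


def error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """(S + I + D) / N, where N is the number of reference tokens."""
    if not reference:
        raise ValueError("reference must not be empty")
    return sum(edit_counts(reference, hypothesis).values()) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters in this sketch."""
    return error_rate(list(reference), list(hypothesis))
```

For example, with ground truth "the cat sat on the mat" and prediction "the cat sit on mat", the alignment has one substitution and one deletion, so WER is 2/6. When several alignments share the same minimum cost, the S/I/D split can differ between implementations even though the total error count does not.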

Testing

  • Added integration tests using a small fixture dataset.
  • Verified correctness of metrics calculation.
  • Tested system behavior under sequential execution and concurrent load.
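
A sequential vs. concurrent load check of this kind could look roughly like the sketch below. The helper names (`timed_call`, `run_load`), the result keys, and the use of a thread pool are illustrative assumptions, not the test code in this MR; `infer` stands in for whatever callable sends one audio file through the model-router. RTF is taken here as inference time divided by audio duration, so every duration must be greater than zero.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def timed_call(infer: Callable[[str], str], path: str) -> float:
    """Run one inference and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    infer(path)
    return time.perf_counter() - start


def run_load(
    infer: Callable[[str], str],
    samples: List[Tuple[str, float]],  # (audio path, audio duration in seconds)
    concurrency: int,
) -> Dict[str, float]:
    """Send every sample through `infer` with at most `concurrency` in flight."""
    if not samples:
        raise ValueError("samples must not be empty")
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda s: timed_call(infer, s[0]), samples))
    wall = time.perf_counter() - wall_start
    durations = [duration for _, duration in samples]
    rtfs = [lat / dur for lat, dur in zip(latencies, durations)]
    return {
        "requests": len(samples),
        "avg_latency_sec": statistics.fmean(latencies),
        "std_latency_sec": statistics.pstdev(latencies),
        "min_latency_sec": min(latencies),
        "max_latency_sec": max(latencies),
        "p50_latency_sec": statistics.median(latencies),
        "avg_rtf": statistics.fmean(rtfs),
        "throughput_files_per_sec": len(samples) / wall,
        "audio_sec_per_sec": sum(durations) / wall,
    }
```

Calling `run_load(infer, samples, concurrency=1)` gives the sequential baseline; repeating the call with higher values such as 10, 50, and 100 gives the figures to compare against it.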

Closes #18 (closed)

Edited by vyshnavi
