feat: Benchmarking for transcription

vyshnavi requested to merge asr-benchmarking into develop

Implementation:

  • Implemented benchmarking engine to process audio files and compute evaluation metrics.
  • Integrated with model-router to run inference on each audio sample.
  • Implemented asynchronous job workflow:
      • POST /benchmark → initiates a benchmark job and returns a job_id
      • GET /benchmark/{job_id} → retrieves the benchmark results
  • Added support for benchmarking multiple models independently.
  • Generated structured JSON reports containing evaluation metrics.

Metrics Computed

  1. Accuracy Metrics
  • Average WER (word error rate)
  • Average CER (character error rate)
  • SER (sentence error rate)
  • Error breakdown: substitutions, insertions, deletions (S/I/D)
  • Per-sample evaluation:
      • Ground truth vs. prediction
      • File-level WER/CER
      • S/I/D breakdown
  2. Latency Metrics
  • Average latency
  • Latency standard deviation
  • Min/Max latency
  • p50 (median) latency
  • Average RTF (real-time factor)
  • Min/Max RTF
  3. Throughput Metrics
  • Average inference time (seconds)
  • Throughput (files per second)
  • Audio processed per second
  • Average RTF
  • Estimated cost per minute
  4. Scalability Metrics
  • Throughput and latency measured under increasing numbers of concurrent users: 10, 50, 100
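
As a rough illustration of how the accuracy and latency metrics listed above are commonly computed, the sketch below derives S/I/D counts from a Levenshtein alignment and summarizes latencies. It is not the code in this merge request: the function names and output keys are hypothetical, whitespace tokenization with no text normalization is assumed, and RTF is assumed to mean inference time divided by audio duration.

```python
import statistics


def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimal-cost alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no extra cost
                continue
            c, s, d, ins = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, ins)
            c, s, d, ins = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)
            dp[i][j] = min(substitution, deletion, insertion)
    _, s, d, ins = dp[n][m]
    return s, d, ins


def error_rate(ref_tokens, hyp_tokens):
    """(S + D + I) / N, where N is the number of reference tokens."""
    s, d, ins = edit_counts(ref_tokens, hyp_tokens)
    return (s + d + ins) / max(len(ref_tokens), 1)


def score_sample(reference, prediction):
    """Per-sample WER, CER, and S/I/D breakdown (word level)."""
    ref_words, hyp_words = reference.split(), prediction.split()
    s, d, ins = edit_counts(ref_words, hyp_words)
    return {
        "wer": (s + d + ins) / max(len(ref_words), 1),
        "cer": error_rate(list(reference), list(prediction)),
        "substitutions": s,
        "deletions": d,
        "insertions": ins,
    }


def latency_summary(latencies, durations):
    """Latency statistics plus RTF (assumed: latency / audio duration)."""
    rtfs = [lat / dur for lat, dur in zip(latencies, durations)]
    return {
        "avg_latency": statistics.mean(latencies),
        "std_latency": statistics.pstdev(latencies),
        "min_latency": min(latencies),
        "max_latency": max(latencies),
        "p50_latency": statistics.median(latencies),
        "avg_rtf": statistics.mean(rtfs),
    }
```

Under these assumptions, SER would be the fraction of samples whose S + D + I is nonzero, and the average WER/CER would be the mean of the per-sample values (a corpus-level WER that pools errors over all reference words is a common alternative).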

Testing

  • Verified correctness of metrics calculation
  • Tested system behavior under sequential execution and under concurrent load

Closes #18
