Benchmarking individual models in isolation

Issue Description

Currently, there is no structured benchmarking mechanism to evaluate the performance of the individual models used in the ASR pipeline. This makes it difficult to analyze inference efficiency, memory utilization, and scalability characteristics of each component independently.

The project uses multiple models for transcription, diarization, punctuation restoration, and language recognition. Since these models operate together within the pipeline, it is important to isolate and benchmark them separately to identify performance bottlenecks and understand resource requirements.

Models to Benchmark

  • Transcription Models:

    • swecha_gonthuka
    • distil-whisper/distil-large-v3
  • Speaker Diarization:

    • pyannote/speaker-diarization-3.1
  • Punctuation Restoration:

    • ModelsLab/punctuate-indic-v1
  • Language Recognition:

    • openai/whisper-small

Required Benchmarking Metrics

  • Model Load Time
  • Inference Time (Transcription Time for the transcription models)
  • Memory Consumption (RAM Usage)
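
As a rough illustration of how the three metrics above could be captured for one model, here is a minimal sketch using only the Python standard library. The names `BenchmarkResult`, `peak_rss_mib`, and `benchmark`, as well as the `load_fn` / `infer_fn` callables, are hypothetical and not part of the project; the sketch also assumes a Unix-like system, since the `resource` module is not available on Windows.

```python
import resource
import sys
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BenchmarkResult:
    name: str
    load_seconds: float
    inference_seconds: float
    peak_rss_mib: float


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def benchmark(
    name: str,
    load_fn: Callable[[], Any],          # hypothetical: returns a loaded model
    infer_fn: Callable[[Any, Any], Any],  # hypothetical: runs one inference
    sample: Any,
) -> BenchmarkResult:
    # Time the model load on its own.
    start = time.perf_counter()
    model = load_fn()
    load_seconds = time.perf_counter() - start

    # Time a single inference call on one sample.
    start = time.perf_counter()
    infer_fn(model, sample)
    inference_seconds = time.perf_counter() - start

    # Peak RSS covers the whole process lifetime up to this point.
    return BenchmarkResult(name, load_seconds, inference_seconds, peak_rss_mib())
```

Because peak RSS is tracked per process, running each model in a fresh process would keep one model's memory footprint from masking another's. Averaging the timings over several runs would likely give more stable numbers than a single call.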

Benchmark Dataset Requirements

Benchmarking should be performed using:

  • Telugu audio samples
  • English audio samples

Audio durations:

  • 30 seconds
  • 60 seconds (1 minute)
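
One way to enumerate the resulting benchmark runs is to cross every model with every language and duration. The sketch below is only illustrative: the `samples/<language>_<duration>s.wav` naming scheme and the `build_cases` helper are assumptions, and how each sample is fed to a given model (raw audio or a transcript derived from it) is left open.

```python
from itertools import product

# Model identifiers as listed in this issue, grouped by task.
MODELS = {
    "transcription": ["swecha_gonthuka", "distil-whisper/distil-large-v3"],
    "diarization": ["pyannote/speaker-diarization-3.1"],
    "punctuation": ["ModelsLab/punctuate-indic-v1"],
    "language_recognition": ["openai/whisper-small"],
}
LANGUAGES = ["telugu", "english"]
DURATIONS_SECONDS = [30, 60]


def build_cases():
    """Return one dict per (model, language, duration) combination."""
    cases = []
    for task, names in MODELS.items():
        for name, language, duration in product(names, LANGUAGES, DURATIONS_SECONDS):
            cases.append(
                {
                    "task": task,
                    "model": name,
                    "language": language,
                    "duration_seconds": duration,
                    # Hypothetical file layout, not defined by the project.
                    "sample_path": f"samples/{language}_{duration}s.wav",
                }
            )
    return cases
```

With the five models, two languages, and two durations above, this yields 20 benchmark cases.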

Expected Outcome

  • Measure and compare the performance of each model independently
  • Identify latency and memory bottlenecks
  • Evaluate scalability for different audio durations
  • Provide performance insights for optimization and deployment planning
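
To compare how inference time scales across audio durations, one option is to normalize it by the audio length, a ratio commonly called the real-time factor. This is a suggestion rather than a stated requirement of this issue, and the helper names below are hypothetical.

```python
from typing import Dict


def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Inference time divided by audio length; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


def rtf_by_duration(inference_times: Dict[int, float]) -> Dict[int, float]:
    """Map each audio duration (seconds) to its real-time factor.

    `inference_times` maps audio duration in seconds to the measured
    inference time in seconds for one model.
    """
    return {
        duration: real_time_factor(seconds, duration)
        for duration, seconds in inference_times.items()
    }
```

A roughly constant ratio across durations would suggest that inference time grows linearly with audio length, while a rising ratio would point to worse-than-linear scaling.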