Benchmark individual ASR pipeline models in isolation
Issue Description
Currently, there is no structured benchmarking mechanism to evaluate the performance of the individual models used in the ASR pipeline. This makes it difficult to analyze inference efficiency, memory utilization, and scalability characteristics of each component independently.
The project uses multiple models for transcription, diarization, punctuation restoration, and language recognition. Since these models operate together within the pipeline, it is important to isolate and benchmark them separately to identify performance bottlenecks and understand resource requirements.
Models to Benchmark
- Transcription Models:
  - swecha_gonthuka
  - distil-whisper/distil-large-v3
- Speaker Diarization:
  - pyannote/speaker-diarization-3.1
- Punctuation Restoration:
  - ModelsLab/punctuate-indic-v1
- Language Recognition:
  - openai/whisper-small
Required Benchmarking Metrics
- Model Load Time
- Inference / Transcription Time
- RAM Usage / Memory Consumption
Benchmark Dataset Requirements
Benchmarking should be performed using:
- Telugu audio samples
- English audio samples
Audio durations:
- 30 seconds
- 60 seconds (1 minute)
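The resulting benchmark matrix can be enumerated up front so that every model is run against every language and duration. The sketch below is illustrative only: the dictionary keys and the `benchmark_cases` helper are hypothetical names, and it treats 60 seconds and 1 minute as the same duration. It also assumes the punctuation model is fed transcripts derived from the same samples, since it operates on text rather than audio.

```python
from itertools import product

# Model identifiers as listed in this issue, grouped by task (keys are hypothetical).
MODELS = {
    "transcription": ["swecha_gonthuka", "distil-whisper/distil-large-v3"],
    "diarization": ["pyannote/speaker-diarization-3.1"],
    "punctuation": ["ModelsLab/punctuate-indic-v1"],
    "language_recognition": ["openai/whisper-small"],
}
LANGUAGES = ["telugu", "english"]
DURATIONS_S = [30, 60]  # 60 seconds and 1 minute are the same duration


def benchmark_cases() -> list[dict]:
    """Return one case per (model, language, duration) combination."""
    cases = []
    for task, names in MODELS.items():
        for name, language, duration in product(names, LANGUAGES, DURATIONS_S):
            cases.append(
                {
                    "task": task,
                    "model": name,
                    "language": language,
                    "duration_s": duration,
                }
            )
    return cases


if __name__ == "__main__":
    for case in benchmark_cases():
        print(case)
```

With five models, two languages, and two distinct durations this yields 20 cases.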
Expected Outcome
- Measure and compare performance of each model independently
- Identify latency and memory bottlenecks
- Evaluate scalability for different audio durations
- Provide performance insights for optimization and deployment planning