Benchmarking separate models in isolation

Issue Description

There is currently no structured way to benchmark the individual models used in the ASR pipeline. This makes it difficult to analyze the inference speed, memory usage, and scalability of each component independently.

The project uses multiple models for transcription, speaker diarization, punctuation restoration, and language recognition. Because these models run together within the pipeline, end-to-end measurements cannot show which model is responsible for a given cost. Each model therefore needs to be benchmarked in isolation to identify performance bottlenecks and understand its resource requirements.
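Process-level memory counters such as peak RSS only ever grow, so one way to keep each model's numbers independent is to run every model in a fresh Python interpreter. The sketch below assumes a hypothetical per-model child script that loads one model, processes one clip, and prints a single JSON report as the last line of its output. `run_isolated` is an illustrative helper, not part of the project.

```python
import json
import subprocess
import sys


def run_isolated(child_code: str, *args: str, timeout_s: float = 3600.0) -> dict:
    """Run one benchmark in a fresh Python interpreter and return its JSON report.

    A new process starts with an empty peak-memory counter and no cached
    weights, so one model's footprint cannot leak into another's numbers.
    """
    completed = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError if the child crashes
    )
    # Libraries may log to stdout while loading, so only the last line is parsed.
    last_line = completed.stdout.strip().splitlines()[-1]
    return json.loads(last_line)


if __name__ == "__main__":
    # Hypothetical stand-in child: a real one would load a single model and
    # process one audio file. This one just echoes its argument back as JSON.
    demo_child = "import json, sys; print(json.dumps({'model': sys.argv[1]}))"
    print(run_isolated(demo_child, "openai/whisper-small"))
```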

Models to Benchmark

Transcription Models:
- swecha_gonthuka
- distil-whisper/distil-large-v3
Speaker Diarization:
- pyannote/speaker-diarization-3.1
Punctuation Restoration:
- ModelsLab/punctuate-indic-v1
Language Recognition:
- openai/whisper-small
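Crossing the models above with the languages and durations listed under the dataset requirements gives the full set of benchmark runs. The following is a minimal sketch of that matrix. The stage names, the `te`/`en` language codes, and the `benchmark_runs` helper are illustrative assumptions. Punctuation restoration operates on text, so in practice that model would presumably receive the transcript of each clip rather than raw audio.

```python
from itertools import product

# Model IDs copied from this issue, grouped by pipeline stage.
MODELS = {
    "transcription": ["swecha_gonthuka", "distil-whisper/distil-large-v3"],
    "diarization": ["pyannote/speaker-diarization-3.1"],
    "punctuation": ["ModelsLab/punctuate-indic-v1"],
    "language_id": ["openai/whisper-small"],
}
LANGUAGES = ["te", "en"]  # Telugu and English samples (assumed ISO 639-1 labels)
DURATIONS_S = [30, 60]    # "60 seconds" and "1 minute" in the issue are the same length


def benchmark_runs():
    """Yield one (stage, model_id, language, duration_s) tuple per benchmark run."""
    for stage, model_ids in MODELS.items():
        for model_id, lang, dur in product(model_ids, LANGUAGES, DURATIONS_S):
            yield stage, model_id, lang, dur
```

With five models, two languages, and two durations, this yields 20 runs. Each run can then be executed in its own process.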

Required Benchmarking Metrics

- Model Load Time
- Inference / Transcription Time
- RAM Usage / Memory Consumption
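The following is a minimal, model-agnostic sketch of collecting these three metrics with only the standard library. `load_model` and `infer` are hypothetical callables that would be supplied per model, for example thin wrappers around a Hugging Face or pyannote pipeline. Memory is read from `resource.getrusage`, which is Unix-only and reports a process-wide peak. The figure is therefore only meaningful per model when each model runs in its own process.

```python
import resource
import sys
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BenchmarkResult:
    load_time_s: float       # wall-clock time to construct/load the model
    inference_time_s: float  # wall-clock time for one inference call
    peak_rss_mb: float       # process-wide peak resident memory so far, in MiB


def peak_rss_mb() -> float:
    """Peak resident set size of this process, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024


def benchmark(load_model: Callable[[], Any],
              infer: Callable[[Any, Any], Any],
              sample: Any) -> BenchmarkResult:
    """Time model loading and one inference call, then read peak memory."""
    t0 = time.perf_counter()
    model = load_model()
    t1 = time.perf_counter()
    infer(model, sample)
    t2 = time.perf_counter()
    return BenchmarkResult(t1 - t0, t2 - t1, peak_rss_mb())
```

When a model runs on a GPU, CUDA work is launched asynchronously. A call such as `torch.cuda.synchronize()` would then be needed before each clock read. GPU memory would also need separate measurement (for example with `torch.cuda.max_memory_allocated()`), since RSS covers host memory only. The first inference call may include one-time warm-up cost, so averaging over repeated calls is worth considering.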

Benchmark Dataset Requirements

Benchmarking should be performed using:

- Telugu audio samples
- English audio samples

Audio durations:

- 30 seconds
- 60 seconds (1 minute)
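The standard-library `wave` module is enough to cut clips of exactly these lengths from longer recordings, assuming the source files are uncompressed PCM WAV. Other formats would need converting first. `trim_wav` is a hypothetical helper that keeps the first `duration_s` seconds of a file.

```python
import io
import wave


def trim_wav(src, dst, duration_s: float) -> None:
    """Copy the first `duration_s` seconds of a PCM WAV from src to dst.

    src and dst may be file paths or seekable binary file objects.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        n_frames = int(round(duration_s * params.framerate))
        if n_frames > params.nframes:
            raise ValueError(
                f"source is only {params.nframes / params.framerate:.1f} s long"
            )
        frames = reader.readframes(n_frames)
    with wave.open(dst, "wb") as writer:
        # Same channels/sample width/rate; the wave module patches the
        # header's frame count to match the frames actually written.
        writer.setparams(params)
        writer.writeframes(frames)
```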

Expected Outcome

- Measure and compare the performance of each model independently
- Identify latency and memory bottlenecks
- Evaluate how each model scales with audio duration
- Provide performance insights for optimization and deployment planning
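One way to turn raw timings into these comparisons is the real-time factor (RTF): inference time divided by audio duration. RTF makes 30 s and 60 s runs directly comparable. The following sketch assumes each run has been recorded as a `Run` record. The record layout and helper names are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Run:
    model_id: str
    language: str       # e.g. "te" or "en"
    audio_s: float      # duration of the input clip, in seconds
    load_s: float       # model load time, in seconds
    infer_s: float      # inference time for this clip, in seconds
    peak_rss_mb: float  # peak memory of the isolated process, in MiB


def real_time_factor(run: Run) -> float:
    """Seconds of processing per second of audio; below 1.0 is faster than real time."""
    return run.infer_s / run.audio_s


def worst_rtf_per_model(runs: Sequence[Run]) -> Dict[str, float]:
    """Highest (worst) real-time factor observed for each model across all clips."""
    worst: Dict[str, float] = defaultdict(float)
    for run in runs:
        worst[run.model_id] = max(worst[run.model_id], real_time_factor(run))
    return dict(worst)


def scaling_ratio(runs: Sequence[Run], model_id: str, language: str,
                  short_s: float, long_s: float) -> float:
    """Inference time on the long clip divided by that on the short clip.

    When long_s is twice short_s, a ratio near 2.0 indicates roughly linear
    scaling; a clearly larger ratio suggests super-linear cost on longer audio.
    """
    def infer_time(duration: float) -> float:
        return next(r.infer_s for r in runs
                    if r.model_id == model_id and r.language == language
                    and r.audio_s == duration)
    return infer_time(long_s) / infer_time(short_s)
```

Sorting `worst_rtf_per_model` by value, together with the per-run `peak_rss_mb` and `load_s`, gives a first ranking of latency and memory bottlenecks.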