fix: Shift Model Loading from Lazy Loading to Preloading with Warm-Up (!34) · Merge requests · VISWAM / apps / Speech / Voice App Backend

ashritha kunjeti requested to merge model-loading into diarization May 08, 2026

Summary

This MR updates the model initialization strategy by replacing lazy loading with eager preloading at application startup, along with a dummy inference (warm-up pass) to ensure all models are fully initialized before handling real requests.

Changes Made

Moved model loading from on-demand (lazy) to startup phase
Implemented centralized model loader
Added a warm-up step that runs a dummy input through each model, ensuring all pipelines are initialized:
- ASR (Swecha + Whisper)
- Language Identification (LID)
- Diarization
- Punctuation
- Alignment
Updated logs to reflect loading and warm-up stages
Ensured models are cached and reused across requests
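
The changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: `ModelRegistry`, `DummyPipeline`, the pipeline keys, and the one-second silent dummy input are all hypothetical names and values chosen for the example, and the real loaders would construct the actual ASR, LID, diarization, punctuation, and alignment models.

```python
import logging
from typing import Any, Callable, Dict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model_loader")


class ModelRegistry:
    """Hypothetical centralized loader: load once at startup, then serve cached instances."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._warmups: Dict[str, Callable[[Any], None]] = {}
        self._models: Dict[str, Any] = {}

    def register(self, name: str, loader: Callable[[], Any],
                 warmup: Callable[[Any], None]) -> None:
        self._loaders[name] = loader
        self._warmups[name] = warmup

    def preload(self) -> None:
        # Stage 1: eagerly construct every model (replaces on-demand loading).
        for name, loader in self._loaders.items():
            log.info("loading %s", name)
            self._models[name] = loader()
        # Stage 2: run one dummy inference per model so first real request is not cold.
        for name, model in self._models.items():
            log.info("warming up %s", name)
            self._warmups[name](model)

    def get(self, name: str) -> Any:
        # Request handlers only read from the cache; raises KeyError if preload() was skipped.
        return self._models[name]


class DummyPipeline:
    """Stand-in for a real pipeline; counts how often it is called."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls = 0

    def __call__(self, audio: list) -> str:
        self.calls += 1
        return f"{self.name}:{len(audio)}"


registry = ModelRegistry()
for pipeline_name in ("asr", "lid", "diarization", "punctuation", "alignment"):
    registry.register(
        pipeline_name,
        loader=lambda n=pipeline_name: DummyPipeline(n),
        # Assumed dummy input: one second of silence at 16 kHz.
        warmup=lambda model: model([0.0] * 16000),
    )
registry.preload()
```

After `preload()` returns, request handlers call `registry.get("asr")` and always receive the same already-warmed instance, so no model is constructed on the request path.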

Motivation

Previously:

First request experienced high latency (cold start)
Models loaded during request → poor user experience
Risk of race conditions under concurrent requests

Now:

Models are ready before serving traffic
Eliminates cold-start delays
Improves consistency and reliability

Impact

Improvements

Faster response time for first request
Consistent latency across all requests
Early detection of model loading failures
Better production readiness
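
To illustrate the "early detection of model loading failures" point: with eager loading, a broken model surfaces at boot rather than on the first request that needs it. The sketch below is an assumption about how this could look, not the MR's implementation; `ModelLoadError`, `preload_all`, and `broken_loader` are hypothetical names.

```python
from typing import Any, Callable, Dict


class ModelLoadError(RuntimeError):
    """Raised at startup when any model fails to load (hypothetical error type)."""


def preload_all(loaders: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    models: Dict[str, Any] = {}
    for name, loader in loaders.items():
        try:
            models[name] = loader()
        except Exception as exc:
            # Name the failing model so the startup log points at the culprit.
            raise ModelLoadError(f"failed to load model '{name}': {exc}") from exc
    return models


def broken_loader() -> Any:
    # Simulates a missing weights file.
    raise FileNotFoundError("weights.bin not found")


try:
    preload_all({"lid": lambda: object(), "punctuation": broken_loader})
    startup_error = None
except ModelLoadError as err:
    # A real service would log this and exit non-zero instead of serving traffic.
    startup_error = str(err)
```

With lazy loading, the same failure would only appear when a request first touched the punctuation model.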

Trade-offs

Increased application startup time (~100s observed)
Higher initial CPU/GPU usage during boot
Slightly heavier memory footprint at idle

Performance Observations

Model loading time: ~100 seconds
Warm-up time: ~14 seconds
Runtime latency improved by eliminating cold-start delays
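
Figures like the ones above can be collected by timing each startup stage separately. The following is a minimal sketch under assumptions: `timed_stage` and `stage_seconds` are hypothetical names, and the `time.sleep` calls stand in for the real loading and warm-up work.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

log = logging.getLogger("startup")
stage_seconds: Dict[str, float] = {}


@contextmanager
def timed_stage(name: str) -> Iterator[None]:
    """Log the start and duration of one startup stage and record the elapsed time."""
    start = time.perf_counter()
    log.info("%s started", name)
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        stage_seconds[name] = elapsed
        log.info("%s finished in %.2fs", name, elapsed)


with timed_stage("model_loading"):
    time.sleep(0.02)  # placeholder for loading all models

with timed_stage("warm_up"):
    time.sleep(0.01)  # placeholder for the dummy inference pass
```

Recording the two stages separately keeps the loading cost and the warm-up cost distinguishable in the logs.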

Closes #22 (closed)

Edited May 08, 2026 by ashritha kunjeti
