Improve Model Loading UX and Add Precision-Aware Punctuation Model Loading with Documentation Updates
Summary
This MR improves the application’s model loading experience and enhances the punctuation pipeline by introducing precision-aware model loading.
For model loading, the application now displays the loading screen only during first-time model initialization. Once the ASR and punctuation models are downloaded and cached, subsequent visits skip the loading screen and render the main UI directly, which improves perceived performance.
For punctuation, the system now supports dynamic precision modes (FP16/INT8) and switches to a new model repository. It also optimizes model loading by preventing redundant reloads and safely handling concurrent requests.
Changes Made
Model Load Persistence
- Added persistent flags (modelLoaded, punctModelLoaded) using localStorage
- Flags are set after successful model initialization
- Ensures model state is retained across refresh and revisit
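A minimal sketch of how such flags could be written and read. The key names come from the list above; storing the string "true", the FlagStorage interface, and the helper names are assumptions made for illustration, not the MR's actual code.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface FlagStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Key names from the MR description; the stored value "true" is an assumption.
const ASR_FLAG = "modelLoaded";
const PUNCT_FLAG = "punctModelLoaded";

// Call after a model has initialized successfully.
function markModelLoaded(storage: FlagStorage, key: string): void {
  try {
    storage.setItem(key, "true");
  } catch {
    // Storage can be full or disabled; the flag is only an optimization,
    // so a failed write is ignored.
  }
}

// True only when both flags have been persisted.
function areModelsLoaded(storage: FlagStorage): boolean {
  return (
    storage.getItem(ASR_FLAG) === "true" &&
    storage.getItem(PUNCT_FLAG) === "true"
  );
}

// In-memory stand-in for window.localStorage (hypothetical helper for tests).
function createMemoryStorage(): FlagStorage {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

In the browser, window.localStorage satisfies FlagStorage, so markModelLoaded(window.localStorage, PUNCT_FLAG) could run once the punctuation model finishes initializing.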
Conditional UI Rendering
- Added startup check for model load status
- Updated rendering flow:
  - Show loading screen only on first load
  - Skip loading screen if models are already cached
- Improves perceived performance for returning users
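The startup check can be pictured as a small pure function. This is a sketch under assumptions: the StartupScreen type, the readFlag accessor, and the stored value "true" are hypothetical, and the real component may structure this differently.

```typescript
type StartupScreen = "loading" | "main";

// Decide the first screen from the persisted flags. `readFlag` is a
// hypothetical accessor; in the browser it could be
// (key) => localStorage.getItem(key).
function initialScreen(
  readFlag: (key: string) => string | null,
): StartupScreen {
  const asrReady = readFlag("modelLoaded") === "true";
  const punctReady = readFlag("punctModelLoaded") === "true";
  // Skip the loading screen only when both models were initialized before.
  return asrReady && punctReady ? "main" : "loading";
}
```

The flags only record that a load succeeded earlier. If the browser cache is cleared while the flags survive, the models would have to download again; how the MR handles that case is not stated here.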
Background Initialization
- Enabled non-blocking/background model initialization
- Prevented unnecessary re-triggering of loading logic
- Ensured smooth UI experience without delays
Precision-Aware Punctuation Loading
- Switched model source to therajasekhar/punctuate-indic-v1-ONNX
- Added support for precision-based loading:
  - FP16 → when highPrecision = true and WebGPU is available
  - INT8 → default (faster and lightweight)
- Enables flexible trade-off between performance and accuracy
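The selection rule above amounts to a two-input decision. The sketch below is illustrative only: the labels "fp16" and "int8" are placeholders rather than the repository's actual file or dtype names, and detectWebGPU is a hypothetical helper.

```typescript
type PunctPrecision = "fp16" | "int8";

// FP16 only when the caller asked for high precision AND WebGPU is usable;
// every other combination falls back to the INT8 default.
function selectPrecision(
  highPrecision: boolean,
  webgpuAvailable: boolean,
): PunctPrecision {
  return highPrecision && webgpuAvailable ? "fp16" : "int8";
}

// Feature detection sketch: navigator.gpu is exposed only where WebGPU is
// supported. Its presence does not guarantee that an adapter can be acquired.
function detectWebGPU(): boolean {
  const nav = (globalThis as unknown as { navigator?: object }).navigator;
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}
```

A caller could then write selectPrecision(highPrecision, detectWebGPU()) before asking the worker to load the model.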
Worker & Service Enhancements
src/ai/punctuateWorker.ts
- Added highPrecision support in worker messages
- loadPunctuationModel() now:
  - Accepts options
  - Dynamically selects model format (FP16/INT8)
- Tracks loaded model via loadedModelFile
- Clears and reloads pipeline only when precision changes
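A rough sketch of the "reload only when precision changes" behavior. The types, the factory signature, and createModelHolder are all hypothetical; the real worker presumably wraps its ONNX pipeline loader and tracks loadedModelFile rather than a format label.

```typescript
type WorkerFormat = "fp16" | "int8";

interface LoadedPipeline {
  format: WorkerFormat;
  dispose(): Promise<void>;
}

// Hypothetical loader signature standing in for the real pipeline factory.
type PipelineFactory = (format: WorkerFormat) => Promise<LoadedPipeline>;

function createModelHolder(factory: PipelineFactory) {
  let current: LoadedPipeline | null = null;

  return async function ensureLoaded(
    format: WorkerFormat,
  ): Promise<LoadedPipeline> {
    // Same precision already loaded: reuse the existing pipeline.
    if (current !== null && current.format === format) {
      return current;
    }
    // Precision changed: release the old pipeline before loading the new one.
    if (current !== null) {
      await current.dispose();
      current = null;
    }
    current = await factory(format);
    return current;
  };
}
```

This sketch does not guard against overlapping calls; in the MR that concern is handled on the service side.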
src/services/punctuateWorkerService.ts
- Added state tracking:
  - loadedHighPrecision
  - pendingHighPrecision
- Optimized loading:
  - Skips reload if same precision is already loaded
  - Reuses in-flight load requests
- Ensures punctuate(text, highPrecision) always uses correct model
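One way the service-side tracking could work is sketched below. The field names loadedHighPrecision and pendingHighPrecision come from the list above; the class name, the pendingLoad field, and the LoadFn callback are assumptions, and the actual service may differ.

```typescript
// Hypothetical callback that asks the worker to load a given precision.
type LoadFn = (highPrecision: boolean) => Promise<void>;

class PunctuationLoadTracker {
  private loadedHighPrecision: boolean | null = null;
  private pendingHighPrecision: boolean | null = null;
  private pendingLoad: Promise<void> | null = null;
  private requestLoad: LoadFn;

  constructor(requestLoad: LoadFn) {
    this.requestLoad = requestLoad;
  }

  ensureLoaded(highPrecision: boolean): Promise<void> {
    // A load for this precision is already in flight: share its promise.
    if (this.pendingLoad !== null && this.pendingHighPrecision === highPrecision) {
      return this.pendingLoad;
    }
    // Nothing in flight and the right model is loaded: nothing to do.
    if (this.pendingLoad === null && this.loadedHighPrecision === highPrecision) {
      return Promise.resolve();
    }
    // Otherwise queue a new load behind any in-flight load for the other precision.
    const previous: Promise<void> = this.pendingLoad ?? Promise.resolve();
    const clearIfCurrent = (): void => {
      // Reset pending state only if no newer load has replaced this one.
      if (this.pendingLoad === load) {
        this.pendingLoad = null;
        this.pendingHighPrecision = null;
      }
    };
    const load: Promise<void> = previous
      .catch(() => undefined) // a failed earlier load must not block this one
      .then(() => this.requestLoad(highPrecision))
      .then(
        () => {
          this.loadedHighPrecision = highPrecision;
          clearIfCurrent();
        },
        (error: unknown) => {
          clearIfCurrent();
          throw error;
        },
      );
    this.pendingLoad = load;
    this.pendingHighPrecision = highPrecision;
    return load;
  }
}
```

A punctuate(text, highPrecision) implementation could await ensureLoaded(highPrecision) before posting the text to the worker, so each request runs against the precision it asked for.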
Code Improvements
- Simplified model loading flow
- Reduced redundant UI states and reloads
- Improved separation of concerns (UI vs model logic)
- Added safeguards for concurrent requests
Documentation Update
- Updated README to:
  - Reflect new model source
  - Document precision-based loading behavior
  - Remove outdated or irrelevant content
- Ensures clarity and alignment with implementation
Net Effect
- Faster app startup after first load (no repeated loading screen)
- Improved performance with INT8 as default
- Higher accuracy option via FP16 when enabled
- No redundant model reloads
- Stable handling of concurrent requests
- Cleaner and more maintainable architecture
Screenshots
Testing Done
- Verified first-time load shows loading screen
- Verified subsequent refresh skips loading screen
- Confirmed models are not re-downloaded (Network tab)
- Tested precision switching (FP16 ↔ INT8)
- Verified no duplicate loads during concurrent requests
- Tested after clearing browser cache
- Ensured no UI flicker or regressions
Checklist
- Model load persistence implemented
- Loading screen shown only on first load
- Main UI rendered directly on subsequent visits
- Precision-based model loading implemented
- No redundant model reloads
- Concurrent request handling optimized
- README updated
- Tested locally
- No known regressions
- Ready for review
Closes #16 (closed), #17 (closed)

