Precision-Aware Model Loading & README Update for Punctuation Pipeline
Description
The current punctuation pipeline lacks support for configurable model precision and does not efficiently manage model loading across different configurations. This leads to unnecessary reloads, suboptimal performance, and limited flexibility in balancing inference speed and accuracy.
Additionally, the README does not reflect recent changes to the model source and pipeline behavior, which can leave developers working from outdated instructions.
Proposed Solution
- Introduce precision-aware model loading:
  - FP16 for high-precision inference on supported hardware (e.g., WebGPU)
  - INT8 as the default for faster and lightweight execution
- Update model source to therajasekhar/punctuate-indic-v1-ONNX
- Implement state management to:
  - Prevent redundant model reloads
  - Reuse in-progress loading requests
- Reload models only when precision configuration changes
- Ensure all punctuation requests use the correct model configuration
- Update README to reflect:
  - New model source
  - Precision handling logic
  - Cleaned and relevant documentation
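The loading behavior described above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `PunctuationModelManager`, `resolvePrecision`, `LoadOptions`, and the injected `Loader` are all hypothetical names, and falling back to INT8 when FP16 is unsupported is an assumption the issue does not spell out. The manager stores the load promise rather than the model, so a request that arrives while a load is in progress receives the same promise instead of starting a second load.

```typescript
// Hypothetical names throughout; the issue does not specify an API.
type Precision = "int8" | "fp16";

interface LoadOptions {
  modelId: string;
  dtype: Precision;
}

// Assumed loader shape: the real one would wrap whatever ONNX runtime the pipeline uses.
type Loader<M> = (options: LoadOptions) => Promise<M>;

const MODEL_ID = "therajasekhar/punctuate-indic-v1-ONNX";

// INT8 is the default. FP16 is chosen only when requested AND supported
// (the fallback to INT8 on unsupported hardware is an assumption).
function resolvePrecision(highPrecision: boolean, fp16Supported: boolean): Precision {
  return highPrecision && fp16Supported ? "fp16" : "int8";
}

class PunctuationModelManager<M> {
  private load: Loader<M>;
  private precision: Precision | null = null;
  private pending: Promise<M> | null = null;

  constructor(load: Loader<M>) {
    this.load = load;
  }

  // Deliberately not `async`, so concurrent callers get the very same promise object.
  ensureModel(precision: Precision = "int8"): Promise<M> {
    if (this.pending !== null && this.precision === precision) {
      return this.pending; // already loaded or still loading: reuse either way
    }
    // First request, or the precision changed: start a fresh load.
    this.precision = precision;
    const attempt = this.load({ modelId: MODEL_ID, dtype: precision });
    this.pending = attempt;
    // Forget a failed attempt so a later request can retry.
    attempt.catch(() => {
      if (this.pending === attempt) {
        this.pending = null;
        this.precision = null;
      }
    });
    return attempt;
  }
}
```

Each punctuation request would then call `ensureModel` with its resolved precision before running inference, so it always awaits a model that matches its own configuration.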
Expected Outcome
- Improved performance and reduced resource usage
- Enhanced accuracy when high precision is enabled
- Stable handling of concurrent requests
- Elimination of unnecessary model reloads
- Clear and up-to-date documentation
Acceptance Criteria
- Models load based on requested precision mode
- No duplicate loads for the same configuration
- Model reload occurs only on precision change
- Concurrent requests are handled safely without race conditions
- README is updated and accurately reflects the implementation
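The race-condition criterion is the subtle one. If the implementation stores the loaded model itself and the precision changes while a load is in flight, the older load may finish last and overwrite the newer model. One possible guard, sketched here under assumptions, is a generation counter: each load takes a token when it starts, and its result is accepted only if no newer load has started since. `ModelSlot` and `loadInto` are hypothetical names, not part of the existing pipeline.

```typescript
type Precision = "int8" | "fp16";

class ModelSlot<M> {
  private generation = 0;
  private current: { precision: Precision; model: M } | null = null;

  // Call when a load starts; the returned token identifies that load.
  begin(): number {
    this.generation += 1;
    return this.generation;
  }

  // Call when a load finishes. Returns false and stores nothing
  // if a newer load has started in the meantime.
  commit(token: number, precision: Precision, model: M): boolean {
    if (token !== this.generation) {
      return false; // stale result: discard it
    }
    this.current = { precision, model };
    return true;
  }

  // Returns the stored model only if it matches the requested precision.
  get(precision: Precision): M | null {
    if (this.current !== null && this.current.precision === precision) {
      return this.current.model;
    }
    return null;
  }
}

// How a load might use the slot; `load` stands in for the real model loader.
async function loadInto<M>(
  slot: ModelSlot<M>,
  precision: Precision,
  load: (p: Precision) => Promise<M>,
): Promise<boolean> {
  const token = slot.begin();
  const model = await load(precision);
  return slot.commit(token, precision, model);
}
```

With this guard the last requested precision always wins, regardless of the order in which loads complete.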