Precision-Aware Model Loading & README Update for Punctuation Pipeline

Description

The current punctuation pipeline lacks support for configurable model precision and does not efficiently manage model loading across different configurations. This leads to unnecessary reloads, suboptimal performance, and limited flexibility in balancing inference speed and accuracy.

Additionally, the README does not reflect recent changes to the model source and pipeline behavior, so developers who follow it may end up with a setup that does not match the implementation.

Proposed Solution

Introduce precision-aware model loading:
- FP16 for high-precision inference on supported hardware (e.g., WebGPU)
- INT8 as the default, for faster and more lightweight execution
Update model source to therajasekhar/punctuate-indic-v1-ONNX
Implement state management to:
- Prevent redundant model reloads
- Reuse in-progress loading requests
Reload models only when precision configuration changes
Ensure all punctuation requests use the correct model configuration
Update README to reflect:
- New model source
- Precision handling logic
- Removal of outdated or irrelevant content
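
The loading behavior proposed above could be sketched roughly as follows. This is a minimal TypeScript illustration, not the project's actual code: `Precision`, `ModelLoader`, `resolvePrecision`, and `PunctuationModelManager` are hypothetical names, and the loader is injected so the sketch does not assume any particular inference runtime. Only the model id comes from this proposal.

```typescript
// Hypothetical names throughout; only MODEL_ID is taken from the proposal.
type Precision = "fp16" | "int8";

const MODEL_ID = "therajasekhar/punctuate-indic-v1-ONNX";

// Assumption: the real loader wraps whatever runtime the pipeline uses.
type ModelLoader<M> = (modelId: string, precision: Precision) => Promise<M>;

// FP16 only when it is requested and the hardware supports it; INT8 otherwise.
function resolvePrecision(highPrecision: boolean, hasWebGPU: boolean): Precision {
  return highPrecision && hasWebGPU ? "fp16" : "int8";
}

class PunctuationModelManager<M> {
  private readonly load: ModelLoader<M>;
  private precision: Precision | null = null;
  private pending: Promise<M> | null = null;

  constructor(load: ModelLoader<M>) {
    this.load = load;
  }

  getModel(precision: Precision): Promise<M> {
    // Same precision: reuse the loaded model or the in-progress load.
    if (this.pending !== null && this.precision === precision) {
      return this.pending;
    }

    // First call or precision change: start exactly one new load.
    const attempt = this.load(MODEL_ID, precision);
    this.precision = precision;
    this.pending = attempt;

    // On failure, clear the cached promise so a later call can retry,
    // but only if no newer load has replaced it in the meantime.
    attempt.catch(() => {
      if (this.pending === attempt) {
        this.pending = null;
        this.precision = null;
      }
    });

    return attempt;
  }
}
```

With this shape, two concurrent calls to `getModel("int8")` share a single load, and a later `getModel("fp16")` triggers exactly one reload. A request that started before the precision change still receives the model it asked for, because it holds the promise returned by its own call.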

Expected Outcome

Improved performance and reduced resource usage
Enhanced accuracy when high precision is enabled
Stable handling of concurrent requests
Elimination of unnecessary model reloads
Clear and up-to-date documentation

Acceptance Criteria

Models load in the requested precision mode (FP16 or INT8)
No duplicate loads for the same configuration
Model reload occurs only on precision change
Concurrent requests are handled safely without race conditions
README is updated and accurately reflects the implementation