feat: replace is_fully_proofread with proofread_status in extracted text
Title
feat(records): implement granular proofread status with weighted random prioritization
Description
This MR replaces the boolean is_fully_proofread flag on extracted text with a three-state enumeration, proofread_status. The system can now distinguish "Not Started" (no), "In Progress" (partial), and "Completed" (yes) records.
Key Changes
Data Model Evolution:
- Introduced ProofreadStatus enum: no (0% complete), partial (1-99% complete), and yes (100% complete).
- Replaced the is_fully_proofread boolean with the proofread_status field.
- Added a database migration with a backfill script that automatically calculates the status for existing records based on their segment-level proofread flags.
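As a rough illustration of the enum and the segment-based calculation described above, here is a minimal sketch. The enum values come from this MR; the helper name compute_proofread_status and the treatment of records with no segments are assumptions for illustration, not the actual migration code.

```python
from enum import Enum
from typing import Iterable


class ProofreadStatus(str, Enum):
    NO = "no"            # 0% of segments proofread
    PARTIAL = "partial"  # some, but not all, segments proofread
    YES = "yes"          # 100% of segments proofread


def compute_proofread_status(segment_flags: Iterable[bool]) -> ProofreadStatus:
    """Derive a record-level status from per-segment proofread flags.

    Hypothetical helper: the real backfill may query segments differently.
    """
    flags = list(segment_flags)
    # Assumption: a record without any segments counts as not proofread.
    if not flags or not any(flags):
        return ProofreadStatus.NO
    if all(flags):
        return ProofreadStatus.YES
    return ProofreadStatus.PARTIAL
```

A helper of this shape could plausibly serve both the backfill and the recalculation in update_extracted_text, which would keep the two code paths consistent.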
Dynamic Status Calculation:
- Updated the update_extracted_text endpoint to automatically recalculate and persist the proofread_status whenever a user submits corrections or marks segments as proofread.
Advanced Prioritization Logic (Weighted Random):
- Modified the /records/for-review logic to place partial and no records in the same priority tier (Priority Level 0), so both are equally eligible for review.
- Implemented weighted randomization within that tier, applying a 0.5 multiplier to partial records.
- Result: a balanced, mixed queue. partial records tend to appear sooner (approx. 2:1 ratio) but no longer block new no records, so a review sequence such as no, partial, no can occur.
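One way to picture the weighted ordering is the sketch below. It assumes each record gets a uniform random sort key that is scaled by a per-status weight and that the queue is sorted ascending. Only the 0.5 multiplier for partial is taken from this MR; the function name order_for_review, the WEIGHTS table, and the record shape are hypothetical, and the real query may compute this in SQL.

```python
import random
from typing import Optional

# Assumed weights: a smaller weight shrinks the random key, so the record
# tends to sort earlier. Only tier-0 statuses (no, partial) are handled here.
WEIGHTS = {"no": 1.0, "partial": 0.5}


def order_for_review(records: list[dict], rng: Optional[random.Random] = None) -> list[dict]:
    """Return tier-0 records in weighted random order (ascending key)."""
    rng = rng or random.Random()
    keyed = [
        (rng.random() * WEIGHTS[rec["proofread_status"]], index, rec)
        for index, rec in enumerate(records)
    ]
    # Sort on (key, index) so ties never fall back to comparing dicts.
    keyed.sort(key=lambda item: item[:2])
    return [rec for _, _, rec in keyed]


queue = order_for_review(
    [
        {"id": 1, "proofread_status": "no"},
        {"id": 2, "proofread_status": "partial"},
        {"id": 3, "proofread_status": "no"},
    ]
)
print([rec["proofread_status"] for rec in queue])
```

Under this assumed scheme the average sort key of a partial record is half that of a no record, and a given partial record sorts ahead of a given no record about 75% of the time. Neither status is ever excluded, which matches the non-blocking behavior described above.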
Filtering & API Response:
- Added proofread_status filtering to the record review endpoint.
- Included proofread_status in all extracted text API response schemas.
Testing & Quality:
- Added new unit tests to verify the SQL priority mapping and ensure partial/no are grouped correctly.
- Verified that the weighted random logic prevents queue stagnation.
- Confirmed that the code passes ruff linting and formatting checks.
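A priority-mapping test of the kind mentioned above might look like the following sketch. It uses an in-memory SQLite database and a hypothetical CASE expression; the table name extracted_text, the function names, and the placement of yes records in tier 1 are assumptions, since this MR only states that partial and no share Priority Level 0.

```python
import sqlite3

# Hypothetical mapping: 'no' and 'partial' share tier 0; anything else is tier 1.
PRIORITY_SQL = """
    CASE proofread_status
        WHEN 'no' THEN 0
        WHEN 'partial' THEN 0
        ELSE 1
    END
"""


def priority_by_status() -> dict[str, int]:
    """Evaluate the CASE expression for one row per status."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE extracted_text (id INTEGER PRIMARY KEY, proofread_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO extracted_text (proofread_status) VALUES (?)",
        [("no",), ("partial",), ("yes",)],
    )
    rows = conn.execute(
        f"SELECT proofread_status, {PRIORITY_SQL} AS priority FROM extracted_text"
    ).fetchall()
    conn.close()
    return dict(rows)


def test_partial_and_no_share_priority() -> None:
    priorities = priority_by_status()
    assert priorities["no"] == priorities["partial"] == 0
    assert priorities["yes"] > priorities["no"]
```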
Checklist
- Feature has been implemented.
- Tests are added/updated.
- Existing functionality impact has been checked.
Related Issue(s)
Closes #131
Edited by Banuri Koushik Reddy