feat: replace is_fully_proofread with proofread_status in extracted text

Banuri Koushik Reddy requested to merge feat/proofread-status-enum into develop

Title feat(records): implement granular proofread status with weighted random prioritization

Description

This MR replaces the boolean is_fully_proofread flag on extracted text with a three-state enumeration, proofread_status, so the system can distinguish between "Not Started," "In Progress," and "Completed" records.

Key Changes

  • Data Model Evolution:

    • Introduced ProofreadStatus enum: no (0% of segments proofread), partial (more than 0% but less than 100%), and yes (100%).
    • Replaced the is_fully_proofread boolean with the proofread_status field.
    • Added a database migration with a backfill script that automatically calculates the status for existing records based on their segment-level proofread flags.
  • Dynamic Status Calculation:

    • Updated the update_extracted_text endpoint to automatically recalculate and persist the proofread_status whenever a user submits corrections or marks segments as proofread.
  • Advanced Prioritization Logic (Weighted Random):

    • Modified the /records/for-review logic to treat partial and no records as equally eligible for review (Priority Level 0).
    • Implemented weighted randomization within that priority level (0.5 multiplier applied to partial records).
    • Result: a mixed queue. partial records tend to appear sooner (approx. 2:1 ratio), but they no longer block new no records from appearing, so a review sequence such as no, partial, no is possible.
  • Filtering & API Response:

    • Added proofread_status filtering to the record review endpoint.
    • Included proofread_status in all extracted text API response schemas.
  • Testing & Quality:

    • Added new unit tests to verify the SQL priority mapping and ensure partial/no are grouped correctly.
    • Verified that the weighted random logic prevents queue stagnation.
    • Full compliance with ruff linting and formatting standards.
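
For reviewers, the status derivation described under "Data Model Evolution" and "Dynamic Status Calculation" can be sketched as follows. This is a minimal illustration, not the code in this MR: the function name derive_proofread_status and the treatment of a record with no segments (mapped to no) are assumptions.

```python
from enum import Enum


class ProofreadStatus(str, Enum):
    NO = "no"
    PARTIAL = "partial"
    YES = "yes"


def derive_proofread_status(segment_flags):
    """Map per-segment proofread flags to a record-level status.

    Hypothetical helper: the real backfill and endpoint logic may differ.
    """
    flags = list(segment_flags)
    # Assumption: a record with no segments counts as not started.
    if not flags or not any(flags):
        return ProofreadStatus.NO
    if all(flags):
        return ProofreadStatus.YES
    return ProofreadStatus.PARTIAL
```

The same function could serve both the migration backfill and the update_extracted_text recalculation, which would keep the two code paths from drifting apart.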
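
The weighted random ordering can be pictured with the in-memory sketch below. It is an assumption about how the 0.5 multiplier is applied (scaling a uniform random sort key for partial records), not the SQL used by /records/for-review; the names PRIORITY, WEIGHT, and review_order are hypothetical, as is placing yes records at level 1.

```python
import random
from enum import Enum


class ProofreadStatus(str, Enum):
    NO = "no"
    PARTIAL = "partial"
    YES = "yes"


# "no" and "partial" share level 0; placing "yes" at level 1 is an assumption.
PRIORITY = {
    ProofreadStatus.NO: 0,
    ProofreadStatus.PARTIAL: 0,
    ProofreadStatus.YES: 1,
}

# Assumed use of the 0.5 multiplier: shrink the random key of partial records
# so they tend to sort earlier within the shared priority level.
WEIGHT = {
    ProofreadStatus.NO: 1.0,
    ProofreadStatus.PARTIAL: 0.5,
    ProofreadStatus.YES: 1.0,
}


def review_order(records, rng):
    """Sort (record_id, status) pairs by (priority, weighted random key), ascending."""
    keyed = [
        ((PRIORITY[status], rng.random() * WEIGHT[status]), record_id, status)
        for record_id, status in records
    ]
    keyed.sort(key=lambda item: item[0])
    return [(record_id, status) for _, record_id, status in keyed]


if __name__ == "__main__":
    rng = random.Random(42)
    queue = [
        (1, ProofreadStatus.NO),
        (2, ProofreadStatus.PARTIAL),
        (3, ProofreadStatus.NO),
        (4, ProofreadStatus.YES),
    ]
    print(review_order(queue, rng))
```

Because the key is random, no record type can starve the other. The exact ratio depends on how the multiplier enters the sort expression; in this particular sketch, a single partial record sorts ahead of a single no record with probability 0.75.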

Checklist

  • Feature has been implemented.
  • Tests are added/updated.
  • Existing functionality impact has been checked.

Related Issue(s)

Closes #131

Edited by Banuri Koushik Reddy
