feat(records): Implement is_extracted Boolean Filter for Targeted Record Review

Kushal Lagichetty requested to merge feat/records-is-extracted-filter into develop

Summary

This MR adds an is_extracted filter to the POST /api/v1/records/for-review endpoint. Reviewers can use it to select records by extraction status, separating records that already carry AI-generated output from those that still need manual transcription.

Closes #(issue number)


Context

Previously, reviewers had to sift through records without knowing whether they were validating AI-generated results (OCR/ASR) or starting a fresh manual transcription. With this filter they can choose up front between validation tasks and creation tasks.


Technical Implementation

Boolean Filter Logic

| Value | Strategy | Behavior |
| --- | --- | --- |
| is_extracted: true | INNER JOIN with extractedtext | Returns records processed by AI models (OCR, ASR), excluding those marked as manual |
| is_extracted: false | LEFT JOIN with extractedtext | Returns records with no extraction entry, plus manual entries where segments is not yet populated |
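
The two behaviors above can be read as a per-record predicate. The following is a minimal sketch of that reading, not the code in this MR: the ExtractedText class, its source field, and the choice to treat both None and an empty list as "not yet populated" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedText:
    # Hypothetical shape of an extractedtext row; field names are assumptions.
    source: str                      # e.g. "ocr", "asr", or "manual"
    segments: Optional[list] = None  # None or [] is treated as "not yet populated"


def matches_filter(extraction: Optional[ExtractedText], is_extracted: bool) -> bool:
    """Return True if a record with this extraction entry passes the filter."""
    if is_extracted:
        # true: an extraction entry must exist and must not be marked manual.
        return extraction is not None and extraction.source != "manual"
    # false: no extraction entry at all ...
    if extraction is None:
        return True
    # ... or a manual entry whose segments are still empty.
    return extraction.source == "manual" and not extraction.segments
```

Under this reading, a manual entry whose segments are already populated matches neither value of the filter.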

Security & Compliance

Raw SQL construction was refactored to use static literals for JOIN types, ensuring compliance with Bandit B608 (SQL injection prevention) by avoiding string interpolation for SQL keywords.
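
One way to keep SQL keywords out of string interpolation is to hold each query variant as a complete static literal and pick one by the boolean. The sketch below illustrates that idea against an in-memory SQLite database. The table layout, the source column, the helper names, and the LIMIT parameter are all assumptions, and the actual refactor in this MR may differ in structure.

```python
import sqlite3

# Each variant is a complete static literal: no f-strings, no "%" formatting,
# and no runtime concatenation of SQL keywords. Only values are bound with "?".
_QUERY_EXTRACTED = (
    "SELECT r.id FROM records AS r "
    "INNER JOIN extractedtext AS e ON e.record_id = r.id "
    "WHERE e.source <> 'manual' "
    "ORDER BY r.id LIMIT ?"
)
_QUERY_NOT_EXTRACTED = (
    "SELECT r.id FROM records AS r "
    "LEFT JOIN extractedtext AS e ON e.record_id = r.id "
    "WHERE e.record_id IS NULL "
    "OR (e.source = 'manual' AND e.segments IS NULL) "
    "ORDER BY r.id LIMIT ?"
)


def select_query(is_extracted: bool) -> str:
    """Choose between the two static literals; nothing is interpolated."""
    return _QUERY_EXTRACTED if is_extracted else _QUERY_NOT_EXTRACTED


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical schema with four sample records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE records (id INTEGER PRIMARY KEY);
        CREATE TABLE extractedtext (record_id INTEGER, source TEXT, segments TEXT);
        INSERT INTO records (id) VALUES (1), (2), (3), (4);
        INSERT INTO extractedtext VALUES (1, 'ocr', '["line one"]');
        INSERT INTO extractedtext VALUES (2, 'manual', NULL);
        INSERT INTO extractedtext VALUES (3, 'manual', '["typed text"]');
        """
    )
    return conn


def fetch_ids(conn: sqlite3.Connection, is_extracted: bool, limit: int = 10) -> list:
    """Run the selected query, binding the limit as a parameter."""
    rows = conn.execute(select_query(is_extracted), (limit,)).fetchall()
    return [row[0] for row in rows]
```

With the sample data, fetch_ids(build_demo_db(), True) selects record 1 only, while passing False selects records 2 and 4 (the manual entry without segments and the record with no extraction entry).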

Performance

The filter is integrated into the existing optimized raw SQL path, so random record selection keeps its low latency.


Documentation

Updated the endpoint's docstring with parameter definitions and concrete examples for both true and false states.


Checklist

  • Feature fully implemented.
  • Tests for the new feature included and passing.
  • User documentation/guides updated where applicable.
  • Impact on existing functionality considered.
