feat(records): Implement is_extracted Boolean Filter for Targeted Record Review
Summary
This MR introduces a highly-requested filtering capability to the POST /api/v1/records/for-review endpoint. It allows reviewers to specifically target records based on their extraction status, bridging the gap between automated AI processing and manual transcription workflows.
Closes #(issue number)
Context
Previously, reviewers had to sift through records without knowing whether they were validating AI-generated results (OCR/ASR) or starting a fresh manual transcription. This filter provides the necessary granularity to prioritize validation vs. creation tasks.
Technical Implementation
Boolean Filter Logic
| Value | Strategy | Behavior |
|---|---|---|
is_extracted: true |
INNER JOIN with extractedtext
|
Returns records processed by AI models (OCR, ASR), excluding those marked as manual
|
is_extracted: false |
LEFT JOIN with extractedtext
|
Returns records with no extraction entry, plus manual entries where segments is not yet populated |
Security & Compliance
Raw SQL construction was refactored to use static literals for JOIN types, ensuring compliance with Bandit B608 (SQL injection prevention) by avoiding string interpolation for SQL keywords.
Performance
Integrated into the existing optimized raw SQL path to maintain low latency during random record selection.
Documentation
Updated the endpoint's docstring with parameter definitions and concrete examples for both true and false states.
Checklist
-
Feature fully implemented. -
Tests for the new feature included and passing. -
User documentation/guides updated where applicable. -
Impact on existing functionality considered.