feat(records): add duplicate detection and marking

Summary

  • Adds is_duplicate (bool, indexed) and duplicate_of (UUID FK, self-referential) fields to Record model
  • Detects exact duplicates at upload time via SHA-256 file hash comparison; marks new record with is_duplicate=True and duplicate_of=<original_uid>
  • Adds POST /{record_id}/mark-duplicate and DELETE /{record_id}/mark-duplicate endpoints for admin/reviewer manual override
  • Adds is_duplicate filter to GET /records/
  • Adds Alembic migration b2c3d4e5f6a7
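The upload-time detection described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this MR: `RecordStore`, `upload`, and the `file_hash` attribute are hypothetical names, and the real change presumably looks up the hash in the database rather than in a dict.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional
from uuid import UUID, uuid4


@dataclass
class Record:
    # Stand-in for the Record model; only the fields relevant here.
    uid: UUID
    file_hash: str  # hypothetical attribute name for the stored digest
    is_duplicate: bool = False
    duplicate_of: Optional[UUID] = None


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


class RecordStore:
    """Hypothetical in-memory store standing in for the database."""

    def __init__(self) -> None:
        self._original_by_hash: Dict[str, UUID] = {}
        self.records: Dict[UUID, Record] = {}

    def upload(self, data: bytes) -> Record:
        digest = sha256_hex(data)
        # If the hash has been seen before, point at the first record that had it.
        original_uid = self._original_by_hash.get(digest)
        record = Record(
            uid=uuid4(),
            file_hash=digest,
            is_duplicate=original_uid is not None,
            duplicate_of=original_uid,
        )
        if original_uid is None:
            # First occurrence of this content: it becomes the original.
            self._original_by_hash[digest] = record.uid
        self.records[record.uid] = record
        return record
```

In this sketch a third upload of the same bytes also points at the first record, so duplicates never chain to other duplicates. Whether the MR behaves the same way is not stated in the description.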

Test plan

  • Migration b2c3d4e5f6a7 applies cleanly
  • Upload original file → "is_duplicate": false, "duplicate_of": null
  • Upload same file again → "is_duplicate": true, "duplicate_of": "<original_uid>"
  • GET /records/?is_duplicate=true → returns only duplicate records
  • GET /records/?is_duplicate=false → returns only non-duplicate records
  • POST /{uid}/mark-duplicate?duplicate_of={same_uid} → 400 "A record cannot be a duplicate of itself"
  • POST /{uid}/mark-duplicate?duplicate_of={other_uid} → "is_duplicate": true
  • DELETE /{uid}/mark-duplicate → "is_duplicate": false, "duplicate_of": null
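
The manual-override cases in the test plan could be modelled with logic along these lines. It is a framework-free sketch under assumptions: `RecordState`, `mark_duplicate`, `unmark_duplicate`, and `SelfDuplicateError` are hypothetical names, and the actual endpoints would translate the error into an HTTP 400 response.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


class SelfDuplicateError(ValueError):
    """Raised when a record is marked as a duplicate of itself (maps to HTTP 400)."""


@dataclass
class RecordState:
    # Hypothetical stand-in for the duplicate-related fields on Record.
    uid: UUID
    is_duplicate: bool = False
    duplicate_of: Optional[UUID] = None


def mark_duplicate(record: RecordState, duplicate_of: UUID) -> RecordState:
    """Logic behind POST /{record_id}/mark-duplicate (sketch)."""
    if record.uid == duplicate_of:
        raise SelfDuplicateError("A record cannot be a duplicate of itself")
    record.is_duplicate = True
    record.duplicate_of = duplicate_of
    return record


def unmark_duplicate(record: RecordState) -> RecordState:
    """Logic behind DELETE /{record_id}/mark-duplicate (sketch)."""
    record.is_duplicate = False
    record.duplicate_of = None
    return record
```

The sketch does not check that the target record exists or that the caller is an admin or reviewer. Both checks would belong in the endpoint layer.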

Checklist

  • Code follows project API guidelines
  • Documentation is updated (OpenAPI docs reflect new fields and endpoints)
  • Code adheres to project coding standards

Closes #50
