refactor(record): move extracted_text JSONB to dedicated table

Kushal Lagichetty requested to merge feat/extracted-table into develop

Title (semantic)

refactor(record): move extracted_text JSONB to dedicated table

Description

This merge request moves AI-extracted text (OCR/ASR/captioning) out of a free-form JSONB column on the records table and into a dedicated, structured extractedtext table.
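
For reviewers, here is a minimal sketch of what the target schema might look like. It is not the code from this branch: the `title`, `text` and `language` columns are hypothetical, and the generic `JSON` type stands in for PostgreSQL's JSONB so the example runs against in-memory SQLite.

```python
from sqlalchemy import JSON, Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Record(Base):
    __tablename__ = "records"

    id = Column(Integer, primary_key=True)
    title = Column(String)  # hypothetical column, for illustration only

    # uselist=False turns the relationship into a scalar (1-to-1) attribute.
    extracted_text = relationship(
        "ExtractedText",
        back_populates="record",
        uselist=False,
        cascade="all, delete-orphan",
    )


class ExtractedText(Base):
    __tablename__ = "extractedtext"

    id = Column(Integer, primary_key=True)
    # unique=True allows at most one extracted-text row per record.
    record_id = Column(Integer, ForeignKey("records.id"), unique=True, nullable=False)
    text = Column(Text)        # hypothetical typed column
    language = Column(String)  # hypothetical typed column
    # Declarative classes already carry a `metadata` attribute (the MetaData
    # registry), so the mapped attribute needs a different name.
    extraction_metadata = Column(JSON)

    record = relationship("Record", back_populates="extracted_text")


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    record = Record(title="demo")
    record.extracted_text = ExtractedText(
        text="hello", language="en", extraction_metadata={"engine": "ocr"}
    )
    session.add(record)
    session.commit()

    loaded = session.get(Record, record.id)
    loaded_text = loaded.extracted_text.text
    loaded_meta = loaded.extracted_text.extraction_metadata
```

The unique foreign key is one way to enforce the 1-to-1 constraint in the database rather than only in the ORM; the actual branch may do this differently.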

Key Changes:

  • Dedicated Table: Created an ExtractedText model with typed columns for transcription metadata.
  • Relationship Mapping: Replaced the Record.extracted_text JSONB column with a 1-to-1 relationship to the new table.
  • SQLAlchemy Conflict Resolution: Renamed the internal field to extraction_metadata, because SQLAlchemy's declarative base reserves the metadata attribute name on mapped classes.
  • Backward Compatibility: Implemented a Pydantic validator in RecordRead so that API responses still expose the key "metadata".
  • Data Migration: Included a lossless Alembic migration script to create the table and transfer all existing JSONB data.

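The backward-compatibility item could be implemented along these lines. This is a sketch assuming Pydantic v2 (`field_validator`, `from_attributes`); under Pydantic v1 the equivalent would be `@validator(..., pre=True)` with `orm_mode`. The nesting of `"metadata"` under `extracted_text` and the `text` field are assumptions, and `SimpleNamespace` objects stand in for ORM instances.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

from pydantic import BaseModel, ConfigDict, field_validator


class RecordRead(BaseModel):
    # from_attributes lets the model be built from ORM-style objects.
    model_config = ConfigDict(from_attributes=True)

    id: int
    extracted_text: Optional[Dict[str, Any]] = None

    @field_validator("extracted_text", mode="before")
    @classmethod
    def _serialize_extracted_text(cls, value: Any) -> Any:
        # Dicts and None pass through unchanged.
        if value is None or isinstance(value, dict):
            return value
        # Map the renamed ORM attribute back to the public "metadata" key.
        return {
            "text": value.text,
            "metadata": value.extraction_metadata,
        }


# Stand-in for a Record loaded through SQLAlchemy.
orm_record = SimpleNamespace(
    id=1,
    extracted_text=SimpleNamespace(
        text="hello", extraction_metadata={"engine": "ocr"}
    ),
)
payload = RecordRead.model_validate(orm_record).model_dump()
```

With this approach the rename stays internal to the ORM layer, and existing API clients keep reading `metadata`.
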
Checklist

  • Code has been refactored for clarity, maintainability, or performance.
  • No functional changes have been introduced.
  • All existing tests are passing.
  • Code adheres to project coding standards.

Related Issue(s)

Closes #110 (closed)

Edited by Kushal Lagichetty
