refactor(record): move extracted_text JSONB to dedicated table
Description
This merge request moves AI-extracted text (OCR, ASR, captioning) out of a free-form JSONB column on the records table and into a dedicated, structured extractedtext table.
Key Changes:
- Dedicated Table: Added an ExtractedText model with typed columns for transcription metadata.
- Relationship Mapping: Replaced the Record.extracted_text JSONB column with a 1-to-1 relationship to the new table.
- SQLAlchemy Conflict Resolution: Renamed the internal field to extraction_metadata, because metadata is a reserved attribute name on SQLAlchemy declarative models.
- Backward Compatibility: Added a Pydantic validator in RecordRead so that API responses still expose the key "metadata".
- Data Migration: Included a lossless Alembic migration script that creates the table and copies all existing JSONB data into it.
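The data-transfer step and the "metadata" key mapping described above can be sketched roughly as follows. This is a minimal illustration only, not the migration shipped in this merge request: it uses the standard-library sqlite3 module in place of Alembic and PostgreSQL, stores JSON as text, and assumes every legacy blob has exactly the keys "text", "language", and "metadata". The column names content, language, and record_id, and the helpers build_demo_db, migrate_extracted_text, and legacy_blob, are hypothetical.

```python
import json
import sqlite3


def build_demo_db():
    """Create an in-memory database with a legacy records table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, extracted_text TEXT)")
    conn.executemany(
        "INSERT INTO records (id, extracted_text) VALUES (?, ?)",
        [
            (1, json.dumps({"text": "hello world", "language": "en",
                            "metadata": {"engine": "ocr"}})),
            (2, None),  # a record with no extracted text
            (3, json.dumps({"text": "bonjour", "language": "fr", "metadata": {}})),
        ],
    )
    return conn


def migrate_extracted_text(conn):
    """Create the dedicated table and copy every non-null blob into it."""
    conn.execute(
        """
        CREATE TABLE extractedtext (
            id INTEGER PRIMARY KEY,
            record_id INTEGER NOT NULL UNIQUE REFERENCES records(id),
            content TEXT,
            language TEXT,
            extraction_metadata TEXT  -- renamed: 'metadata' is reserved on the ORM model
        )
        """
    )
    rows = conn.execute(
        "SELECT id, extracted_text FROM records WHERE extracted_text IS NOT NULL"
    ).fetchall()
    for record_id, blob in rows:
        data = json.loads(blob)
        conn.execute(
            "INSERT INTO extractedtext "
            "(record_id, content, language, extraction_metadata) VALUES (?, ?, ?, ?)",
            (record_id, data.get("text"), data.get("language"),
             json.dumps(data.get("metadata", {}))),
        )
    return len(rows)


def legacy_blob(conn, record_id):
    """Rebuild the old response shape, exposing extraction_metadata as 'metadata'."""
    row = conn.execute(
        "SELECT content, language, extraction_metadata "
        "FROM extractedtext WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        return None
    content, language, extraction_metadata = row
    return {"text": content, "language": language,
            "metadata": json.loads(extraction_metadata)}
```

Comparing legacy_blob output against the original JSON for each migrated record is one simple way to check that a migration of this shape loses no data. The UNIQUE constraint on record_id is what keeps the relationship 1-to-1 in this sketch.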
Checklist
- Code has been refactored for clarity, maintainability, or performance.
- No functional changes have been introduced.
- All existing tests are passing.
- Code adheres to project coding standards.
Related Issue(s)
Closes #110
Edited by Kushal Lagichetty