feat(records): extract and store file metadata on upload
Summary
- Adds
file_metadata(JSONB, nullable) field toRecordmodel - Extracts technical metadata at upload finalize time based on media type:
- Audio/Video: ffprobe — format, codec, bitrate, sample rate, duration, video dimensions
- Image: Pillow — format, mode, dimensions, EXIF data
- Document: pypdf — page count, author, title; python-docx — word count, paragraph count
- Uses MIME type detection (python-magic) instead of file extension check to handle extensionless temp file paths
- Exposes
file_metadatainRecordReadresponse schema - Adds Alembic migration
c3d4e5f6a7b8 - Adds Pillow, mutagen, pypdf, python-docx to dependencies
Test plan
-
Migration c3d4e5f6a7b8applies cleanly -
Upload audio file → file_metadatacontains format, codec, sample rate, bitrate, duration -
Upload image file → file_metadatacontains format, mode, width, height, EXIF data -
Upload PDF file → file_metadatacontains page count, author, creator, producer
Checklist
-
Code follows project API guidelines -
Documentation is updated (OpenAPI docs reflect new field) -
Code adheres to project coding standards
Closes #66