Skip to content

feat(records): extract and store file metadata on upload

Summary

  • Adds file_metadata (JSONB, nullable) field to Record model
  • Extracts technical metadata at upload finalize time based on media type:
    • Audio/Video: ffprobe — format, codec, bitrate, sample rate, duration, video dimensions
    • Image: Pillow — format, mode, dimensions, EXIF data
    • Document: pypdf — page count, author, title; python-docx — word count, paragraph count
  • Uses MIME type detection (python-magic) instead of file extension check to handle extensionless temp file paths
  • Exposes file_metadata in RecordRead response schema
  • Adds Alembic migration c3d4e5f6a7b8
  • Adds Pillow, mutagen, pypdf, python-docx to dependencies

Test plan

  • Migration c3d4e5f6a7b8 applies cleanly
  • Upload audio file → file_metadata contains format, codec, sample rate, bitrate, duration
  • Upload image file → file_metadata contains format, mode, width, height, EXIF data
  • Upload PDF file → file_metadata contains page count, author, creator, producer

Checklist

  • Code follows project API guidelines
  • Documentation is updated (OpenAPI docs reflect new field)
  • Code adheres to project coding standards

Closes #66

Merge request reports

Loading