fix: detect media type per file in bulk upload

Problem

Bulk upload prompted the user for a single media type once per batch and applied it to every file, regardless of the file's actual type. A PDF, an MP3, and a JPG uploaded together would all receive the same media type tag.

Solution

  • Added a detect_media_type() function that uses Python's built-in mimetypes module to auto-detect the media type from each file's extension
  • Maps MIME types to the five valid media types: text, audio, video, document, image
  • Falls back to the user-selected default when detection fails (unknown or missing extension)
  • Updated the UI prompt to clarify that it now sets a "Default Media Type" used only as a fallback
  • Added 20 tests covering various file extensions and edge cases
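
For reviewers who want the idea at a glance, the detection and fallback described above can be sketched roughly as follows. This is an illustrative sketch, not the code in upload.py: the DOCUMENT_MIME_TYPES set, the exact signature, and the rule that the MIME top-level type decides text/audio/video/image are all assumptions made for the example.

```python
import mimetypes

# Hypothetical set of MIME types treated as "document"; the real mapping
# in upload.py may be broader or organized differently.
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/msword",
    "application/rtf",
}

# Top-level MIME types that are assumed to map directly to a media type.
DIRECT_TOP_LEVEL_TYPES = {"text", "audio", "video", "image"}


def detect_media_type(filename: str, default: str) -> str:
    """Guess a media type from the file extension, else return the default."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown extension or no extension at all: use the fallback.
        return default
    if mime_type in DOCUMENT_MIME_TYPES:
        return "document"
    top_level = mime_type.split("/", 1)[0]
    if top_level in DIRECT_TOP_LEVEL_TYPES:
        return top_level
    # A MIME type was found but it maps to none of the five media types.
    return default


if __name__ == "__main__":
    for name in ["report.pdf", "song.mp3", "photo.jpg", "README"]:
        print(name, "->", detect_media_type(name, default="text"))
```

Note that mimetypes works from the file name alone and may also consult system MIME tables, so results for less common extensions can vary between platforms.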

Files Changed

  • src/corpus_client_cli/upload.py - Core detection logic and integration
  • tests/test_upload.py - New test file with 20 test cases

Testing

All pre-commit hooks pass: ruff, ruff format, bandit, vulture, mypy, pytest

Closes #2

Edited by Ahlad Pataparla
