fix: detect media type per file in bulk upload
Problem
Bulk upload prompted the user for a single media type once per batch and applied it to all files, regardless of their actual type. A PDF, MP3, and JPG uploaded together would all receive the same media type tag.
Solution
- Added `detect_media_type()` function that uses Python's built-in `mimetypes` module to auto-detect media type from each file's extension
- Maps MIME types to the 5 valid media types: `text`, `audio`, `video`, `document`, `image`
- Falls back to user-selected default when detection fails (unknown extension, no extension)
- Updated the UI prompt to clarify it's now a "Default Media Type" used only as a fallback
- Added 20 tests covering various file extensions and edge cases
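
For reviewers, the detection flow described above can be sketched roughly as follows. This is an illustration, not the code in `upload.py`: the function signature, the `DOCUMENT_MIME_TYPES` set, and the rule of sending any unmapped MIME type to the default are all assumptions made for the example.

```python
import mimetypes

# Hypothetical set of application/* MIME types treated as "document".
# The mapping actually used in upload.py may differ.
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/msword",
}

# Top-level MIME categories that share a name with a valid media type.
DIRECT_CATEGORIES = {"text", "audio", "video", "image"}


def detect_media_type(filename: str, default: str) -> str:
    """Guess a media type from the file extension, else return `default`."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        # Unknown extension or no extension: use the user-selected default.
        return default
    if mime in DOCUMENT_MIME_TYPES:
        return "document"
    category = mime.split("/", 1)[0]
    if category in DIRECT_CATEGORIES:
        return category
    # A MIME type was found but is not mapped (assumed behaviour): fall back.
    return default


if __name__ == "__main__":
    for name in ("report.pdf", "song.mp3", "photo.jpg", "README"):
        print(name, "->", detect_media_type(name, default="text"))
```

With this shape, a mixed batch gets one tag per file, and the prompt value is used only when `mimetypes.guess_type` returns no type or an unmapped one.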
Files Changed
- `src/corpus_client_cli/upload.py` - Core detection logic and integration
- `tests/test_upload.py` - New test file with 20 test cases
Testing
All pre-commit hooks pass: ruff, ruff format, bandit, vulture, mypy, pytest
Closes #2
Edited by Ahlad Pataparla