feat: Pattern-Based File Upload Filtering in Corpus CLI
Problem Statement
Users uploading files through the Corpus CLI were unable to filter or select specific file types when uploading from a directory. The system either:
- Uploaded all files without discrimination, or
- Required users to upload files one at a time
This was inefficient whenever a directory contained mixed file types, for example when only the .mp3 files were wanted from a folder of audio, video, and images.
Solution
Implemented glob pattern support in the upload-files command, enabling users to filter files using standard pattern matching syntax.
Features Added
- Pattern-Based Filtering
  Supports patterns like:
  - *.mp3 → Upload only MP3 files
  - *.mp4 → Upload only MP4 files
  - *.jpg,*.jpeg → Upload JPEG images
  - *.csv → Upload CSV files
  - **/*.wav → Recursively upload WAV files
  - subdir/*.mp3 → Upload from specific folder
- Automatic Pattern Normalization
  Converts user-friendly inputs:
  - .mp3 → *.mp3
- Universal File Support
  Works with:
  - Audio, Video, Images
  - Documents (PDF, CSV, TXT, DOCX, etc.)
  - Any custom file extensions
- Recursive Directory Support
  - Supports deep directory traversal using **/
- Backward Compatibility
  Default behavior unchanged:
  - Empty or * → Upload all files
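The normalization and default behavior listed above could be sketched roughly as follows. This is a minimal illustration, not the code in upload.py: the helper name `normalize_pattern` and the exact rule for recognizing a bare extension are assumptions.

```python
from typing import Optional


def normalize_pattern(raw: Optional[str]) -> str:
    """Turn a user-supplied filter into a glob pattern (hypothetical helper)."""
    pattern = (raw or "").strip()
    if not pattern:
        # Empty input keeps the backward-compatible "upload everything" default.
        return "*"
    # Assumption: a bare extension starts with "." and has no glob or path characters.
    if pattern.startswith(".") and not any(ch in pattern for ch in "*?[/"):
        return f"*{pattern}"
    # Anything else is treated as an explicit glob and passed through untouched.
    return pattern


print(normalize_pattern(".mp3"))      # bare extension
print(normalize_pattern(""))          # empty input
print(normalize_pattern("**/*.wav"))  # explicit glob
```

Passing explicit globs through unchanged keeps patterns such as subdir/*.mp3 working exactly as the user typed them.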
🛠 Implementation Details
Modified Files:
- upload.py → Added glob logic in run_record_upload()
- cli.py → Added --pattern parameter
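For illustration, an optional --pattern parameter with a backward-compatible default might be declared as below. The PR does not say which CLI framework cli.py uses, so the standard-library argparse stands in here, and `build_parser` and the help text are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser for upload-files (hypothetical; real CLI framework unknown)."""
    parser = argparse.ArgumentParser(prog="upload-files")
    parser.add_argument(
        "--pattern",
        default="*",  # omitting the flag keeps the old "upload all files" behavior
        help='glob pattern selecting files to upload, e.g. "*.mp3" or "**/*.wav"',
    )
    return parser


args = build_parser().parse_args(["--pattern", "*.mp3"])
print(args.pattern)
```

Quoting the pattern on the command line, as in the usage examples, stops the shell from expanding it before the CLI sees it.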
Key Enhancements:
- Pattern normalization utility
- File matching using pathlib.Path.glob()
- CLI prompt with usage examples
- Console feedback: Found X files matching pattern 'Y'
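A rough sketch of the matching step and the console feedback, assuming a hypothetical helper named `find_matching_files`. The actual logic lives in run_record_upload() and may differ, for instance in ordering or in how directories are handled.

```python
from pathlib import Path
from typing import List


def find_matching_files(directory: Path, pattern: str = "*") -> List[Path]:
    """Return the regular files under `directory` that match `pattern` (hypothetical helper)."""
    # Path.glob understands "*", "?", "[...]" and the recursive "**/" prefix.
    # Directories that happen to match are skipped so only files are uploaded.
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    print(f"Found {len(matches)} files matching pattern '{pattern}'")
    return matches


if __name__ == "__main__":
    # Example: list MP3 files anywhere below the current directory.
    for path in find_matching_files(Path("."), "**/*.mp3"):
        print(path)
```

Filtering with is_file() matters for the default pattern *, which would otherwise also return subdirectories.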
📖 Usage Examples
- Upload MP3 files:
  upload-files --pattern "*.mp3"
- Upload JPEG images:
  upload-files --pattern "*.jpg"
- Recursive upload:
  upload-files --pattern "**/*.mp4"
- Upload all files:
  upload-files
🧪 Testing
- ✅ 7+ test cases covering:
  - Extension filters (.mp3)
  - Wildcards (*.mp3)
  - Recursive patterns (**/*.wav)
  - Subdirectory patterns (subdir/*.mp3)
  - Default behavior
- ✅ Real-world validation:
  - Multiple file types
  - Files with spaces
  - Concurrent uploads
Benefits
- Efficient batch uploads
- Improved user experience
- Flexible and powerful filtering
- Works across all file types
- Fully backward compatible

