feat: add --dry-run mode for upload workflows
📋 Summary
This MR introduces a --dry-run mode across core upload workflows in the Corpus CLI, enabling users to simulate operations without performing any API mutations.
The feature ensures that users can safely validate file inputs, metadata, and processing logic before executing real uploads—particularly useful for large-scale bulk operations.
❗ Problem
Currently, all upload commands directly perform API operations.
Observed Issues
-
No way to validate uploads before execution
-
Users risk:
- Partial uploads
- Failed long-running jobs
- Incorrect metadata submissions
-
Requires manual recovery using
resumeafter failure -
No safe “preview” mode for bulk operations
This is especially problematic for:
- Large datasets
- Unstable network conditions
- First-time users configuring uploads
🎯 Scope of This MR
This MR introduces dry-run support for existing workflows:
upload_filesextractresume
It focuses only on simulation and validation, without modifying core upload logic.
🚀 Changes Made
1. Added Dry Run Flag
CLI Updates
-
Introduced
--dry-runflag across:- Upload command
- Extract command
- Resume command
Behavior
When enabled:
-
Executes:
- File discovery
- Metadata validation
- Processing pipeline logic
-
Skips:
- API requests (
POST,PUT) - Data mutations
- State updates
- API requests (
2. Upload Flow Updates
Updated:
src/corpus_client_cli/upload.py
Changes
-
Wrapped API interaction logic with conditional checks:
if not dry_run: # perform API call else: print("[DRY RUN] Skipping upload") -
Ensures:
- Full validation still runs
- No external side effects occur
Per-file Behavior
-
Each file is:
- Parsed
- Validated
- Simulated
-
Output clearly indicates skipped uploads:
[DRY RUN] Skipping upload for file1.mp3
3. Extracted Text Upload Updates
Updated:
src/corpus_client_cli/extracted_text_upload.py
Changes
-
Applied dry-run logic to extracted-text workflow:
- CSV parsing
- JSON matching
- Validation
-
API calls are skipped during simulation
Output Example
[DRY RUN] Skipping upload for transcription1
4. Resume Flow Support
-
Extended dry-run behavior to resume command
-
Ensures:
- Previously incomplete uploads can be safely simulated
- No state is modified
5. Logging & UX Improvements
- Added clear console indicators:
📝 Upload (DRY RUN)
- Consistent log format across all workflows:
[DRY RUN] Skipping upload for <filename>
-
Provides user clarity that:
- Operation is simulated
- No real upload is happening
6. State Safety
-
Ensured dry-run does not update local state files:
- No marking files as completed
- No mutation of progress tracking
This allows:
- Immediate transition to real execution after dry-run
- No unintended side effects
🧪 Testing
Manual Verification
Bulk Upload
corpus-client upload "path/to/files" --dry-run
- Files scanned
- Validation runs
- No uploads performed
Extracted Text Upload
corpus-client extract "mapping.csv" "json_dir" --dry-run
- Mapping processed
- JSON matched
- No API calls
Example Output
╭──────────────────────────────╮
│ 📁 Record Upload (DRY RUN) │
╰──────────────────────────────╯
🔍 Scanning files...
📤 Starting upload...
[DRY RUN] Skipping upload for file1.mp3
[DRY RUN] Skipping upload for file2.mp3
✅ Dry Run Complete!
🎯 User-Facing Impact
After this MR:
Users can:
- Validate uploads before execution
- Detect issues early (missing files, bad mappings)
- Avoid failed long-running operations
- Use CLI in a safe, preview mode
🧠 Why This Approach
This solution:
- Reuses existing workflow logic
- Adds minimal conditional branching
- Avoids duplication of upload logic
- Keeps implementation simple and maintainable
Dry-run is implemented as a non-invasive layer, ensuring:
- No regression in current behavior
- Easy extension for future features
⚠ ️ Out of Scope
This MR does NOT:
- Add retry logic
- Modify backend APIs
- Change upload algorithms
- Introduce new endpoints
- Alter data processing logic
📁 Files Changed
src/corpus_client_cli/upload.pysrc/corpus_client_cli/extracted_text_upload.pysrc/corpus_client_cli/cli.py
🔍 Suggested Reviewer Focus
- Correct placement of dry-run condition checks
- Ensuring no API calls are triggered accidentally
- Log clarity and consistency
- State file integrity during simulation
⚠ ️ Risk Assessment
Risk: Low
Reasons:
- No changes to core logic
- Only conditional skipping of API calls
- No backend impact
- Safe fallback behavior
✅ Expected Outcome
Users can run upload commands in dry-run mode and:
- See exactly what would happen
- Validate inputs and workflows
- Proceed confidently with real uploads
🏁 Final Note
This aligns the CLI with standard tooling practices by introducing a safe simulation mode, improving reliability and user experience for bulk data operations.