feat: implement --dry-run mode and fix filename-to-stem mismatch in extract command
## 📋 Overview
This MR adds a `--dry-run` mode to the primary upload commands (`upload-files`, `extract`, and `resume`). It also fixes a critical bug in the extracted text upload workflow that prevented CSV rows from being matched to their JSON files.

Dry run is opt-in: without the flag, existing behavior is unchanged. The changes make bulk operations safer to test and extracted text uploads more reliable.
## 🚀 Key Changes

### 🔹 CLI Enhancements
- Added a `--dry-run` flag to `upload-files`, `extract`, and `resume`.
- The CLI clearly indicates simulation mode, e.g. `📝 Extracted Text Upload (DRY RUN)`.
- Lets users validate workflows before executing them.
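The MR does not say which CLI framework `corpus-client` uses. The following is therefore a minimal sketch built on the standard library's `argparse`, assuming one subcommand per upload command.

It registers the same opt-in `--dry-run` flag on `upload-files`, `extract`, and `resume`. It also adds a hypothetical `panel_title` helper that appends the `(DRY RUN)` marker shown above. The positional argument names are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI whose upload commands all accept --dry-run (hypothetical wiring)."""
    parser = argparse.ArgumentParser(prog="corpus-client")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # upload-files takes a data directory.
    upload = subcommands.add_parser("upload-files")
    upload.add_argument("data_dir")

    # extract takes a CSV mapping file and a directory of JSON files.
    extract = subcommands.add_parser("extract")
    extract.add_argument("mapping_csv")
    extract.add_argument("json_dir")

    # resume's arguments are not described in this MR, so none are added here.
    resume = subcommands.add_parser("resume")

    # The same opt-in flag is registered on every command that can mutate data.
    for sub in (upload, extract, resume):
        sub.add_argument(
            "--dry-run",
            action="store_true",
            help="Simulate the command without API calls or state-file writes.",
        )
    return parser


def panel_title(base: str, dry_run: bool) -> str:
    """Append the simulation marker so users can see they are in dry-run mode."""
    return f"{base} (DRY RUN)" if dry_run else base
```

For example, `build_parser().parse_args(["extract", "./mapping.csv", "./json_dir", "--dry-run"])` returns a namespace with `dry_run=True`. `argparse` derives the `dry_run` attribute from the `--dry-run` option name.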
### 🔹 Simulation Logic
- Updated `upload.py` and `extracted_text_upload.py`.
- Behavior in `--dry-run` mode:
  - Skips all API calls (`POST`, `PUT`)
  - Prevents any data mutation
  - Does NOT update local state files
- Added clear simulation logs, e.g. `[DRY RUN] Skipping upload for <filename>`
- Ensures:
  - No side effects
  - Safe testing before real uploads
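The dry-run guard described above could look roughly like the following minimal sketch. It assumes a single `upload_file` function, a client object with a `post` method, and a JSON state file. All of these names, and the `/files` endpoint, are hypothetical stand-ins rather than the actual code in `upload.py`. The dry-run branch returns before any request is sent or the state file is written, which is what keeps the command free of side effects.

```python
import json
import logging
from pathlib import Path
from typing import Protocol

logger = logging.getLogger("corpus_client")


class UploadClient(Protocol):
    """Hypothetical stand-in for the real API client."""

    def post(self, url: str, payload: dict) -> dict: ...


def upload_file(
    path: Path,
    client: UploadClient,
    state_file: Path,
    dry_run: bool = False,
) -> bool:
    """Upload one file, or only log what would happen when dry_run is True.

    Returns True if a real upload was performed.
    """
    if dry_run:
        # Early return: no POST/PUT request is sent and the state file is not touched.
        logger.info("[DRY RUN] Skipping upload for %s", path.name)
        return False

    client.post("/files", {"filename": path.name})

    # Only a real upload records progress in the local state file.
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state[path.name] = "uploaded"
    state_file.write_text(json.dumps(state))
    return True
```

Returning early leaves the real code path untouched when the flag is absent, which matches the goal of not changing existing behavior.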
## 🐛 Bug Fix: Filename-to-Stem Mismatch
**Issue:** Extracted text upload failed to match CSV rows with JSON files because of a mismatch between:
- Full filenames (with extensions)
- File stems (without extensions)

**Fix:**
- Standardized the comparison on file stems
- Ensures correct mapping between CSV rows and JSON files

**Impact:**
- Fixes broken record matching
- Ensures successful extracted text uploads
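The stem comparison can be sketched with `pathlib.Path.stem`. The MR does not state which side carried the extension. This hypothetical `match_rows` helper therefore normalizes the CSV value to a stem and also keys the JSON files by stem. The `filename` column name is an assumption as well.

```python
import csv
import io
from pathlib import Path


def index_json_by_stem(json_paths: list[Path]) -> dict[str, Path]:
    """Key each JSON file by its stem, e.g. 'report.json' -> 'report'."""
    return {p.stem: p for p in json_paths}


def match_rows(csv_text: str, json_paths: list[Path], column: str = "filename"):
    """Pair CSV rows with JSON files by comparing stems on both sides.

    'column' is a hypothetical header name; the real mapping CSV may differ.
    Returns (matches, unmatched_values).
    """
    by_stem = index_json_by_stem(json_paths)
    matches, unmatched = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # 'report.pdf' and 'report' both normalize to the stem 'report'.
        stem = Path(row[column]).stem
        if stem in by_stem:
            matches.append((row, by_stem[stem]))
        else:
            unmatched.append(row[column])
    return matches, unmatched
```

One caveat of `Path.stem` is that it strips only the last suffix, so `archive.tar.gz` becomes `archive.tar`. Multi-dot filenames would need explicit handling if the corpus contains them.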
## 🧪 How to Verify

### ✅ 1. Test Dry Run (Upload Files)

`uv run corpus-client upload-files ./your-data --dry-run`
- Should prompt for metadata
- Should NOT upload files
- Should display dry-run logs
### ✅ 2. Test Dry Run (Extract)

`uv run corpus-client extract ./mapping.csv ./json_dir --dry-run`
- Should process files
- Should NOT upload data
- Should show `[DRY RUN] Skipping upload for <stem>`
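You can check that dry-run mode does not update local state files mechanically rather than by eye. One option is to hash the state file before and after a dry run. This is a sketch: the MR does not name the state file, so its path is a placeholder you would replace.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def file_digest(path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_state_unchanged(state_file: Path, run: Callable[[], object]) -> None:
    """Run a dry-run step and fail if the state file was created, deleted, or modified."""
    before = file_digest(state_file)
    run()
    after = file_digest(state_file)
    if before != after:
        raise AssertionError(f"{state_file} changed during dry run")
```

For example: `assert_state_unchanged(Path("state.json"), lambda: subprocess.run(["uv", "run", "corpus-client", "extract", "./mapping.csv", "./json_dir", "--dry-run"], check=True))`. Here `state.json` stands in for whatever state file the client actually writes.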
### ✅ Expected Output

[Expected dry-run terminal output]
## ✅ Acceptance Criteria
* [x] `--dry-run` works across `upload-files`, `extract`, and `resume`
* [x] No API calls are executed in dry-run mode
* [x] Local state files remain unchanged
* [x] Clear simulation logs are displayed
* [x] Extract workflow correctly matches CSV rows to JSON files by stem
* [x] No regression in existing functionality
## 🎯 Result
* Safer bulk operations through pre-validation
* Improved debugging and developer experience
* Fixed the critical filename-to-stem matching bug in extracted text upload
* More reliable data processing pipeline