feat: implement --dry-run mode and fix filename-to-stem mismatch in extract command

Praveena Veeranki requested to merge feature/fix into develop

📋 Overview

This MR introduces a Dry Run mode across all primary upload commands and fixes a critical bug in the extracted text upload workflow that previously prevented correct record matching.

The update improves developer safety, usability, and reliability without affecting existing functionality.

🚀 Key Changes

🔹 CLI Enhancements

  • Added --dry-run flag to:

    • upload-files
    • extract
    • resume
  • CLI clearly indicates simulation mode:

    • Example: 📝 Extracted Text Upload (DRY RUN)
  • Ensures users can validate workflows before execution
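One way a shared flag like this could be wired up is sketched below with the standard-library argparse. This is an illustration only: the MR does not say which CLI framework corpus-client uses, the positional arguments of each command are omitted, and build_parser and banner are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real corpus-client CLI may be built differently.
    parser = argparse.ArgumentParser(prog="corpus-client")

    # A parent parser lets every subcommand inherit the same --dry-run flag.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "--dry-run",
        action="store_true",
        help="Simulate the command without uploading anything",
    )

    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("upload-files", "extract", "resume"):
        sub.add_parser(name, parents=[common])
    return parser


def banner(title: str, dry_run: bool) -> str:
    # Append the simulation marker to the command title when dry_run is set.
    return f"{title} (DRY RUN)" if dry_run else title


if __name__ == "__main__":
    args = build_parser().parse_args(["extract", "--dry-run"])
    print(banner("📝 Extracted Text Upload", args.dry_run))
```

Defining the flag once on a parent parser keeps its name and help text identical across the three commands.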

🔹 Simulation Logic

  • Updated:

    • upload.py
    • extracted_text_upload.py
  • Behavior in --dry-run mode:

    • Skips all API calls (POST, PUT)
    • Prevents any data mutation
    • Does NOT update local state files
  • Added clear simulation logs:

[DRY RUN] Skipping upload for <filename>
  • Ensures:

    • No side effects
    • Safe testing before real uploads
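The dry-run behavior described above can be sketched as a single guard in the upload loop. This is a minimal illustration, not the code in upload.py or extracted_text_upload.py: upload_all, send, and state are hypothetical stand-ins for the real upload function, API call, and local state file.

```python
from typing import Callable, Dict, Iterable, List


def upload_all(
    filenames: Iterable[str],
    send: Callable[[str], None],
    state: Dict[str, str],
    dry_run: bool = False,
) -> List[str]:
    """Upload each file, or only log what would happen when dry_run is set."""
    logs: List[str] = []
    for name in filenames:
        if dry_run:
            # No API call and no state update: only a simulation log line.
            logs.append(f"[DRY RUN] Skipping upload for {name}")
            continue
        send(name)                # hypothetical stand-in for the POST/PUT call
        state[name] = "uploaded"  # hypothetical stand-in for the state file
    return logs


if __name__ == "__main__":
    sent: List[str] = []
    state: Dict[str, str] = {}
    for line in upload_all(["report.pdf"], sent.append, state, dry_run=True):
        print(line)
```

Placing the check before both the request and the state write means a dry run cannot leave a file marked as uploaded.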

🐛 Bug Fix: Filename-to-Stem Mismatch

Issue: Extracted text upload failed to match CSV rows with JSON files because one side of the comparison used full filenames and the other used file stems:

  • Full filenames (with extensions)
  • File stems (without extensions)

Fix:

  • Standardized comparison using file stems
  • Ensures correct mapping between CSV and JSON files

Impact:

  • Fixes broken record matching
  • Ensures successful extracted text uploads
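A stem-based comparison of the kind described in the fix might look like the sketch below. It assumes the CSV supplies filenames with extensions and the JSON files are named after the same stem. match_by_stem is a hypothetical helper, not the function changed in this MR.

```python
from pathlib import Path
from typing import Dict, Iterable


def match_by_stem(
    csv_filenames: Iterable[str], json_paths: Iterable[str]
) -> Dict[str, str]:
    """Map each CSV filename to the JSON path that shares its stem."""
    # Index the JSON files by stem, e.g. "json_dir/a.json" -> "a".
    json_by_stem = {Path(p).stem: p for p in json_paths}

    matched: Dict[str, str] = {}
    for name in csv_filenames:
        stem = Path(name).stem  # "a.pdf" -> "a"
        if stem in json_by_stem:
            matched[name] = json_by_stem[stem]
    return matched


if __name__ == "__main__":
    print(match_by_stem(["transcription1.pdf"], ["json_dir/transcription1.json"]))
```

Note that Path.stem removes only the final suffix, so a name such as archive.tar.gz yields the stem archive.tar. Files with multiple extensions would need extra handling.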

🧪 How to Verify

1. Test Dry Run (Upload Files)

uv run corpus-client upload-files ./your-data --dry-run
  • Should prompt for metadata
  • Should NOT upload files
  • Should display dry-run logs

2. Test Dry Run (Extract)

uv run corpus-client extract ./mapping.csv ./json_dir --dry-run
  • Should process files
  • Should NOT upload data
  • Should show:
[DRY RUN] Skipping upload for <stem>

Expected Output

╭────────────────────────────────────╮
│ 📝 Extracted Text Upload (DRY RUN) │
╰────────────────────────────────────╯

🔍 Scanning 1 JSON files...
📤 Starting upload for 1 records...
[DRY RUN] Skipping upload for transcription1
Extracted Text Upload Complete!


## ✅ Acceptance Criteria

* [x] `--dry-run` works across `upload-files`, `extract`, and `resume`
* [x] No API calls executed in dry run mode
* [x] Local state remains unchanged
* [x] Clear simulation logs displayed
* [x] Extract workflow correctly matches files
* [x] No regression in existing functionality


## 🎯 Result

* Safer bulk operations (pre-validation)
* Improved debugging and developer experience
* Fixed critical upload bug
* More reliable data processing pipeline
Edited by Praveena Veeranki
