Praveena Veeranki requested to merge feature/dry-run-test into develop Apr 08, 2026

📋 Summary

This MR introduces a --dry-run mode across core upload workflows in the Corpus CLI, enabling users to simulate operations without performing any API mutations.

The feature ensures that users can safely validate file inputs, metadata, and processing logic before executing real uploads—particularly useful for large-scale bulk operations.

❗ Problem

Currently, all upload commands directly perform API operations.

Observed Issues

No way to validate uploads before execution
Users risk:
- Partial uploads
- Failed long-running jobs
- Incorrect metadata submissions
Requires manual recovery using resume after failure
No safe “preview” mode for bulk operations

This is especially problematic for:

Large datasets
Unstable network conditions
First-time users configuring uploads

🎯 Scope of This MR

This MR introduces dry-run support for existing workflows:

upload_files
extract
resume

It focuses only on simulation and validation, without modifying core upload logic.

🚀 Changes Made

1. Added Dry Run Flag

CLI Updates

Introduced --dry-run flag across:
- Upload command
- Extract command
- Resume command

Behavior

When enabled:

Executes:
- File discovery
- Metadata validation
- Processing pipeline logic
Skips:
- API requests (POST, PUT)
- Data mutations
- State updates

2. Upload Flow Updates

Updated:

src/corpus_client_cli/upload.py

Changes

Wrapped API interaction logic with conditional checks:

if not dry_run:
    # perform API call
else:
    print("[DRY RUN] Skipping upload")

Ensures:
- Full validation still runs
- No external side effects occur

Per-file Behavior

Each file is:
- Parsed
- Validated
- Simulated
Output clearly indicates skipped uploads:
```
[DRY RUN] Skipping upload for file1.mp3
```

3. Extracted Text Upload Updates

Updated:

src/corpus_client_cli/extracted_text_upload.py

Changes

Applied dry-run logic to extracted-text workflow:
- CSV parsing
- JSON matching
- Validation
API calls are skipped during simulation

Output Example

[DRY RUN] Skipping upload for transcription1

4. Resume Flow Support

Extended dry-run behavior to resume command
Ensures:
- Previously incomplete uploads can be safely simulated
- No state is modified

5. Logging & UX Improvements

Added clear console indicators:

📝 Upload (DRY RUN)

Consistent log format across all workflows:

[DRY RUN] Skipping upload for <filename>

Provides user clarity that:
- Operation is simulated
- No real upload is happening

6. State Safety

Ensured dry-run does not update local state files:
- No marking files as completed
- No mutation of progress tracking

This allows:

Immediate transition to real execution after dry-run
No unintended side effects

🧪 Testing

Manual Verification

Bulk Upload

corpus-client upload "path/to/files" --dry-run

✔️ Expected:

Files scanned
Validation runs
No uploads performed

Extracted Text Upload

corpus-client extract "mapping.csv" "json_dir" --dry-run

✔️ Expected:

Mapping processed
JSON matched
No API calls

Example Output

╭──────────────────────────────╮
│ 📁 Record Upload (DRY RUN)   │
╰──────────────────────────────╯

🔍 Scanning files...
📤 Starting upload...

[DRY RUN] Skipping upload for file1.mp3
[DRY RUN] Skipping upload for file2.mp3

✅ Dry Run Complete!

🎯 User-Facing Impact

After this MR:

Users can:

Validate uploads before execution
Detect issues early (missing files, bad mappings)
Avoid failed long-running operations
Use CLI in a safe, preview mode

🧠 Why This Approach

This solution:

Reuses existing workflow logic
Adds minimal conditional branching
Avoids duplication of upload logic
Keeps implementation simple and maintainable

Dry-run is implemented as a non-invasive layer, ensuring:

No regression in current behavior
Easy extension for future features

⚠️ Out of Scope

This MR does NOT:

Add retry logic
Modify backend APIs
Change upload algorithms
Introduce new endpoints
Alter data processing logic

📁 Files Changed

src/corpus_client_cli/upload.py
src/corpus_client_cli/extracted_text_upload.py
src/corpus_client_cli/cli.py

🔍 Suggested Reviewer Focus

Correct placement of dry-run condition checks
Ensuring no API calls are triggered accidentally
Log clarity and consistency
State file integrity during simulation

⚠️ Risk Assessment

Risk: Low

Reasons:

No changes to core logic
Only conditional skipping of API calls
No backend impact
Safe fallback behavior

✅ Expected Outcome

Users can run upload commands in dry-run mode and:

See exactly what would happen
Validate inputs and workflows
Proceed confidently with real uploads

🏁 Final Note

This aligns the CLI with standard tooling practices by introducing a safe simulation mode, improving reliability and user experience for bulk data operations.

Edited Apr 08, 2026 by Praveena Veeranki

feat: add --dry-run mode for upload workflows

📋 Summary

❗ Problem

Observed Issues

🎯 Scope of This MR

🚀 Changes Made

1. Added Dry Run Flag

CLI Updates

Behavior

2. Upload Flow Updates

Changes

Per-file Behavior

3. Extracted Text Upload Updates

Changes

Output Example

4. Resume Flow Support

5. Logging & UX Improvements

6. State Safety

🧪 Testing

Manual Verification

Bulk Upload

Extracted Text Upload

Example Output

🎯 User-Facing Impact

🧠 Why This Approach

⚠️ Out of Scope

📁 Files Changed

🔍 Suggested Reviewer Focus

⚠️ Risk Assessment

Reasons:

✅ Expected Outcome

🏁 Final Note

Merge request reports